ABSTRACT: Breast cancer survivability prediction using labeled, unlabeled, and pseudo-labeled patient data

Posted on 7 Mar 2013
By Brian S McGowan, PhD
In Abstract, Informatics & Analysis, Rapid Learning Healthcare System, Resources

Abstract
Background Prognostic studies of breast cancer survivability have been aided by machine learning algorithms, which can predict the survival of a particular patient based on historical patient data. However, labeled patient records are not easy to collect: it takes at least 5 years to label a patient record as ‘survived’ or ‘not survived’, unguided trials of numerous types of oncology therapies are very expensive, and confidentiality agreements with doctors and patients are required to obtain the records.

Proposed method These difficulties in collecting labeled patient data have led researchers to consider semi-supervised learning (SSL), a machine learning approach that can also make use of unlabeled patient data, which is relatively easy to collect. SSL is therefore regarded as a way to circumvent the known difficulties. Even with SSL, however, it remains true that more labeled data lead to better prediction. To compensate for the lack of labeled patient data, we consider assigning virtual labels, that is, ‘pseudo-labels,’ to unlabeled patient data and treating those records as if they were labeled.
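To make the pseudo-label idea concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the abstract does not describe the member models or the agreement rule, so the nearest-centroid scorer, the two single-feature "views", the `threshold` parameter, and the names `view_score` and `co_train` are all assumptions made for illustration. Two simple models each score an unlabeled record, and the record receives a pseudo-label only when both models agree confidently; pseudo-labeled records are then treated as labeled in the next round.

```python
from statistics import mean


def view_score(labeled, x, view):
    """Score in [0, 1] for one feature view; near 1 means x is closer to the
    class-1 centroid, near 0 means closer to the class-0 centroid.
    (Hypothetical stand-in for a real member model.)"""
    pos = [f[view] for f, y in labeled if y == 1]
    neg = [f[view] for f, y in labeled if y == 0]
    d_pos = abs(x[view] - mean(pos))
    d_neg = abs(x[view] - mean(neg))
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def co_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """labeled: list of ((a, b), y) with y in {0, 1}; unlabeled: list of (a, b).
    Returns (labeled plus pseudo-labeled records, records left unlabeled)."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(max_rounds):
        newly_labeled, still_unlabeled = [], []
        for x in unlabeled:
            s_a = view_score(labeled, x, 0)
            s_b = view_score(labeled, x, 1)
            if s_a >= threshold and s_b >= threshold:
                newly_labeled.append((x, 1))      # both views confident: class 1
            elif s_a <= 1 - threshold and s_b <= 1 - threshold:
                newly_labeled.append((x, 0))      # both views confident: class 0
            else:
                still_unlabeled.append(x)         # disagreement or low confidence
        if not newly_labeled:
            break                                 # nothing new to learn from
        labeled.extend(newly_labeled)             # treat pseudo-labels as labels
        unlabeled = still_unlabeled
    return labeled, unlabeled


# Made-up toy data: two labeled records and three unlabeled ones.
seed = [((0.0, 0.0), 0), ((1.0, 1.0), 1)]
pool = [(0.1, 0.05), (0.9, 0.95), (0.1, 0.9)]
augmented, remaining = co_train(seed, pool)
print(len(augmented), remaining)
```

In this toy run the two records on which both views agree are pseudo-labeled, while the record whose views contradict each other stays unlabeled. Requiring agreement is one common way to limit the risk of training on wrong pseudo-labels.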

Results Our proposed algorithm, ‘SSL Co-training’, implements this concept based on SSL. SSL Co-training was tested using the Surveillance, Epidemiology, and End Results (SEER) database for breast cancer, and it delivered a mean accuracy of 76% and a mean area under the curve (AUC) of 0.81.
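For readers unfamiliar with the two reported metrics, the following sketch shows how accuracy and area under the curve can be computed for a single set of predictions. It assumes the curve in question is the ROC curve, which the abstract does not state, and it uses the pairwise-ranking definition of that area. The function names and the four example records are hypothetical and are not taken from the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via its ranking interpretation: the probability
    that a randomly chosen positive record receives a higher score than a
    randomly chosen negative record, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative record")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = 'survived', 0 = 'not survived'.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]                    # model's predicted survival scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

print(accuracy(y_true, y_pred))  # 0.5
print(roc_auc(y_true, scores))   # 0.75
```

A reported mean accuracy or mean area would then be the average of such values over repeated experiments.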

via Breast cancer survivability prediction using labeled, unlabeled, and pseudo-labeled patient data — Kim and Shin — Journal of the American Medical Informatics Association.

Written by Brian S McGowan, PhD

Dr. McGowan has served in leadership positions in numerous medical educational organizations and commercial supporters and is a Fellow of the Alliance (FACEhp). He founded the Outcomes Standardization Project, launched and hosted the Alliance Podcast, and most recently launched and hosts the JCEHP Emerging Best Practices in CPD podcast. In 2012 he co-founded ArcheMedX, Inc., a healthcare informatics and e-learning company, to apply his research in practice.
