Manuscript: Identifying primary and recurrent cancers using a SAS-based natural language processing algorithm

Posted on 11 Feb 2013
By Brian S McGowan, PhD
In Informatics & Analysis, Manuscript, Rapid Learning Healthcare System, Resources

Abstract

Objective Significant limitations exist in the timely and complete identification of primary and recurrent cancers for clinical and epidemiologic research. A SAS-based coding, extraction, and nomenclature tool (SCENT) was developed to address this problem.

Materials and methods SCENT employs hierarchical classification rules to identify and extract information from electronic pathology reports. Reports are analyzed and coded using a dictionary of clinical concepts and associated SNOMED codes. To assess the accuracy of SCENT, validation was conducted using manual review of pathology reports from a random sample of 400 breast and 400 prostate cancer patients diagnosed at Kaiser Permanente Southern California. Trained abstractors classified the malignancy status of each report.

Results Classifications of SCENT were highly concordant with those of abstractors, achieving κ of 0.96 and 0.95 in the breast and prostate cancer groups, respectively. SCENT identified 51 of 54 new primary and 60 of 61 recurrent cancer cases across both groups, with only three false positives in 792 true benign cases. Measures of sensitivity, specificity, positive predictive value, and negative predictive value exceeded 94% in both cancer groups.

Discussion Favorable validation results suggest that SCENT can be used to identify, extract, and code information from pathology report text. Consequently, SCENT has wide applicability in research and clinical care. Further assessment will be needed to validate performance with other clinical text sources, particularly those with greater linguistic variability.

Conclusion SCENT is proof of concept for SAS-based natural language processing applications that can be easily shared between institutions and used to support clinical and epidemiologic research.

http://jamia.bmj.com/content/20/2/349.full.pdf html

Post Tags - natural language processing, search

Written by Brian S McGowan, PhD

Dr. McGowan has served in leadership positions in numerous medical educational organizations and commercial supporters and is a Fellow of the Alliance (FACEhp). He founded the Outcomes Standardization Project, launched and hosted the Alliance Podcast, and most recently launched and hosts the JCEHP Emerging Best Practices in CPD podcast. In 2012 he Co-Founded ArcheMedX, Inc, a healthcare informatics and e-learning company to apply his research in practice.

You must be logged in to post a comment.

Clinical Operations

Commercial Teams

Resources

Medical Education

About Us

Connect with us

Resource Center

Manuscript: Identifying primary and recurrent cancers using a SAS-based natural language processing algorithm

Abstract

Written by Brian S McGowan, PhD

Leave a Comment

Social

Recent News

Reflections from DIA and BIO: What Trial Leaders Are Really Saying

What We Think We Know: How Overconfidence Derails Clinical Trials

Contact Us

300 E Main Street, Suite 101 Charlottesville, VA 22902

+1-434-260-1850

Email Us

About Us

Subscribe

Request a Demo