MSK-CHORD (MSK, Nature 2024)

UID: 11458

Description
This dataset contains summary data visualizations and clinical data from targeted sequencing of 25,040 tumors from 24,950 patients and their matched normals via MSK-IMPACT, along with clinical annotations, some of which are derived from natural language processing (denoted NLP). This data is available under the Creative Commons BY-NC-ND 4.0 license. For commercial use, please contact datarequests@mskcc.org. The study was conducted to create a harmonized clinicogenomic, oncologic real-world dataset and to enable discovery of clinicogenomic relationships not apparent in smaller datasets. The dataset was then used to train machine learning models; the researchers found that models including features derived from natural language processing, such as sites of disease, outperformed those based on genomic data or stage alone, as tested by cross-validation and on an external, multi-institution dataset. The clinical data covers non-small-cell lung (n = 7,809), breast (n = 5,368), colorectal (n = 5,543), prostate (n = 3,211) and pancreatic (n = 3,109) cancers.
Subject of Study
Subject(s)
OncoTree Cancer Type(s)
Atypical Lung Carcinoid
Invasive Breast Carcinoma
Breast Invasive Cancer, NOS
Breast Invasive Carcinoma, NOS
Colon Adenocarcinoma
Colorectal Adenocarcinoma
Breast Invasive Ductal Carcinoma
Breast Invasive Lobular Carcinoma
Lung Neuroendocrine Tumor
Lung Adenocarcinoma
Lung Adenosquamous Carcinoma
Lung Carcinoid
Large Cell Neuroendocrine Carcinoma
Pleomorphic Carcinoma of the Lung
Lung Squamous Cell Carcinoma
Mucinous Adenocarcinoma of the Colon and Rectum
Metaplastic Breast Cancer
Breast Mixed Ductal and Lobular Carcinoma
Non-Small Cell Lung Cancer
Poorly Differentiated Non-Small Cell Lung Cancer
Acinar Cell Carcinoma of the Pancreas
Pancreatic Adenocarcinoma
Adenosquamous Carcinoma of the Pancreas
Pancreatic Neuroendocrine Tumor
Prostate Adenocarcinoma
Rectal Adenocarcinoma
Access Restrictions
Free to All
Access Instructions
This data is available under the Creative Commons BY-NC-ND 4.0 license. For commercial use, please contact datarequests@mskcc.org.
Associated Publications
Dataset Format(s)
TSV
Dataset Size
10.8 MB
Data Catalog Record Updated
2025-01-29