About the Data Catalog
Welcome to the MSK Data Catalog, a searchable and browsable online collection of records describing the contents of datasets and providing access instructions for those wishing to explore the data for their own research. The Data Catalog is designed to help you identify data resources that may inform your project or research question. Some data resources may be immediately available to you from the catalog. Others may require you to complete a data use application form or contact the dataset owner to explore your options.
The MSK Data Catalog is a digital way-finder designed to enhance discoverability and maximize the usefulness of datasets through rich metadata: description, keywords, data formats, instrumentation or software used in the creation of the data, and access information. It also connects researchers to data authors and experts to promote greater exposure and reuse of data while providing appropriate protections for authors. The Data Catalog is NOT a data repository to store data, but rather a portal to datasets stored in internal and external repositories. Our current inclusion criteria for this catalog are:
- Internal datasets created by MSK researchers and associated with publication(s),
- Internal datasets created by MSK researchers but not yet published or with no associated publication(s),
- External datasets referenced and used by MSK researchers in publication(s).
If you are interested in submitting a dataset to the MSK Data Catalog, would like to suggest a dataset for inclusion, or are willing to serve as a local expert, please contact us.
Highlights of the MSK Data Catalog:
- Aggregate access to MSK datasets from one location,
- Identify access points, experts, or owners of MSK datasets,
- Help MSK researchers locate and understand datasets generated at external organizations,
- Facilitate both internal and external research collaborations,
- Increase the visibility of research data generated by MSK researchers and support data re-use,
- Enhance Synapse records with data availability information.
While the cataloged datasets represent MSK data, they do not include all publicly available, licensed, or private external datasets. If you would like assistance in conducting a more comprehensive search of all datasets accessible through the library, please contact us.
The code used to create the MSK Data Catalog is open source and available via GitHub. Documentation and further information are available via OSF. If you would like to create a similar catalog, you can learn more by visiting the Data Discovery Collaboration.
Meet the Team
Anthony Dellureficio
Data Catalog Coordinator
Anthony is the Associate Librarian for Data Management Services at the MSK Library. Drawing on a background in library technology, he works with researchers to provide systems solutions and offer a wide range of data management services at MSK. His academic areas of interest include the history of science, technology, and medicine and classical genetics.
Eric Willoughby
Data Catalog Developer
Eric is the Programmer/Analyst at the MSK Library, where he works with library professionals to develop applications that support the diverse information needs of MSK researchers, staff, and patients.
Klara Pokrzywa
Data Catalog Metadata Librarian
Klara is the Metadata Librarian for Data Management Services at the MSK Library. Klara works to support data discovery and metadata interoperability. She also creates and updates catalog records for new and existing MSK datasets.
Donna Gibson
Data Catalog Advocate
Donna is the Director of Library Services. Delivering new services or programs that enhance the user's information retrieval experience from start to finish is important to her. She enjoys connecting with her users to better understand how published literature and other research outputs, technology, and social networking tools integrate within their work environment.