NCIT¶
We will use the NCIT thesaurus to provide IDs for the concepts in PDXnet. /
Parsing the NCIT ontology file with OWL-API¶
The National Cancer Institute (NCI) has developed a thesaurus (NCIT) of many kinds of cancer-relevant data. The Monarch Initiative has developed a version of this ontology as an OBO file. This Wiki provides details of this effort.
PDXIntegrator will download the OBO file with the following command.
$ java -jar target/PdxIntegrator.jar download
The ncit.obo file will be stored in a newly created directory called data
. If the file is already present, PDXIntegrator
will emit a warning message (delete the file if you want PDXIntegrator to download a fresh copy of the file).
PDXIntegrator uses the code in the class NcitOwlApiParser
to parse the Neoplasm terms of NCIT (only). Currently,
it only saves the IDs and the labels (although it may be useful in the future to also ingest the synonyms in case this
code will be used to drive an autocomplete. This is sufficient to cause the simulated patients to have
real NCIT diagnosis codes for the cancers.
For the SPARQL queries, it will be necessary to include the subclass definitions of NCIT to enable queries that use the NCIT hierarchy.