SETH
SNP Extraction Tool for Human Variations
Currently, SETH does not identify synonymous protein-mutations (e.g., Asp106Asp).
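One possible way to catch such cases is a regular expression with a backreference, so that the residue before and after the position must be identical. This is only a minimal sketch and not SETH's actual grammar; the names `AMINO_ACIDS`, `SYNONYMOUS`, and `find_synonymous` are hypothetical, and only the three-letter notation is covered.

```python
import re

# Three-letter codes of the 20 standard amino acids.
AMINO_ACIDS = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# (?P=res) is a backreference: the residue after the position must be
# the same as the residue before it, i.e. the mutation is synonymous.
# An optional "p." prefix is accepted.
SYNONYMOUS = re.compile(
    rf"\b(?:p\.)?(?P<res>{AMINO_ACIDS})(?P<pos>[1-9]\d*)(?P=res)\b"
)


def find_synonymous(text):
    """Return (residue, position, matched text) for each synonymous mention."""
    return [
        (m.group("res"), int(m.group("pos")), m.group(0))
        for m in SYNONYMOUS.finditer(text)
    ]
```

Called on a sentence containing both Asp106Asp and Val600Glu, `find_synonymous` would report only the former, because the backreference rejects mentions whose two residues differ.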
Some publications refer to mutations by their dbSNP ss (submitted SNP) identifiers. See here for an example: https://www.sciencedirect.com/science/article/abs/pii/S0920996409003806
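A simple pattern could pick up these identifiers alongside the more common rs identifiers. The sketch below is an assumption about how this might look, not existing SETH code; `DBSNP_ID` and `find_dbsnp_ids` are hypothetical names, and the pattern is deliberately case-sensitive to limit false positives.

```python
import re

# "rs" = reference SNP cluster ID, "ss" = submitted SNP ID, followed by digits.
DBSNP_ID = re.compile(r"\b(?P<kind>rs|ss)(?P<number>\d+)\b")


def find_dbsnp_ids(text):
    """Return (kind, number) tuples for every rs/ss identifier in the text."""
    return [
        (m.group("kind"), int(m.group("number")))
        for m in DBSNP_ID.finditer(text)
    ]
```

Keeping the identifier kind separate from the number lets a later normalization step decide how to resolve ss identifiers, for example by mapping them to their rs cluster.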
https://github.com/biocommons/hgvs/tree/c9ee7504576d746f79135f4235a99fd6fe656852 https://github.com/mutalyzer/mutalyzer
dbSNP changed from generating XML dumps to JSON dumps. To build the database, we need to modify the update scripts. Data can be found at https://ftp.ncbi.nlm.nih.gov/snp/latest_release/
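As a starting point for the updated scripts, the sketch below reads such a dump line by line. It assumes that the dumps are bzip2-compressed files with one JSON object per line and that each object carries a `refsnp_id` field; both assumptions should be checked against the files in the release directory. The function names are hypothetical.

```python
import bz2
import json


def iter_refsnp_ids(lines):
    """Yield rs identifiers from an iterable of JSON lines.

    Assumes one RefSNP JSON object per line with a "refsnp_id" field.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield "rs" + str(record["refsnp_id"])


def read_dump(path):
    """Stream rs identifiers from a bzip2-compressed JSON-lines dump."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_refsnp_ids(handle)
```

Streaming the file avoids loading a whole chromosome dump into memory; extracting alleles and HGVS expressions would require descending further into each record.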
https://github.com/ibm-aur-nlp/amia-18-mutation-corpus
For more details, see the tmVar 2.0 paper (https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/), which describes a nice strategy for using patterns in named entity normalization.
Check the tmVar 2.0 paper (https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/). Besides dbSNP, they also incorporate ClinVar (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/hgvs4variation.txt.gz) and "Hearst patterns".
https://www.ncbi.nlm.nih.gov/research/bionlp/Tools/tmvar/
In addition to sequence-based mappings of mutations to genes occurring in the same text, we should add a few text-based mappings or checks. Oftentimes, particularly for full texts with lots...