civicmine
Can't find T790M mutation in civicmine
Hi jakelever,
Thanks for this wonderful project.
When I used civicmine (http://bionlp.bcgsc.ca/civicmine), I couldn't find "T790M" in any sentence. That seemed odd to me, because EGFR T790M is a very well-known biomarker in cancer treatment.
This is a tokenizer problem: the spaCy language model (en_core_web_sm) splits "T790M" into "T790" and "M": (('T790', 'NOUN'), ('M', 'PROPN')).
I changed the kindred package like this (kindred/Parser.py):
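    from spacy.symbols import ORTH  # at the top of the file; needed for the rule below

    if model not in Parser._models:
        Parser._models[model] = spacy.load(model, disable=['ner'])
    self.nlp = Parser._models[model]
    # Keep "T790M" as one token instead of splitting it into "T790" and "M"
    special_case = [{ORTH: "T790M"}]
    self.nlp.tokenizer.add_special_case("T790M", special_case)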
Now "T790M" is kept as a single token: ('T790M', 'VERB').
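The special case above only covers the literal string "T790M", so other substitutions (L858R, V600E, and so on) may be split the same way. As a rough sketch of how candidates could be collected instead of hard-coded, the snippet below scans text for strings shaped like a protein-level point mutation. The function name `find_mutation_candidates` and the regular expression are assumptions made for illustration; they are not part of kindred or civicmine, and the pattern only handles simple one-letter substitutions.

```python
import re

# Assumption: a point mutation is written as one amino-acid letter,
# a position, and another amino-acid letter (e.g. T790M, V600E).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MUTATION_RE = re.compile(rf"\b[{AMINO_ACIDS}]\d+[{AMINO_ACIDS}]\b")


def find_mutation_candidates(text):
    """Return mutation-like strings in order of first appearance, without duplicates.

    Hypothetical helper: it does no tokenization itself, it only lists
    strings that might need a tokenizer special case.
    """
    matches = MUTATION_RE.findall(text)
    # dict.fromkeys keeps insertion order while dropping repeats
    return list(dict.fromkeys(matches))


if __name__ == "__main__":
    sentence = "EGFR T790M and L858R were tested; T790M conferred resistance."
    print(find_mutation_candidates(sentence))
```

Each returned string could then be registered the same way as in the change above, for example with `self.nlp.tokenizer.add_special_case(m, [{ORTH: m}])`. Whether that is the right place to fix it in kindred is an open question.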
Best,

jakelever
Hi @hongiiv, thanks for looking into this. I'll have a little dig myself and see what other issues there may be.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.