
Incorrectly extracted mutations

Open jhkbg opened this issue 8 years ago • 2 comments

Some patterns match only a prefix of a longer mutation mention. These patterns need to be extended, or new, longer patterns need to be added that take precedence over the shorter matches.

Examples:

  1. PMID=20806047 occurrence=p.X320SerextX72 extracted=p.X320Ser
  2. PMID=23903049 occurrence=p.His33GInfsX32 extracted=p.His33G
  3. PMID=22907560 occurrence=p.Arg313Hys extracted=p.Arg313H
  4. PMID=18486607 occurrence=p.Arg315Stop extracted=p.Arg315S
  5. PMID=23017188 occurrence=p.Phe508Del extracted=p.Phe508D
  6. PMID=24158885 occurrence=p.Met694IIe extracted=p.Met694I
  7. PMID=23856132 occurrence=p.F55>Lfs extracted=p.F55>L
  8. PMID=18708425 occurrence=p.L15_L16ins2L extracted=p.L15_L16ins2

jhkbg, Oct 21 '16 23:10

Thanks for the report. I added test cases for the described errors here: https://github.com/rockt/SETH/blob/master/src/test/java/de/hu/berlin/wbi/issues/Request10Test.java

Some errors (3, 4, 5, 6) should be easy to fix; the parser seems to stop too early in these cases. Others (e.g., 1, 7, 8) will probably need a major adaptation of the implemented Backus-Naur grammar.
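To illustrate the "stops too early" behaviour in cases 3 to 6, here is a minimal sketch using plain Python regular expressions. It is not SETH's actual grammar; the names `SHORT`, `LONG`, and `extract` are hypothetical and exist only for this illustration. The idea is that when a one-letter residue is the only alternative, the match ends after the first letter, while listing longer alternatives first gives them precedence.

```python
import re

# Hypothetical short pattern: three-letter wild type, position,
# then a ONE-letter mutant residue. It stops after the first letter.
SHORT = re.compile(r"p\.[A-Z][a-z]{2}\d+[A-Z]")

# Hypothetical extended pattern: longer alternatives come first, so they
# take precedence over the one-letter fallback. The three-letter branch is
# deliberately loose so that misspelled codes from the source text
# (e.g. "Hys", "IIe") are still captured in full.
LONG = re.compile(r"p\.[A-Z][a-z]{2}\d+(?:Stop|Del|[A-Z][A-Za-z][a-z]|[A-Z])")


def extract(pattern, text):
    """Return the first match of `pattern` in `text`, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    for occurrence in ("p.Arg313Hys", "p.Arg315Stop", "p.Phe508Del", "p.Met694IIe"):
        print(occurrence, extract(SHORT, occurrence), extract(LONG, occurrence))
```

Python's `re` tries alternatives from left to right, so the order inside the group is what gives the longer forms precedence. The loose three-letter branch can overmatch when a one-letter residue is directly followed by other letters, so a real fix would need a stricter residue list.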

Erechtheus, Nov 02 '16 11:11

Cool, thanks. I will look into this as well at some point.

jhkbg, Nov 02 '16 18:11