
mysterious hyphens when processing INDELs from ICGC data

Open mattiyeh opened this issue 1 year ago • 2 comments

https://github.com/AlexandrovLab/SigProfilerMatrixGenerator/blob/f945199230a4fc0671d90a7873b079930a84d227/SigProfilerMatrixGenerator/scripts/convert_input_to_simple_files.py#L332C10-L332C10

Hello, why are hyphens added to ref and mut here when the other functions don't do anything similar? This breaks downstream: the hyphens are added again in MutationMatrixGenerator.py (lines 1176-1179), and you can then get a KeyError at line 1617 in revcompl(type_sequence), because the '-' character is not in the revcompl map.
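To illustrate the failure mode, here is a minimal sketch. It assumes revcompl is a plain dictionary lookup per base; the `COMPLEMENT` map and this `revcompl` are hypothetical stand-ins, not the package's actual code.

```python
# Hypothetical stand-in for a dict-based reverse complement.
# The real revcompl in MutationMatrixGenerator.py may differ.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcompl(seq):
    """Reverse-complement a sequence using a strict per-base lookup."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# A clean sequence works as expected.
print(revcompl("AAC"))

# A sequence that still carries a '-' placeholder fails,
# because '-' is not a key in the complement map.
try:
    revcompl("-A")
except KeyError as err:
    print("KeyError for character:", err)
```

Any allele that reaches a lookup like this with a leftover '-' will raise, which would match the traceback I'm seeing.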

I fixed this by commenting out those lines in convert_input_to_simple_files.py, but can someone explain whether this will have unintended consequences?
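An alternative that avoids editing the package source might be to make the hyphen padding idempotent before the data goes downstream. The sketch below is only an idea: `normalize_allele` is a hypothetical helper, and I have not verified that a single '-' for an empty allele is what MutationMatrixGenerator.py expects.

```python
def normalize_allele(allele):
    """Collapse any number of '-' placeholders in an allele.

    Hypothetical helper: an allele with real bases loses its hyphens,
    and an empty or hyphen-only allele becomes a single '-'.
    """
    stripped = allele.replace("-", "")
    return stripped if stripped else "-"


# Applying it more than once gives the same result, so a second
# padding step cannot stack extra hyphens onto the allele.
for raw in ["A", "-A", "--", "", "-"]:
    print(repr(raw), "->", repr(normalize_allele(raw)))
```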

Thanks, Marc

mattiyeh avatar Oct 09 '23 19:10 mattiyeh

Hi @mattiyeh,

Thanks for reaching out again about the issue you encountered with ICGC input files. It would be a great help if you could please provide an input file to reproduce the issue you identified. Thanks!

mdbarnesUCSD avatar Nov 10 '23 20:11 mdbarnesUCSD

Hi Mark,

Sure, here is a sample input file.

stomach_indel_mutations.txt

Thanks, Marc

mattiyeh avatar Nov 10 '23 21:11 mattiyeh