SigProfilerMatrixGenerator
mysterious hyphens when processing INDELs from ICGC data
https://github.com/AlexandrovLab/SigProfilerMatrixGenerator/blob/f945199230a4fc0671d90a7873b079930a84d227/SigProfilerMatrixGenerator/scripts/convert_input_to_simple_files.py#L332C10-L332C10
Hello,
Why are the hyphens added to `ref` and `mut` when the other functions don't do anything similar? This breaks downstream because they are added again in MutationMatrixGenerator.py (lines 1176-1179), and then you can get a KeyError at line 1617, `revcompl(type_sequence)`, because the `'-'` character is not in the revcompl map.
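A minimal sketch of the failure mode described above. It is not the library's actual code: the `COMPLEMENT` map and this `revcompl` are hypothetical stand-ins, assuming the real map only covers nucleotide characters.

```python
# Hypothetical stand-in for the complement map; the real one lives in
# MutationMatrixGenerator.py and may contain different keys.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcompl(seq):
    """Reverse-complement seq; raises KeyError for characters missing from COMPLEMENT."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# A plain nucleotide sequence works as expected.
print(revcompl("ACG"))  # CGT

# A sequence that still carries a padding hyphen fails on the lookup.
try:
    revcompl("-ACG")
except KeyError as err:
    print("KeyError:", err)
```

Any sequence that reaches the lookup with a leftover `-` fails the same way, regardless of which step added the hyphen.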
I fixed this by commenting out the lines in convert_input_to_simple_files.py, but can someone explain whether this will have unintended consequences?
Thanks, Marc
Hi @mattiyeh,
Thanks for reaching out again about the issue you encountered with ICGC input files. It would be a great help if you could please provide an input file to reproduce the issue you identified. Thanks!