KMC
Hard masking: softer criterion
Right now, a base is hard-masked to N if at least one of the k-mers containing it is "invalid" (i.e., its count is below the -ci threshold). Could you make that a parameter: hard-mask a base if at least X of the k-mers containing it are invalid (or, equivalently for bases away from the read ends, if no more than Y = k - X of them are valid)? Currently, if I have a reference and some reads and want to see which read k-mers occur in the reference, a single SNP in the reads relative to the reference causes 2k-1 bases of the reference to be masked (every base within k-1 positions of the SNP), even though only one base is a mismatch. Raising X (or lowering Y) would shrink that window.
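A minimal C++ sketch of the proposed threshold, assuming per-k-mer validity flags (count at least -ci) are already available for a read. The function `MaskWithThreshold` and its `min_invalid` parameter (playing the role of X in this request) are hypothetical, not KMC's API. A prefix sum over invalid k-mers gives the number of invalid k-mers covering each base in O(1); with `min_invalid = 1` it reproduces the current behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replace base i with 'N' if at least `min_invalid`
// of the k-mers covering it are invalid. kmer_valid[j] describes the k-mer
// starting at position j, so kmer_valid.size() == seq.size() - k + 1.
std::string MaskWithThreshold(const std::string& seq,
                              const std::vector<bool>& kmer_valid,
                              std::size_t k, std::size_t min_invalid) {
    assert(min_invalid >= 1);  // 0 would mask every base
    std::string out = seq;
    const std::size_t n = seq.size();
    if (n < k) return out;  // no k-mers at all
    const std::size_t n_kmers = n - k + 1;
    assert(kmer_valid.size() == n_kmers);

    // prefix[j] = number of invalid k-mers starting in [0, j)
    std::vector<std::size_t> prefix(n_kmers + 1, 0);
    for (std::size_t j = 0; j < n_kmers; ++j)
        prefix[j + 1] = prefix[j] + (kmer_valid[j] ? 0 : 1);

    for (std::size_t i = 0; i < n; ++i) {
        // k-mers covering base i start at max(0, i-k+1) .. min(i, n_kmers-1)
        const std::size_t first = (i + 1 >= k) ? i + 1 - k : 0;
        const std::size_t last = (i < n_kmers) ? i : n_kmers - 1;
        const std::size_t invalid = prefix[last + 1] - prefix[first];
        if (invalid >= min_invalid) out[i] = 'N';
    }
    return out;
}
```

For example, with k = 3 and a SNP at position 5 of `ACGTACGTAC` (so the k-mers starting at 3, 4 and 5 are invalid), `min_invalid = 1` masks 5 bases (2k-1), `min_invalid = 2` masks 3, and `min_invalid = 3` masks only the SNP base. Note that bases near the read ends are covered by fewer than k k-mers, so with a large X they are harder to mask; whether to adjust the threshold there is a design choice.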
Also, it would be good to have an option to soft-mask bases by converting them to lowercase instead of hard-masking them to N.
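One way a soft-mask option could look, sketched in C++: a hypothetical `MaskMode` switch that either writes `N` or lowercases the base with `std::tolower`. `MaskMode` and `MaskRange` are illustrative names only; KMC's actual option names and interface are not assumed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical masking modes for illustration.
enum class MaskMode { Hard, Soft };

// Mask bases in [from, to): 'N' for hard masking, lowercase for soft masking.
void MaskRange(std::string& seq, std::size_t from, std::size_t to, MaskMode mode) {
    assert(from <= to && to <= seq.size());
    for (std::size_t i = from; i < to; ++i) {
        if (mode == MaskMode::Hard) {
            seq[i] = 'N';
        } else {
            // Cast to unsigned char first: std::tolower is undefined for
            // negative values other than EOF.
            seq[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(seq[i])));
        }
    }
}
```

Soft masking keeps the original base visible, so a downstream tool can still see which nucleotide was there while knowing it was flagged.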
Also, it seems like CFastqFilter::HardMask() could be sped up by using memset and memcpy instead of per-base loops? @marekkokot
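A sketch of the memset/memcpy idea, assuming the masked positions are first collected as sorted, non-overlapping half-open intervals. `Interval`, `IntervalsFromInvalidKmers` and `HardMaskIntervals` are hypothetical names; the real signature of CFastqFilter::HardMask() is not assumed. Under the current rule, each invalid k-mer starting at j masks [j, j+k) and overlapping spans merge, so the read can be copied once with `memcpy` and each masked run filled with a single `memset`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Interval { std::size_t begin, end; };  // half-open [begin, end)

// Current rule (one invalid k-mer suffices): each invalid k-mer starting
// at j masks [j, j + k); touching or overlapping spans are merged.
std::vector<Interval> IntervalsFromInvalidKmers(const std::vector<bool>& kmer_valid,
                                                std::size_t k) {
    std::vector<Interval> out;
    for (std::size_t j = 0; j < kmer_valid.size(); ++j) {
        if (kmer_valid[j]) continue;
        if (!out.empty() && j <= out.back().end)
            out.back().end = j + k;  // extend the current run
        else
            out.push_back({j, j + k});  // start a new run
    }
    return out;
}

// Copy the read in one call, then fill each masked run with 'N'.
void HardMaskIntervals(const char* in, char* out, std::size_t len,
                       const std::vector<Interval>& masked) {
    std::memcpy(out, in, len);
    for (const Interval& iv : masked) {
        assert(iv.begin <= iv.end && iv.end <= len);
        std::memset(out + iv.begin, 'N', iv.end - iv.begin);
    }
}
```

Whether this beats the existing loop would need benchmarking: compilers can already vectorize simple byte-copy loops, so the gain mostly comes from handling long runs in one call.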