
hard masking -- softer criterion

Open · notestaff opened this issue 5 years ago · 0 comments

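Right now, a base is hard-masked to N if at least one of the k-mers containing it is "invalid" (has a count below -ci). Could this be made a parameter: hard-mask a base only if at least X of the k-mers containing it are invalid (or, equivalently for bases away from the sequence ends, if at most Y = k - X of them are valid)? Currently, if I have a reference and some reads and want to see which read k-mers occur in the reference, a single SNP in the reads relative to the reference causes 2k-1 bases of the reference to be masked (every base within k-1 positions of the SNP), even though only one base is a mismatch. Raising X (or lowering Y) would shrink that window: for an isolated SNP, a base d positions away is covered by k - d invalid k-mers, so requiring X invalid k-mers masks only bases within k - X positions of the SNP, and X = k masks just the SNP itself.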
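A minimal sketch of the proposed threshold, assuming the validity of each k-mer (count at or above -ci) is already known and stored per start position. The function name mask_by_invalid_kmers and the min_invalid parameter (the X above) are hypothetical, not part of KMC. A difference array counts how many invalid k-mers cover each base in O(n), and min_invalid = 1 reproduces the current behaviour.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not KMC's API: hard-masks a base to 'N' when at least
// min_invalid (>= 1) of the k-mers covering it are invalid.
// kmer_valid[i] describes the k-mer starting at position i, so it is expected
// to hold seq.size() - k + 1 entries.
std::string mask_by_invalid_kmers(const std::string& seq,
                                  const std::vector<bool>& kmer_valid,
                                  std::size_t k, std::size_t min_invalid) {
    const std::size_t n = seq.size();
    // Difference array: +1 where an invalid k-mer starts, -1 just past its end.
    std::vector<long> diff(n + 1, 0);
    for (std::size_t i = 0; i < kmer_valid.size() && i + k <= n; ++i) {
        if (!kmer_valid[i]) {
            diff[i] += 1;
            diff[i + k] -= 1;
        }
    }
    std::string out = seq;
    long invalid_cover = 0;  // running count of invalid k-mers covering base j
    for (std::size_t j = 0; j < n; ++j) {
        invalid_cover += diff[j];
        if (invalid_cover >= static_cast<long>(min_invalid)) {
            out[j] = 'N';
        }
    }
    return out;
}
```

For an isolated SNP at position 5 of an 11-base sequence with k = 3, the k-mers starting at positions 3, 4 and 5 are invalid: min_invalid = 1 masks positions 3 to 7 (2k-1 bases), min_invalid = 2 masks positions 4 to 6, and min_invalid = 3 masks only position 5.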

Also, it would be useful to have an option to soft-mask bases by converting them to lowercase instead of hard-masking them to N.
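A sketch of how a soft-masking option could sit next to hard masking, assuming a per-base mask has already been computed. MaskMode and apply_mask are hypothetical names, not existing KMC options. Soft masking keeps the original nucleotide, so a downstream tool that treats lowercase as masked can still recover the base.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical masking modes; not an existing KMC option.
enum class MaskMode { Hard, Soft };

// Applies a per-base mask: Hard replaces masked bases with 'N' (the current
// behaviour), Soft lowercases them so the original base is preserved.
std::string apply_mask(const std::string& seq, const std::vector<bool>& masked,
                       MaskMode mode) {
    std::string out = seq;
    for (std::size_t j = 0; j < out.size() && j < masked.size(); ++j) {
        if (!masked[j]) continue;
        if (mode == MaskMode::Hard) {
            out[j] = 'N';
        } else {
            // std::tolower needs a value representable as unsigned char.
            out[j] = static_cast<char>(
                std::tolower(static_cast<unsigned char>(out[j])));
        }
    }
    return out;
}
```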

Also, could CFastqFilter::HardMask() be sped up by using memset and memcpy instead of per-base loops? @marekkokot

notestaff · Aug 01 '18 01:08