KMC

filtering paired reads

Open notestaff opened this issue 6 years ago • 3 comments

When filtering reads, if the reads are paired, it would help to be able to say either "keep both reads if one passes the filter" or "drop both reads if one fails the filter", while preserving the read pairing.
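To make the two requested policies concrete, here is a minimal sketch in plain Python. It is not existing KMC or kmc_tools functionality; `filter_pairs`, the `mode` argument, and the `passes` predicate are hypothetical names used only for illustration, and the predicate stands in for whatever per-read filter criterion is applied.

```python
from typing import Callable, Iterable, Iterator, Tuple

Pair = Tuple[str, str]  # (read1 sequence, read2 sequence)


def filter_pairs(
    pairs: Iterable[Pair],
    passes: Callable[[str], bool],
    mode: str = "any",
) -> Iterator[Pair]:
    """Yield read pairs intact, so pairing is never broken.

    mode="any": keep both reads if at least one passes the filter.
    mode="all": drop both reads if at least one fails the filter.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    combine = any if mode == "any" else all
    for r1, r2 in pairs:
        # The decision is made once per pair, never per read.
        if combine((passes(r1), passes(r2))):
            yield r1, r2


# Toy example: a read "passes" if it contains the substring AAA.
example_pairs = [("AAAA", "CCCC"), ("AAAA", "AAAT"), ("GGGG", "CCCC")]
contains_aaa = lambda seq: "AAA" in seq

kept_any = list(filter_pairs(example_pairs, contains_aaa, mode="any"))
kept_all = list(filter_pairs(example_pairs, contains_aaa, mode="all"))
```

Because each pair is either emitted whole or dropped whole, the two output files would stay in sync regardless of which policy is chosen.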

notestaff, May 24 '18 08:05

@marekkokot For filtering paired reads, you could use strand information: if the kmers come from one strand only (e.g. kmers from a genome), you could check that read1 has kmers from one strand while read2 has kmers from the other strand.

notestaff, Jun 15 '18 17:06

@marekkokot For the most precise filtering, you'd have kmc_tools filter take as input two single-strand kmer databases: one made by kmc -b from a set of genome sequences, and one from reverse complements of these sequences. You'd then keep a read if it meets the filtering criteria for either database. For paired reads, you'd keep a read pair if read1 meets criteria for the forward-strand database and read2 for the reverse complement-strand database, or vice versa.
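A rough sketch of that pair rule in plain Python, with in-memory sets standing in for the two single-strand kmer databases. This does not call KMC; `revcomp`, `kmer_set`, `read_hits`, `keep_pair`, and the `min_hits` threshold are all hypothetical names and parameters chosen for illustration, and the "filtering criteria" are assumed to be a minimum count of read kmers found in the database.

```python
from typing import Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> Set[str]:
    """All kmers of seq, strand-specific (no canonical form)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_hits(read: str, db: Set[str], k: int, min_hits: int = 1) -> bool:
    """True if at least min_hits kmer positions of the read are in db."""
    hits = sum(read[i:i + k] in db for i in range(len(read) - k + 1))
    return hits >= min_hits


def keep_pair(r1: str, r2: str, fwd_db: Set[str], rc_db: Set[str],
              k: int, min_hits: int = 1) -> bool:
    """Keep the pair if the mates match opposite-strand databases."""
    return (
        (read_hits(r1, fwd_db, k, min_hits) and read_hits(r2, rc_db, k, min_hits))
        or
        (read_hits(r1, rc_db, k, min_hits) and read_hits(r2, fwd_db, k, min_hits))
    )


# Toy genome whose forward and reverse-complement 4-mers do not overlap.
K = 4
genome = "AAAACCCCAC"
fwd_db = kmer_set(genome, K)
rc_db = kmer_set(revcomp(genome), K)

opposite_strands = keep_pair("AAACC", "GGGTT", fwd_db, rc_db, K)
swapped_mates = keep_pair("GGGTT", "AAACC", fwd_db, rc_db, K)
same_strand = keep_pair("AAACC", "CCCAC", fwd_db, rc_db, K)
```

In this toy case a pair whose mates both match the forward strand is rejected, which is the extra precision the strand-aware rule is meant to give over a single canonical database.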

notestaff, Jun 18 '18 19:06

> When filtering reads, if the reads are paired, it would help to be able to say either "keep both reads if one passes the filter" or "drop both reads if one fails the filter", while preserving the read pairing.

I would find this most useful, too!

hannesbecher, Aug 05 '21 11:08