nanofilt
Suggestion
Hey! Thanks for the great program.
There are two things that would, in my eyes, really round out the utility of this tool:
- Removal of polyA/T tails. Only a subset of my reads still contains an A/T tail, and headclipping to remove this bias also clips the reads that have already had this section removed.
- Nucleotide tail clipping on a subset of reads. Right now, tailclipping is applied to all reads; however, my FastQC report shows that I should only be tailclipping the longest reads in my FASTQ file.
Once again, thanks for the program!
Hi,
Thanks for the suggestions! I'll give them some thought, but have a question for each:
- Do you suggest removing only 'exact' polyA/T tails (only AAAAAAA or only TTTTTTTTTTTTTT), or also allowing some noise in those stretches (I assume the latter)?
- How do you think this should be implemented? For example, an option like --clip-when-length 10000, with which the user specifies the read length from which the clipping rules apply?
Cheers, Wouter
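The length-gated clipping floated above could look roughly like the Python sketch below. It assumes a hypothetical `--clip-when-length` threshold; the function `crop_if_long` and its parameters are illustrative names only, not part of NanoFilt's actual interface.

```python
from __future__ import annotations


def crop_if_long(seq: str, qual: str, min_length: int,
                 headcrop: int = 0, tailcrop: int = 0) -> tuple[str, str]:
    """Crop a read only if it is at least min_length bases long.

    Models a hypothetical --clip-when-length option: shorter reads are
    returned unchanged, longer ones lose `headcrop` bases from the start
    and `tailcrop` bases from the end (sequence and quality alike).
    """
    if len(seq) < min_length:
        return seq, qual
    end = len(seq) - tailcrop
    if end <= headcrop:
        # Cropping would remove the entire read.
        return "", ""
    return seq[headcrop:end], qual[headcrop:end]


print(crop_if_long("ACGTACGTAC", "IIIIIIIIII", min_length=8, tailcrop=3))
# -> ('ACGTACG', 'IIIIIII')
print(crop_if_long("ACGTAC", "IIIIII", min_length=8, tailcrop=3))
# -> ('ACGTAC', 'IIIIII')
```

Gating on length this way leaves the cropping logic itself untouched; the threshold only decides whether it runs for a given read.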
Hey,
I am working with nanopore reads, which have an error rate of around 15% per nucleotide. For this reason, the noise would have to be accounted for, likely with a sliding-window technique. I currently use Prinseq, which removes exact polyA/T tails, but this still leaves me with a +10% T bias at the beginning of my reads and a slight A bias at the end.
Right now, my pipeline does some trimming with NanoFilt and follows that up with A/T trimming in Prinseq. This has given me the least nucleotide bias so far, although some bias is still present.
As for the second suggestion, I had actually misinterpreted my FastQC report: I forgot that there are fewer reads at longer lengths, which increases the variance of my data in that region.
Thanks, Patrick
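A noise-tolerant version of polyA/T removal, using the sliding window suggested above, might be sketched in Python as follows. It assumes a polyT run at the 5' end and a polyA run at the 3' end of each read in its sequenced orientation, a 10-base window, and an 80% match threshold (picked loosely with a ~15% per-base error rate in mind). All names and defaults are hypothetical, and this is neither NanoFilt's nor Prinseq's implementation.

```python
from __future__ import annotations


def find_3prime_tail(seq: str, base: str = "A", window: int = 10,
                     min_frac: float = 0.8) -> int:
    """Return the index where a noisy run of `base` begins at the 3' end.

    Returns len(seq) when no such tail is found.
    """
    seq = seq.upper()
    n = len(seq)
    cut = n
    start = n - window
    # Slide the window one base at a time from the 3' end towards the 5' end,
    # extending the tail while at least min_frac of the window is `base`.
    while start >= 0 and seq[start:start + window].count(base) / window >= min_frac:
        cut = start
        start -= 1
    # Advance past mismatching calls at the inner edge so they stay in the read.
    while cut < n and seq[cut] != base:
        cut += 1
    return cut


def trim_poly_tails(seq: str, qual: str, window: int = 10,
                    min_frac: float = 0.8) -> tuple[str, str]:
    """Trim a noisy polyT head and polyA tail from a read and its qualities."""
    # Reversing (not complementing) the read turns the 5' polyT head
    # into a 3' run of T, so the same scan can be reused.
    head = len(seq) - find_3prime_tail(seq[::-1], "T", window, min_frac)
    tail = find_3prime_tail(seq, "A", window, min_frac)
    if tail <= head:
        # The whole read looks like tail; return an empty record.
        return "", ""
    return seq[head:tail], qual[head:tail]


read = "TTTTGTTTTTT" + "ACGTCAGGCTAGC" + "AAAAAAACAAAA"
print(trim_poly_tails(read, "I" * len(read))[0])  # -> ACGTCAGGCTAGC
```

In a pipeline like the one described, such a step could run on each FASTQ record after NanoFilt's filtering; `window` and `min_frac` would likely need tuning against the FastQC per-base content plots.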
So that leaves us with only suggestion 1? Okay, I'll think about how best to implement this.