sangeranalyseR
minreadlength
This parameter doesn't seem to work as described:
Reads shorter than this will not be included in the readset. The default 20 means that all reads with length of 20 or more will be included. Note that this is the length of a read after it has been trimmed.
If I run SangerAlignment with some really low-quality data, it retains even very short reads (e.g. length 1) and keeps them in the readset. This means that the contig tree is never built.
Can you take a look and see what's going on @HowardChao? I'll send you some data...
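For reference, the documented behaviour quoted above can be sketched in a few lines of base R. This is only an illustration of the expected filter, not the package's code: `filter_by_trimmed_length` and the read lengths are made up for the example.

```r
# Hypothetical helper, not part of sangeranalyseR: keep only reads whose
# length AFTER trimming is at least min_read_length.
filter_by_trimmed_length <- function(trimmed_lengths, min_read_length = 20) {
  trimmed_lengths[trimmed_lengths >= min_read_length]
}

# Made-up trimmed lengths for four reads.
trimmed_lengths <- c(read_A = 1, read_B = 19, read_C = 20, read_D = 350)

kept <- filter_by_trimmed_length(trimmed_lengths)
names(kept)  # read_C and read_D pass; read_A and read_B should be dropped
```

With the default of 20, a read of trimmed length 1 should never reach the readset, which is what the report above says is not happening.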
@roblanf Do we need to store the trimmed reads that are shorter than minreadlength? In the current version, reads that cannot be assembled into a contig are not stored in contigList. If we want to keep those reads, I need to add another slot to SangerContig to store the skipped reads.
@HowardChao, it seems sensible to add a slot for reads that are too short. If we do this, then the excluded reads can be included later if a user changes e.g. the trimming threshold.
This also works in the other direction - a user might have a SangerContig or SangerAlignment with all the reads in it, but then want to change the trimming threshold to be more stringent, after which some reads might need to be excluded.
So I'd say yes - we do need a slot in SangerContig for excluded reads.
This also means that a summary table of the reads in a SangerContig and/or SangerAlignment can include these excluded reads, meaning that users can see what happened to every single one of their input reads.
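A rough sketch of the idea, assuming a toy S4 class rather than the real SangerContig: `ReadSetSketch`, its slots, and `read_summary` are all hypothetical names, and the trimmed lengths are invented. The point is only that every input read stays in the object and its status is recomputed whenever trimming changes.

```r
library(methods)

# Hypothetical container, NOT the real SangerContig class: it keeps every
# input read's trimmed length, so nothing is lost when a read is too short.
setClass("ReadSetSketch",
         representation(trimmed_lengths = "numeric",
                        min_read_length = "numeric"))

# One row per input read, including the excluded ones.
read_summary <- function(x) {
  included <- unname(x@trimmed_lengths >= x@min_read_length)
  data.frame(read = names(x@trimmed_lengths),
             trimmed_length = unname(x@trimmed_lengths),
             status = ifelse(included, "included", "excluded"),
             stringsAsFactors = FALSE)
}

rs <- new("ReadSetSketch",
          trimmed_lengths = c(read_A = 1, read_B = 45, read_C = 350),
          min_read_length = 20)
before <- read_summary(rs)

# Simulate more stringent trimming (assumed new lengths): read_B now falls
# below the threshold and moves to "excluded" without being discarded.
rs@trimmed_lengths <- c(read_A = 0, read_B = 12, read_C = 310)
after <- read_summary(rs)
```

Because the excluded reads are still stored, relaxing the trimming again would bring read_B back without re-reading the input files.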
@roblanf I will include this feature in the next version, because the deadline to pass the Bioconductor technical review is this Wednesday (Thursday in Australia) and I need more time to implement it. 👍