seqtk
Added -B/-E to trimfq for keeping the first/last INT bp, and -s for the shortest allowed read
This is basically a resurrection of https://github.com/lh3/seqtk/pull/38, brought up to date with the latest release of seqtk so that the new options no longer interfere with the original command line options of seqtk.
More precisely, this adds the following to trimfq:
-s INT trimming by -b/-e/-B/-E shall not produce reads shorter than INT bp
-B INT keep first INT bp from left (non-zero to disable -q/-e/-E)
-E INT keep last INT bp from right (non-zero to disable -q/-b/-B)
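To make the intended behaviour of -B and -E concrete, here is a rough Python sketch of the semantics (not the actual C code in this PR). The function names `keep_first` and `keep_last` are made up for illustration. How -s interacts with -B/-E is modelled under one possible reading, namely that the kept length is raised to at least the -s value and never exceeds the read length; the real implementation may handle short reads differently.

```python
def _kept_length(read_len, n, min_len):
    # Assumption: -s acts as a floor on the kept length, and a read can
    # never grow, so the result is capped at the original read length.
    return min(read_len, max(n, min_len))


def keep_first(seq, qual, n, min_len=0):
    """Model of -B INT: keep the first n bases counted from the left."""
    keep = _kept_length(len(seq), n, min_len)
    return seq[:keep], qual[:keep]


def keep_last(seq, qual, n, min_len=0):
    """Model of -E INT: keep the last n bases counted from the right."""
    keep = _kept_length(len(seq), n, min_len)
    start = len(seq) - keep  # explicit start index avoids the seq[-0:] pitfall
    return seq[start:], qual[start:]


if __name__ == "__main__":
    seq, qual = "ACGTACGT", "IIIIHHHH"
    print(keep_first(seq, qual, 3))             # first 3 bases and qualities
    print(keep_last(seq, qual, 3))              # last 3 bases and qualities
    print(keep_first(seq, qual, 2, min_len=5))  # floor of 5 bp applies
```

In this model the sequence and quality strings are always cut at the same positions, so each FASTQ record stays internally consistent.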
This allows more precise control over how trimming is done. This kind of trimming is used heavily in FusionCatcher (which uses a forked seqtk instead of the original seqtk). In https://github.com/lh3/seqtk/pull/38 it was mentioned that this kind of trimming is rare, but in practice it is used a lot. Regarding the popularity of such trimming, consider for example:
- FusionCatcher v0.99.6a has been downloaded more than 1706 times since April 1, 2016, as shown here,
- FusionCatcher, which needs/uses the forked seqtk that does this kind of trimming, has been used in 32 scientific articles,
- FusionCatcher has been used by teen cancer survivor Elana Simon to study her own rare disease, as shown here; she was also presented at the White House as an example of precision medicine for this (see: here).
Any news here @lh3 @ndaniel ?