Wei Shen
Available in v2.3.0: https://github.com/shenwei356/seqkit/releases/tag/v2.3.0
> This would allow to retain the longest reads/the reads with the best quality yielding the given number of bases. I'm not sure if this is reasonable. > downsample a...
seqkit subseq -r 1:50 | seqkit stats
Oh, sorry, `seqkit stats -a` only provides Q20(%) and Q30(%) for FASTQ quality.
> seqtk subseq reads-of-size-35-GB.fq list-of-size-20-GB.txt > output.fq Since the whole ID list needs to be stored in RAM, a memory-efficient data structure like **_BloomFilter_** could be used for checking...
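To make the Bloom filter idea concrete, here is a minimal Python sketch. It is not how seqtk or seqkit work internally; the class `BloomFilter`, the helper `filter_fastq`, and the sizes (`num_bits`, `num_hashes`) are all hypothetical names and values picked for illustration. It also assumes plain 4-line FASTQ records. A Bloom filter never misses an ID that was added, but it can report a small fraction of false positives, so a few unwanted reads may slip through.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # one bit per slot

    def _positions(self, item: str):
        # Derive all bit positions from one digest via double hashing.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd step, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


def filter_fastq(lines, wanted):
    """Yield 4-line FASTQ records whose read ID is (probably) in `wanted`."""
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            read_id = record[0][1:].split()[0]  # text after '@', up to first space
            if read_id in wanted:
                yield from record
            record = []


# Toy data standing in for the ID list and the FASTQ file.
ids = BloomFilter(num_bits=1 << 16, num_hashes=5)
for read_id in ["read1", "read3"]:
    ids.add(read_id)

fastq = [
    "@read1 extra\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "TTTT\n", "+\n", "IIII\n",
    "@read3\n", "GGCC\n", "+\n", "IIII\n",
]
kept = list(filter_fastq(fastq, ids))
```

The memory cost is fixed by `num_bits` regardless of how long the IDs are, which is the point when the ID list itself is tens of GB. If exact results are needed, the hits can be re-checked against the real list in a second pass.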
Right, PCR could produce all combinations of the forward and backward primers. We should output them too.
> In my case, the positions in the bed file would be enough. Oh, you can use `seqkit locate` first, which outputs BED format.
It's simple: replace all the spaces with some other symbol before renaming:

    $ echo -en ">k141_2 flag=3 multi=4.0678 len=200\nactg\n" | sed 's/ /_/g'
    >k141_2_flag=3_multi=4.0678_len=200
    actg
    $ echo -en ">k141_2 flag=3...
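If you would rather script it, the same space replacement can be sketched in Python. The function name `underscore_headers` is made up for this example. Unlike a plain `sed 's/ /_/g'`, this version only touches header lines (those starting with `>`), so sequence lines are passed through unchanged.

```python
def underscore_headers(lines, replacement="_"):
    """Replace spaces in FASTA header lines only; other lines pass through."""
    for line in lines:
        if line.startswith(">"):
            header = line.rstrip("\n")
            yield header.replace(" ", replacement) + "\n"
        else:
            yield line


# Same toy record as in the shell example above.
fasta = [">k141_2 flag=3 multi=4.0678 len=200\n", "actg\n"]
result = list(underscore_headers(fasta))
```

Pick a replacement symbol that does not already occur in the headers if you need to restore the spaces later.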
Another workaround is calling seqtk twice; it may still be faster than a Python script. ``` $ seqtk seq -C seqs.fa > seqs.fa.tmp # >k141_2 # actg $ seqtk rename...
In view of modularization, a subcommand only does its own task, and complex tasks can be done by piping multiple commands. If you do want a one-command solution, here's one:...