Wei Shen comments

Results 235 comments of Wei Shen

Feature Request: seqkit split option to control text appended to output files

Available in v2.3.0 : https://github.com/shenwei356/seqkit/releases/tag/v2.3.0

[Feature suggestion] Downsample sequences to a certain number of total bases based on sequence length or sequence quality

> This would allow to retain the longest reads/the reads with the best quality yielding the given number of bases.

I'm not sure if this is reasonable.

> downsample a...
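To make the quoted idea concrete, here is a minimal Python sketch of length-based downsampling. It is not a seqkit feature. The function name `downsample_by_length`, the `(name, sequence)` tuple layout, and the rule of stopping once the base budget is met (so the total can overshoot by part of one read) are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

Read = Tuple[str, str]  # (name, sequence); hypothetical layout for this sketch


def downsample_by_length(reads: Iterable[Read], target_bases: int) -> List[Read]:
    """Keep the longest reads until their total length reaches target_bases."""
    kept: List[Read] = []
    total = 0
    # Longest reads first; sorted() is stable, so ties keep input order.
    for name, seq in sorted(reads, key=lambda r: len(r[1]), reverse=True):
        if total >= target_bases:
            break
        kept.append((name, seq))
        total += len(seq)
    return kept


if __name__ == "__main__":
    reads = [("r1", "ACGT"), ("r2", "ACGTACGTAC"), ("r3", "ACGTAC")]
    for name, seq in downsample_by_length(reads, 12):
        print(name, len(seq))
```

Sorting needs all reads in memory at once, which may not be practical for large files. Ranking by quality instead of length would only change the sort key.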

[Question] Is there a way to calculate position-specific fastq stats?

seqkit subseq -r 1:50 | seqkit stats

[Question] Is there a way to calculate position-specific fastq stats?

Oh, sorry, `seqkit stats -a` only provides Q20(%) and Q30(%) for FASTQ quality.
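For statistics at each read position rather than per file, a small script is one option. The sketch below is not part of seqkit. It assumes four-line FASTQ records and Phred+33 quality encoding, and the function name `per_position_mean_quality` is hypothetical.

```python
import io
from typing import List, TextIO


def per_position_mean_quality(handle: TextIO, offset: int = 33) -> List[float]:
    """Mean Phred score at each read position of a four-line-record FASTQ."""
    sums: List[int] = []
    counts: List[int] = []
    for line_no, line in enumerate(handle):
        if line_no % 4 != 3:  # only the 4th line of each record holds qualities
            continue
        for pos, char in enumerate(line.rstrip("\r\n")):
            if pos == len(sums):  # first read reaching this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\n5I\n")
    print(per_position_mean_quality(fastq))  # [30.0, 40.0, 40.0, 40.0]
```

Reads of different lengths are handled by counting how many reads cover each position, so later positions are averaged over fewer reads.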

subseq => core dumped

> seqtk subseq reads-of-size-35-GB.fq list-of-size-20-GB.txt > output.fq

Since the whole ID list needs to be stored in RAM, a memory-efficient data structure like **_BloomFilter_** could be used for checking...
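As an illustration of the Bloom filter idea, here is a minimal Python sketch. It is not how seqtk or seqkit is implemented. The class name, the bit-array size, the number of hashes, and the use of BLAKE2b with double hashing are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd, so never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    wanted = BloomFilter(num_bits=1 << 20, num_hashes=7)
    for read_id in ("read_1", "read_42"):
        wanted.add(read_id)
    print("read_42" in wanted)  # True: added items are always found
```

The trade-off is that a Bloom filter can report an ID as present when it was never added, so a few unwanted reads may pass the filter unless hits are re-checked against the exact list.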

seqkit amplicon only keeps one amplicon

Right, PCR could produce amplicons from all combinations of the forward and reverse primers. We should output them too.

seqkit amplicon only keeps one amplicon

> In my case, the positions in the bed file would be enough.

Oh, you can use `seqkit locate` first, which outputs BED format.

rename full header

It's simple, replace all the spaces with some other symbols before renaming:

    $ echo -en ">k141_2 flag=3 multi=4.0678 len=200\nactg\n" | sed 's/ /_/g'
    >k141_2_flag=3_multi=4.0678_len=200
    actg
    $ echo -en ">k141_2 flag=3...
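The same substitution can be limited to header lines, which the plain `sed 's/ /_/g'` does not do. A minimal Python sketch follows. The function name `underscore_headers` is hypothetical, and the input is assumed to be FASTA text.

```python
import io
from typing import Iterator, TextIO


def underscore_headers(handle: TextIO) -> Iterator[str]:
    """Yield FASTA lines with spaces in header lines replaced by underscores."""
    for line in handle:
        if line.startswith(">"):
            yield line.replace(" ", "_")
        else:
            yield line  # sequence lines pass through unchanged


if __name__ == "__main__":
    fasta = io.StringIO(">k141_2 flag=3 multi=4.0678 len=200\nactg\n")
    print("".join(underscore_headers(fasta)), end="")
```

On the example record this prints the same two lines as the `sed` command.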

rename full header

Another workaround is to call seqtk twice; it may still be faster than a Python script.

```
$ seqtk seq -C seqs.fa > seqs.fa.tmp
# >k141_2
# actg
$ seqtk rename...
```

rename full header

In view of modularization, a subcommand only does its own task, and complex tasks can be done by piping multiple commands. If you do want a one-command solution, here's one:...