Does rmdup remove by identical sequences or substring patterns?
Hi, I have a protein FASTA file and used rmdup to remove duplicated sequences (I expected it to remove only completely identical sequences). However, in my output file I found that it also removed a few short sequences that are substrings of another sequence. Does rmdup also remove sequences by substring match?
Thanks, Sophia
Add --by-seq. See also the full usage: https://bioinf.shenwei.me/seqkit/usage/#rmdup
Hi Wei,
Thanks for your quick response. I added the -s option to my command and still see the short sequences being removed. I am using the most recent version of seqkit.
Could you please send me the removed short sequence and the long one (or simply the whole file), either here or by email: [email protected]?
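In case it helps to find the pair to send, here is a minimal Python sketch that separates exact duplicates from sequences that are only substrings of a longer one. It is independent of seqkit and does not show how rmdup works internally. The names read_fasta and classify and the three example records are made up for illustration.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def classify(records):
    """Split records into exact duplicates and substring pairs.

    Returns (exact, substrings), where exact maps a sequence to the
    headers sharing it, and substrings lists (short_id, long_id) pairs
    in which the short sequence occurs inside a different, longer one.
    """
    by_seq = defaultdict(list)
    for header, seq in records:
        by_seq[seq].append(header)

    exact = {seq: ids for seq, ids in by_seq.items() if len(ids) > 1}

    # Quadratic comparison of distinct sequences; fine for a small file.
    unique = list(by_seq)
    substrings = [
        (by_seq[short][0], by_seq[long_][0])
        for short in unique
        for long_ in unique
        if short != long_ and short in long_
    ]
    return exact, substrings


# Hypothetical records: a and b are identical, c is a substring of a.
example = """>a
MKTAYIAK
>b
MKTAYIAK
>c
TAYI
""".splitlines()

exact, substrings = classify(read_fasta(example))
print(exact)
print(substrings)
```

To run it on a real file, pass an open file handle to read_fasta instead of the example lines. Any record that shows up only in the substring list, yet is missing from the rmdup output, would be a good candidate to attach to this thread.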
Any update?