Exclude certain FASTA sequences in pairwise assessment

Open matt-sd-watson opened this issue 4 years ago • 1 comment

For multi-FASTA files, it may be useful to exclude certain sequences by FASTA header ID when performing the pairwise SNP comparison. For example, when processing COVID-19 sequences, the reference sequence could be excluded if comparisons to it are not needed. The input argument could accept either a .txt file of line-separated IDs or a bash array.

matt-sd-watson • Sep 05 '21 22:09

Until this feature exists, this could work:

seqkit grep -v -f ids_to_ignore.txt < input.afa | snp-dists /dev/stdin > out.tsv
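
If seqkit is not available, the same filtering step can be approximated with plain awk. This is only a rough sketch, not part of snp-dists: `exclude_ids` is a hypothetical helper name, and it assumes that each line of the ID file holds exactly one sequence ID and that a record's ID is the header text between `>` and the first whitespace.

```shell
# Hypothetical helper (not part of snp-dists or seqkit):
# drop FASTA records whose ID is listed in a file, one ID per line.
# Usage: exclude_ids ids_to_ignore.txt < input.afa
exclude_ids() {
  awk -v idfile="$1" '
    BEGIN {
      # Load the IDs to exclude; strip a trailing CR in case of Windows line endings.
      while ((getline line < idfile) > 0) {
        sub(/\r$/, "", line)
        if (line != "") skip[line] = 1
      }
      close(idfile)
    }
    # Header line: the ID is assumed to be the text after ">" up to the first whitespace.
    /^>/ { id = substr($1, 2); keep = !(id in skip) }
    # Print header and sequence lines only for records that are kept.
    keep
  ' -
}
```

It slots into the same pipeline: `exclude_ids ids_to_ignore.txt < input.afa | snp-dists /dev/stdin > out.tsv`. IDs held in a bash array can be written to a file first, for example with `printf '%s\n' "${ids[@]}" > ids_to_ignore.txt`.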

tseemann • Sep 05 '21 22:09