snp-dists
Exclude certain FASTA sequences in pairwise assessment
For multi-FASTA files, it would be useful to exclude certain sequences by FASTA header ID when performing the pairwise SNP comparison. For example, when processing COVID-19 sequences, the reference sequence could be excluded if comparisons to the reference are not needed. The input argument could accept either a .txt file of line-separated IDs or a bash array.
Until this feature exists, this could work:
seqkit grep -v -f ids_to_ignore.txt < input.afa | snp-dists /dev/stdin > out.tsv
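If seqkit is not available, the same filtering can probably be done with plain awk. The sketch below is only an illustration: `filter_fasta` is a hypothetical helper name, and it assumes the sequence ID is the first whitespace-delimited word after `>` in each header and that the ID file lists one ID per line.

```shell
# Hypothetical helper (not part of snp-dists or seqkit):
# print every FASTA record whose ID is NOT listed in the ID file.
# Usage: filter_fasta IDS_FILE FASTA_FILE
filter_fasta() {
  ids_file=$1
  fasta_file=$2
  awk '
    # Lines from the first file: remember each non-empty ID to exclude.
    FILENAME == ARGV[1] { if ($1 != "") skip[$1] = 1; next }
    # Header line: strip the leading ">" and decide whether to keep the record.
    /^>/ { id = substr($1, 2); keep = !(id in skip) }
    # Print header and sequence lines of kept records only.
    keep
  ' "$ids_file" "$fasta_file"
}
```

It would slot into the same pipeline as the seqkit command, e.g. `filter_fasta ids_to_ignore.txt input.afa | snp-dists /dev/stdin > out.tsv`. Matching is exact on the ID, so anything after the first space in a header is ignored.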