vsearch
Convert Qiime2 database (2 files) into fasta database (1 file) for taxonomic assignment in vsearch
Hello!
I am working on a COI metabarcoding project for animals. I currently have two database files: a fasta file containing all sequences and a txt file containing the taxonomic information for those sequences. The files contain local barcodes created by lab mates in previous projects. They used Qiime2 for their analyses, which requires two files as the database for taxonomic assignment. vsearch, however, requires only one fasta file for taxonomic assignment. Are there any commands or programs that could convert the two files into one fasta file?
Thank you!
hello @timz0605 there is no vsearch command to merge separate sequences and taxonomic assignments into a single fasta file.
Without knowing the exact layout of your input files, it is difficult to give you a more precise answer. When faced with a similar task, I usually combine paste, sort, join and sed to produce a fasta file.
Here is an example using the commands listed above. Assuming the following layout for the taxonomic assignments (tab-separated) and the fasta file (each sequence on a single line):
s2 kingdom;genus;species2
s1 kingdom;genus;species1
>s1
ACGT
>s2
TGCA
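# sort both inputs on the sequence ID and join them on that ID, then
# restore the ">" prefix and move the sequence to its own line
# (\n in the sed replacement requires GNU sed)
join -j 1 \
    <(printf "s2\tkingdom;genus;species2\ns1\tkingdom;genus;species1\n" | sort -k1,1) \
    <(printf ">s1\nACGT\n>s2\nTGCA\n" | paste - - | tr -d ">" | sort -k1,1) | \
    sed 's/^/>/ ; s/ /\n/2'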
Sequences and taxonomic assignments are now merged:
>s1 kingdom;genus;species1
ACGT
>s2 kingdom;genus;species2
TGCA
In the code above, I use printf to generate the input data. Most likely, you have input files instead:
# same pipeline, reading the taxonomy and the fasta from files
join -j 1 \
    <(sort -k1,1 input.taxonomy) \
    <(paste - - < input.fasta | tr -d ">" | sort -k1,1) | \
    sed 's/^/>/ ; s/ /\n/2'
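If your files come straight from Qiime2, some assumptions of the one-liner above may not hold: the taxonomy file may start with a header line, the taxon strings may contain spaces, and the fasta file may wrap sequences over several lines. Here is a rough sketch of a more tolerant variant. The function name merge_tax_fasta is made up for illustration (it is not a vsearch command), and I am assuming a tab-separated taxonomy file whose first two columns are the sequence ID and the taxonomy. Adjust it to your actual layout.

```shell
# merge_tax_fasta: hypothetical helper, not part of vsearch.
# Assumes: taxonomy is tab-separated (ID, taxonomy, optional extra columns),
# and fasta headers start with ">ID".
# Usage: merge_tax_fasta input.taxonomy input.fasta > merged.fasta
merge_tax_fasta() {
    taxonomy="$1"
    fasta="$2"
    tab="$(printf '\t')"
    sorted_taxonomy="$(mktemp)"

    # sort the taxonomy on the ID column (C locale keeps sort and join consistent)
    LC_ALL=C sort -t "$tab" -k1,1 "$taxonomy" > "$sorted_taxonomy"

    # linearize the fasta (ID <tab> sequence, wrapped lines concatenated),
    # sort it, join it with the taxonomy on the ID, and rebuild fasta entries
    awk '/^>/ {if (NR > 1) printf "\n"; printf "%s\t", substr($1, 2); next}
         {printf "%s", $0}
         END {if (NR > 0) printf "\n"}' "$fasta" |
        LC_ALL=C sort -t "$tab" -k1,1 |
        LC_ALL=C join -t "$tab" -1 1 -2 1 "$sorted_taxonomy" - |
        awk -F '\t' '{printf ">%s %s\n%s\n", $1, $2, $NF}'

    rm -f "$sorted_taxonomy"
}
```

Note that join only outputs IDs present in both files, so a header line in the taxonomy file is dropped, but so is any sequence without a taxonomic assignment. Comparing the output of grep -c "^>" on the input fasta and on the merged fasta is a quick way to check that nothing was lost.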