vsearch Convert Qiime2 database (2 files) into fasta database (1 file) for taxonomic assignment in vsearch

Open timz0605 opened this issue 1 year ago • 1 comments

Hello!

I am working on a COI metabarcoding project for animals. I currently have two database files: a fasta file containing all sequences, and a txt file containing the taxonomic information for those sequences. These files contain local barcodes created by lab mates in previous projects. They used Qiime2 for their analyses, which requires two files as the database for taxonomic assignment. vsearch, however, requires only one fasta file for taxonomic assignment. Are there any commands or programs that could help convert the two files into one fasta file?

Thank you!

Feb 19 '24 06:02 timz0605

Hello @timz0605, there is no vsearch command to merge separate sequences and taxonomic assignments into a single fasta file.

Without knowing the exact layout of your input files, it is difficult to give you a more precise answer. When faced with a similar task, I usually combine paste, sort, join and sed to produce a fasta file.

Feb 19 '24 15:02 frederic-mahe

Here is an example using the commands listed above. Assuming the following layout for the taxonomic assignments and the fasta file:

s2	kingdom;genus;species2
s1	kingdom;genus;species1

>s1
ACGT
>s2
TGCA

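# requires bash (process substitution) and GNU sed (\n in the replacement)
# sort both inputs on the identifier, join them, then rebuild fasta entries
join -j 1 \
    <(printf "s2\tkingdom;genus;species2\ns1\tkingdom;genus;species1\n" | sort -k1,1) \
    <(printf ">s1\nACGT\n>s2\nTGCA\n" | paste - - | tr -d ">" | sort -k1,1) | \
    sed 's/^/>/ ; s/ /\n/2'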

Sequences and taxonomic assignments are now merged:

>s1 kingdom;genus;species1
ACGT
>s2 kingdom;genus;species2
TGCA

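In the code above, I use printf to generate the input data. Most likely, you have input files instead. Note that paste - - pairs lines two by two, so each fasta entry must span exactly two lines (one header line, one sequence line), and that join only outputs identifiers present in both files: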

# same pipeline, reading from files (bash and GNU sed required)
join -j 1 \
    <(sort -k1,1 input.taxonomy) \
    <(paste - - < input.fasta | tr -d ">" | sort -k1,1) | \
    sed 's/^/>/ ; s/ /\n/2'
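
If the fasta file is wrapped (sequences split over several lines), the paste - - step above will not pair headers and sequences correctly. A possible alternative is sketched below, assuming a tab-separated taxonomy table (identifier, then lineage) and fasta identifiers that end at the first blank. The function name merge_taxonomy_fasta is hypothetical, it is not a vsearch command. Like join, it drops sequences that have no taxonomic assignment, but it keeps the order of the fasta file instead of sorting.

```shell
# Hypothetical helper, not part of vsearch: merge a taxonomy table and
# a (possibly wrapped) fasta file into a single fasta file.
# Assumptions: $1 is tab-separated (identifier<TAB>lineage), $2 is fasta,
# and the taxonomy table is not empty.
merge_taxonomy_fasta() {
    awk -F '\t' '
        # first file: remember the lineage of each identifier
        NR == FNR { taxonomy[$1] = $2 ; next }
        # second file, header line: flush the previous sequence,
        # then print the new header followed by its lineage
        /^>/ {
            if (started) { print sequence }
            id = substr($0, 2)
            sub(/[ \t].*$/, "", id)
            keep = (id in taxonomy)
            if (keep) { print ">" id " " taxonomy[id] }
            started = keep
            sequence = ""
            next
        }
        # second file, sequence line: concatenate wrapped lines
        keep { sequence = sequence $0 }
        # flush the last sequence
        END { if (started) { print sequence } }
    ' "$1" "$2"
}
```

For example, merge_taxonomy_fasta input.taxonomy input.fasta > merged.fasta would write the merged entries to a new file (merged.fasta is only a placeholder name).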

Apr 18 '24 13:04 frederic-mahe
