vsearch

Convert Qiime2 database (2 files) into fasta database (1 file) for taxonomic assignment in vsearch

Open • timz0605 opened this issue • 1 comment

Hello!

I am working on a COI metabarcoding project on animals. I currently have two database files: a fasta file containing all the sequences, and a txt file containing the taxonomic information for each sequence. These files contain local barcodes created by lab mates in previous projects. They used Qiime2 for their analyses, and Qiime2 requires these two files as the reference database for taxonomic assignment. vsearch, however, only needs a single fasta file for taxonomic assignment. I was wondering if there are any commands or programs that could convert the two files into one fasta file?

Thank you!

timz0605, Feb 19 '24 06:02

Hello @timz0605, there is no vsearch command that merges separate sequence and taxonomy files into a single fasta file.

Without knowing the exact layout of your input files, it is difficult to give you a more precise answer. When faced with a similar task, I usually combine paste, sort, join and sed to produce a fasta file.

frederic-mahe, Feb 19 '24 15:02

Here is an example using the command-line tools listed above. Assume the following layout for the taxonomic assignments (tab-separated, one sequence ID per line) and for the fasta file:

```
s2	kingdom;genus;species2
s1	kingdom;genus;species1
```

```
>s1
ACGT
>s2
TGCA
```
```bash
join -j 1 \
    <(printf "s2\tkingdom;genus;species2\ns1\tkingdom;genus;species1\n" | sort -k1,1) \
    <(printf ">s1\nACGT\n>s2\nTGCA\n" | paste - - | tr -d ">" | sort -k1,1) | \
    sed 's/^/>/ ; s/ /\n/2'
```

Sequences and taxonomic assignments are now merged:

```
>s1 kingdom;genus;species1
ACGT
>s2 kingdom;genus;species2
TGCA
```
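`join` only outputs IDs found in both inputs, so a sequence without a taxonomy line (or a taxonomy line without a sequence) silently disappears from the merged file. A minimal sketch for listing such sequences with `join -v 2`, which prints only the unpairable lines of the second input. It reuses the printf example data, plus a made-up sequence `s3` that has no taxonomy entry; `missing_taxonomy` is a hypothetical helper name:

```bash
# List fasta IDs that have no matching line in the taxonomy table.
# join -v 2 prints only the unpairable lines of its second input;
# awk then keeps the first field (the sequence ID).
# s3 is a made-up extra sequence with no taxonomy entry.
missing_taxonomy() {
    join -v 2 \
        <(printf "s2\tkingdom;genus;species2\ns1\tkingdom;genus;species1\n" | sort -k1,1) \
        <(printf ">s1\nACGT\n>s2\nTGCA\n>s3\nGGCC\n" | paste - - | tr -d ">" | sort -k1,1) | \
        awk '{ print $1 }'
}

missing_taxonomy   # prints s3
```

Swapping `-v 2` for `-v 1` would instead list taxonomy IDs that have no sequence.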

In the code above, I use printf to generate the input data. Most likely you have input files, in which case the command becomes:

```bash
# paste - - pairs lines two by two, so this assumes one sequence line per record
join -j 1 \
    <(sort -k1,1 input.taxonomy) \
    <(paste - - < input.fasta | tr -d ">" | sort -k1,1) | \
    sed 's/^/>/ ; s/ /\n/2'
```
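The `paste - -` step assumes each sequence sits on a single line, and the `sed 's/ /\n/2'` step assumes the taxonomy string contains no spaces (a species name such as "Homo sapiens" would be split across the header and the sequence). If either assumption fails, one alternative, swapped in here rather than taken from the reply above, is a single awk pass. A minimal sketch, assuming the taxonomy file is tab-separated with the sequence ID in the first column and the taxonomy in the second; `merge_taxonomy` is a hypothetical helper name:

```bash
# Hypothetical helper: merge a tab-separated taxonomy table (ID<TAB>taxonomy)
# with a fasta file whose sequences may be wrapped over several lines.
# Output headers follow the same ">ID taxonomy" layout as the join pipeline.
merge_taxonomy() {
    awk -F '\t' '
        # First file: store the taxonomy string for each ID.
        NR == FNR { tax[$1] = $2; next }
        # Second file, header line: keep the ID (text before any space or tab).
        /^>/ {
            id = substr($0, 2)
            sub(/[ \t].*/, "", id)
            keep = (id in tax)    # drop sequences without a taxonomy entry
            if (keep) print ">" id " " tax[id]
            next
        }
        # Sequence lines of kept records are copied unchanged.
        keep { print }
    ' "$1" "$2"
}
```

Usage: `merge_taxonomy input.taxonomy input.fasta > merged.fasta`. Unlike the join pipeline, the output keeps the order of the fasta file and its original line wrapping, and taxonomy strings containing spaces stay intact.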

frederic-mahe, Apr 18 '24 13:04