Struo2
kraken2-build error when creating sequence ID to taxonomy ID map
Hi,
I'm near the end of the Struo2 pipeline, trying to create a custom kraken2 database using GTDB r207.
However, I've hit a wall at the kraken2-build command, specifically at one spot in the build_kraken2_db.sh script that the command calls. It seems that this section:
echo "Creating sequence ID to taxonomy ID map (step 1)..."
if [ -d "library/added" ]; then
  find library/added/ -name 'prelim_map_*.txt' | xargs cat > library/added/prelim_map.txt
fi
seqid2taxid_map_file=seqid2taxid.map
if [ -e "$seqid2taxid_map_file" ]; then
  echo "Sequence ID to taxonomy ID map already present, skipping map creation."
else
  step_time=$(get_current_time)
  find library/ -maxdepth 2 -name prelim_map.txt | xargs cat > taxonomy/prelim_map.txt
  if [ ! -s "taxonomy/prelim_map.txt" ]; then
    echo "No preliminary seqid/taxid mapping files found, aborting."
    exit 1
  fi
  grep "^TAXID" taxonomy/prelim_map.txt | cut -f 2- > $seqid2taxid_map_file.tmp || true
  if grep "^ACCNUM" taxonomy/prelim_map.txt | cut -f 2- > accmap_file.tmp; then
    if compgen -G "taxonomy/*.accession2taxid" > /dev/null; then
      lookup_accession_numbers accmap_file.tmp taxonomy/*.accession2taxid > seqid2taxid_acc.tmp
      cat seqid2taxid_acc.tmp >> $seqid2taxid_map_file.tmp
      rm seqid2taxid_acc.tmp
    else
      echo "Accession to taxid map files are required to build this DB."
      echo "Run 'kraken2-build --db $KRAKEN2_DB_NAME --download-taxonomy' again?"
      exit 1
    fi
  fi
  rm -f accmap_file.tmp
  finalize_file $seqid2taxid_map_file
  echo "Sequence ID to taxonomy ID map complete. [$(report_time_elapsed $step_time)]"
fi
produces the error messages:
Accession to taxid map files are required to build this DB.
Run 'kraken2-build --db $KRAKEN2_DB_NAME --download-taxonomy' again?
When I run this section line by line myself, everything works until this command:
lookup_accession_numbers accmap_file.tmp taxonomy/*.accession2taxid > seqid2taxid_acc.tmp
at which point I get the error:
Found 0/1363031 targets...lookup_accession_numbers: unable to open taxonomy/*.accession2taxid: No such file or directory
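As far as I can tell, the literal taxonomy/*.accession2taxid in that message just means the glob matched nothing, since the shell passes an unmatched pattern through unchanged by default. Here is a minimal sketch of the same kind of check the script makes with compgen -G, written so it does not depend on bash. The function name has_accession_maps is my own and is not part of kraken2 or Struo2:

```shell
#!/bin/sh
# Hypothetical helper (not part of kraken2): returns success when at least
# one *.accession2taxid file exists in the directory given as $1.
has_accession_maps() {
    for f in "$1"/*.accession2taxid; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Run from the database directory, next to library/ and taxonomy/.
if has_accession_maps taxonomy; then
    echo "accession2taxid files found in taxonomy/"
else
    echo "no accession2taxid files in taxonomy/"
fi
```

In my database directory this takes the second branch, which matches the else branch of the script above.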
My ./taxonomy/ directory only contains the following:
-rw-r--r--+ 1 names.dmp
-rw-r--r--+ 1 nodes.dmp
drwxr-sr-x+ 2 .
-rw-r--r--+ 1 prelim_map.txt
drwxr-sr-x+ 5 ..
Should there be accession2taxid files in here? If so, when should they have been generated?
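Reading the script, lookup_accession_numbers only receives the lines of prelim_map.txt that start with ACCNUM, while lines that start with TAXID are copied straight into the map. So whether accession2taxid files are needed seems to depend on which prefixes my prelim_map.txt contains. A rough sketch for counting them, assuming the file is tab-separated with the prefix in the first column (which is what the grep and cut calls imply); count_map_prefixes is a name I made up:

```shell
#!/bin/sh
# Hypothetical helper (not part of kraken2): prints each first-column value
# of a tab-separated file with its line count, as PREFIX<TAB>COUNT, sorted.
count_map_prefixes() {
    awk -F '\t' '
        { counts[$1]++ }
        END { for (p in counts) printf "%s\t%d\n", p, counts[p] }
    ' "$1" | sort
}

# Run from the database directory; skipped if the file is missing or empty.
if [ -s taxonomy/prelim_map.txt ]; then
    count_map_prefixes taxonomy/prelim_map.txt
fi
```

If every line is counted under ACCNUM, then none of the sequences carried a taxid in the form the script looks for, and that may be where things went wrong upstream.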
Happy to post on the kraken2 GitHub if that is more appropriate, but I figured this may be something that should have been generated elsewhere in the Struo2 pipeline.
Any help much appreciated, thanks!