
kaiju-makedb for mar database

Open ThijsSt opened this issue 2 years ago • 3 comments

Hi, I've been trying to set up the mar database for a metagenomics project, but I've been running into two odd issues:

  1. Sometimes, when installing the database (I've found that this happens with all the databases), I get the following error:

         \033[0;32mDownloading taxdump.tar.gz\033[0m
         2022-05-16 12:16:55 URL: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz [1800] -> ".listing" [1]
         2022-05-16 12:17:03 URL: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz [58436660] -> "taxdump.tar.gz" [1]
         \033[0;32mExtracting taxdump.tar.gz\033[0m

         gzip: stdin: invalid compressed data--format violated
         tar: Unexpected EOF in archive
         tar: Error is not recoverable: exiting now

This does not always happen; it seems to be random, and I'm not sure if anything can be done about it.

  2. When the download works, something odd happens and I get the following error message:

         \033[0;32mExtracting taxdump.tar.gz\033[0m
         \033[0;32mDownloading MarRef metadata from MMP (databasesapi.sfb.uit.no)\033[0m
         \033[0;32mCurrent MarRef version is: 1.7\033[0m
           % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                          Dload  Upload   Total   Spent    Left  Speed
           0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
         100 75801    0 75801    0     0  89177      0 --:--:-- --:--:-- --:--:--  277k
         \033[0;32mDownloading MarRef reference genomes from the Marine Metagenomics Portal using 5 threads\033[0m
         mv: cannot stat ‘mar/source/public.sfb.uit.no/MarRef/genomes/*’: No such file or directory

I've looked at the kaiju-makedb script, and I think the jq step silently fails, but can you maybe help me figure out how to bypass this error?

Thanks

Thijs

ThijsSt avatar May 16 '22 17:05 ThijsSt

It looks like your downloaded files are corrupted or were not downloaded properly, so they are not found, which is why you get the message `cannot stat ‘mar/source/public.sfb.uit.no/MarRef/genomes/*’`.
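One way to check for a corrupted download at the taxdump step is to test the archive with `gzip -t` before extracting it, and to retry if the test fails. A minimal sketch, assuming `wget` and `gzip` are available; `is_valid_gzip` and `fetch_verified` are hypothetical helper names and are not part of kaiju-makedb:

```shell
# Hypothetical helpers (not part of kaiju-makedb): accept a downloaded
# archive only if gzip can read it through to the end.

# Succeeds if the file is non-empty and passes gzip's integrity test.
is_valid_gzip() {
    [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}

# Download URL $1 to file $2, trying up to $3 times (default 3).
# A truncated or corrupted archive counts as a failed attempt.
fetch_verified() {
    local url="$1"
    local out="$2"
    local tries="${3:-3}"
    local i=1
    while [ "$i" -le "$tries" ]; do
        rm -f "$out"
        if wget -q -O "$out" "$url" && is_valid_gzip "$out"; then
            return 0
        fi
        echo "Attempt $i/$tries failed for $url" >&2
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example call (commented out so that sourcing this file does not touch the network):
# fetch_verified ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz taxdump.tar.gz \
#     && tar xzf taxdump.tar.gz
```

If the archive still fails the check after several attempts, the problem is more likely on the network or server side than in the extraction step.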

pmenzel avatar May 24 '22 19:05 pmenzel

Yes, I've been going over the code in the kaiju-makedb script with my admittedly limited experience in bioIT, and it seems that the download from the MarRef database somehow does not work:

    if [ "$DB" = "mar" -o "$DB" = "mar_ref" -o "$DB" = "mar_db" ]
    then
        mkdir -p $DB/source
        if [ $index_only -eq 0 ]
        then
            if [ $DL -eq 1 ]
            then
                if [ "$DB" = "mar" -o "$DB" = "mar_ref" ]
                then
                    echo "${GREEN}Downloading MarRef metadata from MMP (databasesapi.sfb.uit.no)${NC}"
                    MARREF_VERSION=$(curl -Ls -o /dev/null -w %{url_effective} https://databasesapi.sfb.uit.no/rest/v1/MarRef/records | grep -Po 'ver=\K\d+\.\d+')
                    echo "${GREEN}Current MarRef version is: ${MARREF_VERSION}${NC}"
                    curl "https://databasesapi.sfb.uit.no/rpc/v1/MarRef/graphs?x%5Basmbl%3Asequences%5D=each&y_yName%5Btax%3Aorganism%5D=setR" -o $DB/MarRef.json -L
                    [ -r $DB/MarRef.json ] || { echo -e "${RED}Missing file MarRef.json${NC}"; exit 1; }
                    MARREF_COUNT=$(jq .graph[].x $DB/MarRef.json | wc -l)

All of this works fine, but something weird happens when I get to:

    jq .graph[].x $DB/MarRef.json | tr -d '"' | xargs -I{} -P $parallelDL wget -P $DB/source -q -np --recursive https://public.sfb.uit.no/MarRef/genomes/{}/protein.faa || true

The first part, `jq .graph[].x $DB/MarRef.json | tr -d '"'`, works fine, so I started to suspect the rest:

    xargs -I{} -P $parallelDL wget -P $DB/source -q -np --recursive https://public.sfb.uit.no/MarRef/genomes/{}/protein.faa || true

So I replaced `true` with `echo ERROR`:

    xargs -I{} -P $parallelDL wget -P $DB/source -q -np --recursive https://public.sfb.uit.no/MarRef/genomes/{}/protein.faa || echo ERROR

Which, when running the whole command does indeed only give you an ERROR, meaning that the command somehow fails. I'll try to figure out where it goes wrong, but any thoughts are much appreciated as this is all very new to me
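Note that GNU `xargs` exits with status 123 if any single invocation fails, so `|| echo ERROR` fires even when only one genome cannot be fetched, and `-q` hides which one. To see which IDs fail, the download can be run one ID at a time with the exit status reported. A minimal sketch; `report_failures` and `fetch_genome` are hypothetical names and are not part of kaiju-makedb, and the loop is sequential rather than parallel:

```shell
# Hypothetical debugging helper (not part of kaiju-makedb).
# Reads one ID per line from stdin and runs the given command with the ID
# appended as its last argument. Failing IDs are reported on stderr.
# Returns 0 only if every invocation succeeded.
report_failures() {
    local id status failed=0
    while IFS= read -r id; do
        [ -n "$id" ] || continue
        # Redirect stdin so the command cannot swallow the remaining IDs.
        "$@" "$id" </dev/null
        status=$?
        if [ "$status" -ne 0 ]; then
            echo "FAILED (exit $status): $id" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}

# Same wget call as in the script, but with -nv instead of -q so that
# wget prints its own error messages. DB defaults to "mar" here (assumption).
fetch_genome() {
    wget -P "${DB:-mar}/source" -nv -np --recursive \
        "https://public.sfb.uit.no/MarRef/genomes/$1/protein.faa"
}

# Example call (commented out so that sourcing this file does not touch the network):
# jq .graph[].x "$DB/MarRef.json" | tr -d '"' | report_failures fetch_genome
```

The exit status that wget returns for a failing ID (for example 4 for a network failure or 8 for a server error response) narrows down whether the problem is connectivity or the remote files themselves.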

ThijsSt avatar May 26 '22 19:05 ThijsSt

I'm having this exact issue. Did it get solved in the end?

spencerlong1 avatar Aug 31 '23 11:08 spencerlong1