All grabseqs SRA downloads failing

Open cdiener opened this issue 2 years ago • 10 comments

Looks like some changes on the NCBI side led to failures in SRA downloads:

grabseqs sra SRR11733975
Traceback (most recent call last):
  File "/users/cdiener/miniconda3/envs/sra/bin/grabseqs", line 11, in <module>
    sys.exit(main())
  File "/users/cdiener/miniconda3/envs/sra/lib/python3.7/site-packages/grabseqslib/__init__.py", line 58, in main
    metadata_agg = process_sra(args, zip_func)
  File "/users/cdiener/miniconda3/envs/sra/lib/python3.7/site-packages/grabseqslib/sra.py", line 31, in process_sra
    metadata_agg)
  File "/users/cdiener/miniconda3/envs/sra/lib/python3.7/site-packages/grabseqslib/sra.py", line 97, in get_sra_acc_metadata
    run_col = lines[0].index("Run")
ValueError: 'Run' is not in list

This seems to be caused by a hardcoded address to download the SRA manifest that is not reachable anymore.
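
To illustrate why the failure surfaces as a ValueError, here is a minimal sketch of a more defensive header lookup. It assumes (based only on the traceback) that the first line of the response is split into a list and searched for "Run"; the function name `find_run_column` is hypothetical and not part of grabseqs.

```python
import csv
import io


def find_run_column(runinfo_text):
    """Return the index of the "Run" column in a runinfo CSV header.

    Hypothetical helper, not grabseqs code. It raises a descriptive error
    when the response is not a runinfo table, for example when the server
    answered with an HTML page after a redirect.
    """
    rows = list(csv.reader(io.StringIO(runinfo_text)))
    if not rows or "Run" not in rows[0]:
        # Show the start of the response so the real cause is visible.
        preview = runinfo_text.strip()[:60]
        raise RuntimeError(
            "SRA metadata response has no 'Run' column; got: %r" % preview
        )
    return rows[0].index("Run")
```

With a check like this, an unreachable or redirected endpoint would be reported as an unexpected response rather than as a missing list element.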

cdiener avatar Jun 28 '22 18:06 cdiener

Having exactly the same issue (tried a few min ago)

AntonioBaeza avatar Jun 29 '22 00:06 AntonioBaeza

same issue

Zeroo11 avatar Jun 30 '22 01:06 Zeroo11

Thanks for reporting the issue! Looks like @cdiener is right on: http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term= redirects to https://www.ncbi.nlm.nih.gov/sviewer/?db=sra&1%3Fdb=sra&rettype=runinfo&save=efetch&term= and no longer returns metadata. I'll try to figure out the proper endpoint for their API to hit for the SRA metadata (and see if I can get the tests passing in the meantime).

This is probably due to NCBI retiring Trace.

Looking through the NCBI E-utils API documentation, I should be able to get the same metadata by:

  1. Finding the identifiers associated with a search term via esearch, e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=PRJNA836386&retmax=999
  2. Passing that id list to efetch, e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=sra&id=22439955&rettype=fasta&retmode=text

I'll just have to move it from XML to tab-separated since it looks like the e-utils love XML. This approach also has the advantage of using a defined API, rather than that trace URL (which worked great but I think I found it originally on StackOverflow or something).
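
The two steps above could be sketched roughly as follows. This only builds the request URLs from the examples in this comment; the function names are hypothetical, and nothing here has been checked against the grabseqs code base. Passing `rettype="runinfo"` to efetch (carried over from the old trace URL) is an assumption that would need testing against the live API.

```python
from urllib.parse import urlencode

# Base URL taken from the example links in this thread.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=999):
    """Step 1: URL that looks up the SRA identifiers matching a term."""
    query = urlencode({"db": "sra", "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def efetch_url(ids, rettype, retmode="text"):
    """Step 2: URL that fetches records for the identifiers from step 1."""
    query = urlencode(
        {"db": "sra", "id": ",".join(ids), "rettype": rettype, "retmode": retmode},
        safe=",",  # keep the comma-separated id list readable
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"
```

For instance, `esearch_url("PRJNA836386")` and `efetch_url(["22439955"], rettype="fasta")` reproduce the two example links from the list above.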

louiejtaylor avatar Jun 30 '22 20:06 louiejtaylor

You can also request JSON from esearch, which should be easier to convert with Python. For your example: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=PRJNA836386&retmax=999&retmode=json
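
A minimal sketch of what the conversion might look like, assuming the JSON response nests the identifiers under `esearchresult` and `idlist` (the layout esearch uses with `retmode=json`). The function name is hypothetical.

```python
import json


def ids_from_esearch_json(payload):
    """Extract the list of identifiers from an esearch JSON response body.

    Returns an empty list when the expected keys are missing, so the
    caller can decide how to report an empty or malformed result.
    """
    result = json.loads(payload)
    return result.get("esearchresult", {}).get("idlist", [])
```

The returned list of ID strings could then be joined with commas and passed to efetch as the `id` parameter.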

cdiener avatar Jul 01 '22 17:07 cdiener

Hello :) Is there any workaround until this is fixed?

GitUser42 avatar Jul 09 '22 17:07 GitUser42

Try replacing line 94 of /usr/local/lib/python3.6/site-packages/grabseqslib/sra.py (the path depends on your installation) with: metadata = requests.get("https://trace.ncbi.nlm.nih.gov/Traces/sra-db-be/sra-db-be.cgi?rettype=runinfo&term="+pacc)
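
For anyone who wants to try the endpoint outside of grabseqs first, here is a small standalone sketch. It uses `urllib.request` from the standard library as a stand-in for `requests.get`; the function names are hypothetical, and whether the endpoint keeps working is not guaranteed.

```python
from urllib.request import urlopen

# Endpoint from the workaround above; the accession is appended to it.
RUNINFO_ENDPOINT = (
    "https://trace.ncbi.nlm.nih.gov/Traces/sra-db-be/sra-db-be.cgi"
    "?rettype=runinfo&term="
)


def runinfo_url(pacc):
    """Build the runinfo URL for one accession, e.g. an SRR or PRJNA id."""
    return RUNINFO_ENDPOINT + pacc


def fetch_runinfo(pacc, timeout=30):
    """Download the runinfo text for an accession (performs a network call)."""
    with urlopen(runinfo_url(pacc), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

If the first line of the text returned by `fetch_runinfo("SRR11733975")` contains a "Run" column, the patched line in sra.py should be able to parse it.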

zhengjxj avatar Jul 10 '22 15:07 zhengjxj

Thanks @zhengjxj (https://github.com/zhengjxj). I replaced the line in the file you indicated and it is working again!

AntonioBaeza avatar Jul 17 '22 00:07 AntonioBaeza

Thank you. It seems that the NCBI API changed.

chansigit avatar Jul 18 '22 13:07 chansigit

Thanks, @zhengjxj!

xiachenrui avatar Jun 11 '23 07:06 xiachenrui

Hi, is grabseqs sra facing the same problem again? What would be the solution this time?

AMMHasan avatar May 13 '24 17:05 AMMHasan