grabseqs
All grabseqs SRA downloads failing
Looks like some changes on the NCBI side led to failures in SRA downloads:
grabseqs sra SRR11733975
Traceback (most recent call last):
  File "/users/cdiener/miniconda3/envs/sra/bin/grabseqs", line 11, in <module>
    sys.exit(main())
  File "/users/cdiener/miniconda3/envs/sra/lib/python3.7/site-packages/grabseqslib/__init__.py", line 58, in main
    metadata_agg = process_sra(args, zip_func)
  File "/users/cdiener/miniconda3/envs/sra/lib/python3.7/site-packages/grabseqslib/sra.py", line 31, in process_sra
    metadata_agg)
  File "/users/cdiener/miniconda3/envs/sra/lib/python3.7/site-packages/grabseqslib/sra.py", line 97, in get_sra_acc_metadata
    run_col = lines[0].index("Run")
ValueError: 'Run' is not in list
This seems to be caused by a hardcoded address to download the SRA manifest that is not reachable anymore.
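To illustrate why the traceback ends in `ValueError: 'Run' is not in list`: when the manifest URL returns something other than a run table (an HTML page after a redirect, for example), the first row has no `Run` column. The sketch below is not grabseqs code. `find_run_column` is a hypothetical helper, and it assumes the response is comma-separated text split into rows the way the traceback suggests.

```python
def find_run_column(response_text):
    """Return the index of the 'Run' column in the header row.

    Hypothetical helper, not part of grabseqslib. Assumes the manifest is
    comma-separated text with a header row, as the traceback implies.
    """
    # Split the response into rows of fields, skipping blank lines.
    lines = [line.split(",") for line in response_text.splitlines() if line.strip()]
    if not lines or "Run" not in lines[0]:
        # Fail with a message that points at the likely cause instead of
        # the bare ValueError raised by list.index().
        raise RuntimeError(
            "Response does not look like an SRA run table; "
            "the metadata endpoint may have moved."
        )
    return lines[0].index("Run")
```

With a check like this, a moved endpoint shows up as a readable error rather than a failure deep inside the parsing code.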
Having exactly the same issue (tried a few min ago)
same issue
Thanks for reporting the issue! Looks like @cdiener is right on, http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=
redirects to https://www.ncbi.nlm.nih.gov/sviewer/?db=sra&1%3Fdb=sra&rettype=runinfo&save=efetch&term=
and no longer returns metadata. I'll try to figure out the proper endpoint for their API to hit for the SRA metadata. (and see if I can get the tests passing in the meantime).
This is probably due to NCBI retiring Trace.
Looking through the NCBI E-utils API documentation, I should be able to get the same metadata by:
- Finding the associated identifiers with esearch, e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=PRJNA836386&retmax=999
- Passing that id list to efetch, e.g. https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=sra&id=22439955&rettype=fasta&retmode=text
I'll just have to convert the output from XML to tab-separated, since it looks like the E-utils love XML. This approach also has the advantage of using a defined API, rather than that trace URL (which worked great, but I think I found it originally on StackOverflow or something).
You can also request JSON from esearch which should be easier to convert with Python, for instance for your example: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=sra&term=PRJNA836386&retmax=999&retmode=json .
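A rough sketch of the two-step esearch/efetch flow discussed above, using only the Python standard library. The function names are made up for illustration, and the `rettype` to pass to efetch for run metadata is left as a parameter because the thread does not settle which value returns it. The `esearchresult` / `idlist` keys are the ones E-utils returns for `retmode=json`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=999):
    """Build the esearch URL for an SRA term, asking for JSON output."""
    query = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS + "/esearch.fcgi?" + urlencode(query)


def parse_id_list(esearch_json):
    """Extract the list of record ids from an esearch JSON response."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]


def efetch_url(ids, rettype, retmode="text"):
    """Build the efetch URL for a list of ids (rettype is the caller's choice)."""
    query = {"db": "sra", "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS + "/efetch.fcgi?" + urlencode(query)


def fetch(url):
    """Download a URL and return the body as text (performs a network call)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

Usage would be along the lines of `ids = parse_id_list(fetch(esearch_url("PRJNA836386")))` followed by `fetch(efetch_url(ids, rettype=...))`. Neither call has been tested against the live service here.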
Hello :) Is there any workaround until this will be fixed?
Try replacing /usr/local/lib/python3.6/site-packages/grabseqslib/sra.py line 94 with metadata = requests.get("https://trace.ncbi.nlm.nih.gov/Traces/sra-db-be/sra-db-be.cgi?rettype=runinfo&term="+pacc)
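For anyone who wants to check the replacement endpoint outside of grabseqs, here is a minimal sketch. It assumes the endpoint still answers and returns comma-separated run info with a `Run` column; neither is guaranteed, given how the previous URL disappeared. `runinfo_url`, `run_accessions`, and `fetch_runinfo` are hypothetical names, not grabseqs functions.

```python
import csv
import io

# Endpoint taken from the workaround above; it may change again.
SRA_RUNINFO_URL = (
    "https://trace.ncbi.nlm.nih.gov/Traces/sra-db-be/sra-db-be.cgi"
    "?rettype=runinfo&term="
)


def runinfo_url(pacc):
    """Build the run info URL for a project or run accession."""
    return SRA_RUNINFO_URL + pacc


def run_accessions(runinfo_text):
    """Return the values of the 'Run' column from run info CSV text.

    Rows without a 'Run' value (or a response without that column)
    contribute nothing, so an unexpected response yields an empty list.
    """
    reader = csv.DictReader(io.StringIO(runinfo_text))
    return [row["Run"] for row in reader if row.get("Run")]


def fetch_runinfo(pacc):
    """Download run info text for an accession (performs a network call)."""
    import requests  # imported lazily; the workaround above also uses requests

    return requests.get(runinfo_url(pacc)).text
```

Calling `run_accessions(fetch_runinfo("SRR11733975"))` should list the run accessions if the endpoint behaves as assumed.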
Thanks [zhengjxj](https://github.com/zhengjxj). I replaced the line in the file you indicated and it is working again!
Thank you. It seems that the NCBI API changed.
Thanks ! @zhengjxj
Hi, is grabseqs sra facing the same problem? what would be the solution this time?