ncbi-acc-download

Issue with recursive download

Open • Anto007 opened this issue 5 years ago • 15 comments

I tried ncbi-acc-download --recursive GHGH00000000.1 and got the error message below. Any help would be very much appreciated.

Traceback (most recent call last):
  File "/home/user/tools/Python-3.6.2/virtualenv3/bin/ncbi-acc-download", line 10, in <module>
    sys.exit(main())
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/ncbi_acc_download/__main__.py", line 54, in main
    download_to_file(dl_id, config, filename, append)
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/ncbi_acc_download/core.py", line 118, in download_to_file
    _validate_and_write(r, fh, dl_id, config)
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/ncbi_acc_download/core.py", line 162, in _validate_and_write
    downloaded = download_wgs_parts(handle, config)
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/ncbi_acc_download/wgs.py", line 107, in download_wgs_parts
    records = list(SeqIO.parse(handle, config.format))
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 655, in parse
    for r in i:
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/GenBank/Scanner.py", line 489, in parse_records
    record = self.parse(handle, do_features)
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/GenBank/Scanner.py", line 473, in parse
    if self.feed(handle, consumer, do_features):
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/GenBank/Scanner.py", line 445, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/home/user/tools/Python-3.6.2/virtualenv3/lib/python3.6/site-packages/Bio/GenBank/Scanner.py", line 171, in parse_features
    raise ValueError("Premature end of features table, marker '//' found")
ValueError: Premature end of features table, marker '//' found

Anto007 • Sep 15 '19

This is a record type I've never seen before, and it looks like Biopython doesn't like it. I'll open a bug report with Biopython to get it fixed, and then I can make sure ncbi-acc-download supports it.

kblin • Sep 16 '19

Thank you so much for the quick response. Any help with this would really make my day.

Anto007 • Sep 16 '19

I've opened a Biopython bug https://github.com/biopython/biopython/issues/2268, let's see what they think about this.

kblin • Sep 16 '19

Thanks, fingers crossed! Also, how do I get a FASTA file for a record in recursive mode? For example, I tried the command below and got an empty FASTA file:

ncbi-acc-download --recursive NZ_AQZU00000000.1 --format fasta

Running ncbi-acc-download --recursive NZ_AQZU00000000.1 without --format fasta does give me the correct .gbk file.

Anto007 • Sep 16 '19

Hm, I don't think I've ever tried this with FASTA files, so I doubt it works out of the box.

kblin • Sep 16 '19

Thanks again for your super-quick responses. I think I can live with .gbk files for now :-)

Anto007 • Sep 16 '19
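Since the recursive download already produces a correct .gbk file, one possible workaround for the empty FASTA output is to convert that file afterwards. Biopython's Bio.SeqIO.convert(in_file, "genbank", out_file, "fasta") is the robust way to do that. The stdlib-only sketch below is a simplified, hypothetical genbank_to_fasta helper: it reads only the ACCESSION line and the ORIGIN sequence block of each record, and it skips records without sequence (a master record is assumed here to carry no ORIGIN section).

```python
"""Minimal GenBank-to-FASTA sketch (hypothetical helper, stdlib only).

Only the ACCESSION line and the ORIGIN block of each record are used;
for real data, Bio.SeqIO.convert is the better choice.
"""


def genbank_to_fasta(text: str, width: int = 60) -> str:
    """Convert GenBank flat-file text to FASTA text.

    Each record is assumed to end with a '//' line. Records that carry
    no sequence are skipped instead of producing empty FASTA entries.
    """
    out = []
    accession = None
    seq_chunks = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ACCESSION"):
            # The first token after the keyword is the primary accession.
            accession = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: emit it only if it had sequence data.
            if accession and seq_chunks:
                seq = "".join(seq_chunks).upper()
                out.append(">" + accession)
                out.extend(seq[i:i + width] for i in range(0, len(seq), width))
            accession, seq_chunks, in_origin = None, [], False
        elif in_origin:
            # Sequence lines look like "        1 acgtacgtac gtacgt...":
            # drop the leading position number and join the 10-base blocks.
            seq_chunks.append("".join(line.split()[1:]))
    return "\n".join(out) + ("\n" if out else "")
```

For example, genbank_to_fasta(open("records.gbk").read()) would return the FASTA text for a downloaded file (the file name is only a placeholder).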

Hi, have you managed to find any sort of fix for ncbi-acc-download --recursive GHGH00000000.1? I certainly don't mean to push you, but I would be very grateful for any new pointers.

Anto007 • Sep 26 '19

This will only be fixed once Biopython 1.75 is released, as that contains a fix for the problem.

kblin • Sep 29 '19

Many thanks for your response. I will await the release of Biopython 1.75.

Anto007 • Sep 30 '19

End-of-year cleanup of old issues. This one should be fixed in current Biopython versions. Run pip install --upgrade biopython in the same virtualenv you installed ncbi-acc-download into, and you should be good to go. Please don't hesitate to comment if the issue still exists after upgrading Biopython.

kblin • Dec 23 '19
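To confirm the upgrade actually landed in the virtualenv that ncbi-acc-download runs from, a quick check can be run with that virtualenv's own python. This is a minimal sketch assuming Python 3.8+ for importlib.metadata (the Python 3.6 setup in the traceback would need the importlib_metadata backport instead). parse_version, is_at_least and check are hypothetical helpers; the version parsing keeps only leading integers and ignores pre-release suffixes, so packaging.version.Version is the sturdier choice where that library is available.

```python
"""Sketch: check installed package versions in the active virtualenv.

Assumes Python 3.8+ (importlib.metadata). The minimum version below is
the one mentioned in this thread; adjust as needed.
"""
from importlib.metadata import PackageNotFoundError, version

# Biopython 1.75 is the release said to contain the parser fix.
REQUIRED = {"biopython": "1.75"}


def parse_version(text: str) -> tuple:
    """Turn '1.75' or '1.77.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component (e.g. 'dev0').
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed: str, required: str) -> bool:
    """Compare versions numerically, so '1.81' >= '1.75'."""
    return parse_version(installed) >= parse_version(required)


def check(required=REQUIRED) -> dict:
    """Map each package to True/False, or None if it is not installed."""
    result = {}
    for name, minimum in required.items():
        try:
            result[name] = is_at_least(version(name), minimum)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ok in check().items():
        if ok is None:
            print(name, "is not installed in this environment")
        elif ok:
            print(name, "is new enough")
        else:
            print(name, "is too old: run pip install --upgrade", name)
```

Running the script with the virtualenv's python, rather than a system python, is what makes the check meaningful.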

Thank you so much for remembering to follow up on this, much appreciated! After the Biopython upgrade there are no more error messages, but ncbi-acc-download --recursive GHGH00000000.1 only downloads the master record .gbk file, not the records that the master record covers. So, unfortunately, --recursive isn't doing its job here.

Anto007 • Dec 24 '19
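For context, a recursive download of a master record conceptually means expanding the range of component accessions the master record lists and then fetching each of those records. The sketch below assumes that range is available as a "FIRST-LAST" string (which GenBank line carries it, e.g. WGS, is an assumption here) and builds URLs for the public NCBI E-utilities efetch endpoint with db=nuccore, rettype=gbwithparts and retmode=text. expand_range and efetch_urls are hypothetical helpers, not ncbi-acc-download's actual code, and the sketch leaves out the request-rate limiting NCBI asks clients to respect.

```python
"""Sketch: expand a master record's accession range and batch the parts
into NCBI E-utilities efetch URLs (hypothetical helpers, illustration only).
"""
import re
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
# A prefix of capital letters/underscores followed by a zero-padded number,
# e.g. "NZ_AQZU" + "01000001".
_ACC = re.compile(r"^([A-Z_]+)(\d+)$")


def expand_range(acc_range: str) -> list:
    """'X01000001-X01000003' -> ['X01000001', 'X01000002', 'X01000003']."""
    first, _, last = acc_range.partition("-")
    if not last:
        return [first]
    m1, m2 = _ACC.match(first), _ACC.match(last)
    if not (m1 and m2) or m1.group(1) != m2.group(1):
        raise ValueError("not a simple accession range: " + acc_range)
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    # Re-pad each number to the original width so leading zeros survive.
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


def efetch_urls(accessions: list, batch_size: int = 100) -> list:
    """Build one efetch URL per batch, asking for GenBank text with sequence."""
    urls = []
    for i in range(0, len(accessions), batch_size):
        params = {
            "db": "nuccore",
            "id": ",".join(accessions[i:i + batch_size]),
            "rettype": "gbwithparts",
            "retmode": "text",
        }
        urls.append(EFETCH + "?" + urlencode(params))
    return urls
```

Batching keeps the number of requests down; a real client would also need to pause between requests and handle failed batches.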

Thanks for testing. I'll have a look at this.

kblin • Dec 24 '19

Ah, shoot, it looks like there's still an issue in the Biopython support for this. 😞 We need https://github.com/biopython/biopython/pull/2432 to land and be shipped first. And I think I still need a change in ncbi-acc-download as well.

kblin • Dec 24 '19

Ok, another try. Once Biopython 1.77 is released, commit 789a34b4da52c43923ebff47c3141b4468f46892 should fix it. Install version 0.2.6 of ncbi-acc-download, which I just released, to get the fix.

kblin • Jan 02 '20

Many thanks for the update! Once Biopython 1.77 is out, I'll give it a try with ncbi-acc-download v0.2.6.

Anto007 • Jan 02 '20