Getting error "Error: Internal error (invalid zip archive). Please try again." Take 2

Open corneliusroemer opened this issue 1 year ago • 3 comments

Sadly the issue is still active, at least for taxons ebola-zaire and mpox.

See #356

New version of client (16.16.0) available at https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-arm64/datasets.
Error: Internal error (invalid zip archive). Please try again

Originally posted by @corneliusroemer in https://github.com/ncbi/datasets/issues/356#issuecomment-2111024211

corneliusroemer avatar May 15 '24 13:05 corneliusroemer

Thanks @corneliusroemer, we are continuing to look into this. Would you mind updating to 16.16.0? If the problem persists, please rerun with --debug and report the phid. This will help us to better understand what went wrong.

Best, Eric

ericcox1 avatar May 16 '24 12:05 ericcox1

I am also seeing this error in our automated pipelines for zika, mpox, measles, and dengue, which are all scheduled to run at 9AM PDT. If I rerun the workflow at a later time, the error goes away. Does the time coincide with the datasets updates?

joverlee521 avatar May 16 '24 16:05 joverlee521
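
Since rerunning the workflow later made the error go away, an automated pipeline could retry the download with a growing delay instead of failing outright. This is a minimal sketch, not a confirmed fix: the `retry` helper, the attempt count, the delays, and the output filename are all illustrative assumptions. Only the `datasets` flags shown in the commented example come from this thread.

```shell
# Retry a command with a growing delay between attempts.
# Usage: retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
# (hypothetical helper, not part of the datasets CLI)
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_attempt=1
  until "$@"; do
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      echo "retry: giving up after $retry_attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $retry_attempt failed; sleeping ${retry_delay}s" >&2
    sleep "$retry_delay"
    retry_attempt=$((retry_attempt + 1))
    retry_delay=$((retry_delay * 2))   # double the wait each time
  done
}

# Example (not executed here): up to 5 attempts, starting with a 60 s wait.
# retry 5 60 datasets download virus genome taxon 11320 \
#   --include genome,biosample --filename influenza_a.zip
```

Doubling the delay spreads retries out over time, so a pipeline scheduled at a busy moment does not immediately hit the same condition again.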

@ericcox1 Yes, getting the error with 16.16.0 as well. An example run is: Ncbi-Phid: 1D715361FD2DDA414583C0181D715361FD2DDA414583C018 (it might be that this exact run happened to work; I can't tell, because running with --debug flooded my terminal with binary text). I'll try to provoke an error again.

Is it possible that some part of the server struggles with the number of requests it's getting? As part of a project, I'm doing dataset downloads via CLI for a few taxa around every 3 minutes (it's run as part of CI). It's done with API key and the allowed rate is 10 requests per second so we should be far away from that limit but it might still be that no one else hitherto has sent requests so frequently.

corneliusroemer avatar May 17 '24 21:05 corneliusroemer

I've been getting the same error (Error: Internal error (invalid zip archive). Please try again) repeatedly for the past several days while trying to get influenza A genomes with this command:

datasets download virus genome taxon 11320 --include genome,biosample --debug >& datasets.log

Here is the gzipped --debug output: datasets.log.gz

The download proceeds for a varying amount of time (~two to 39 minutes) and downloads a varying amount of data (haven't kept track but noticed different numbers of GB) before exiting with the error.

I'm using datasets version: 16.17.0

AngieHinrichs avatar May 20 '24 22:05 AngieHinrichs

Earlier today, this command succeeded for me:

datasets download virus genome taxon "Alphainfluenzavirus influenzae" --filename all_alphainfluenza.zip

-- it's the first example command on https://www.ncbi.nlm.nih.gov/datasets/docs/v2/how-tos/virus/get-influenza-genomes/ . In 87 minutes, it downloaded a 555MB (530MiB) file that includes data_report.jsonl and genome.fna, but not biosample.jsonl.

Unfortunately the command above with --include genome,biosample has failed twice this afternoon, both times making it to 67.3MB before getting the invalid zip archive error.

AngieHinrichs avatar May 24 '24 00:05 AngieHinrichs

Hi AngieHinrichs,

Thanks for opening the issue. We're looking into it.

Nuala

olearyna avatar May 24 '24 12:05 olearyna

@AngieHinrichs,

Can you run this again with the --debug flag and send us the PHID? - thanks!

olearyna avatar May 24 '24 14:05 olearyna

OK, I am kicking off this command (there's no --no-progress-bar option, so adding a grep -v) and will send PHID and log. Thanks!

time datasets download virus genome taxon 11320 --include genome,biosample --debug |& grep -v ^$'\033' > datasets.log

AngieHinrichs avatar May 24 '24 15:05 AngieHinrichs

OK, PHID is 2F4065564DC261B8F1FA965F. Log attached. datasets.2024-05-24.log.gz

AngieHinrichs avatar May 24 '24 18:05 AngieHinrichs
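
Once the --debug output has been redirected to a file as above, the PHID can be pulled out of the log rather than read off the terminal. This is a rough sketch that assumes the log contains lines of the form `Ncbi-Phid: <value>`, as quoted elsewhere in this thread. The `last_phid` helper name is hypothetical, and the exact log layout may differ between versions.

```shell
# Print the last Ncbi-Phid value found in a --debug log.
# Assumes lines of the form "Ncbi-Phid: <value>" (case-insensitive match).
# -a makes grep treat the file as text even if it contains binary bytes.
last_phid() {
  grep -a -i -o 'ncbi-phid: *[0-9A-Za-z.]*' "$1" \
    | tail -n 1 \
    | sed 's/^[^:]*: *//'
}

# Example (not executed here):
# last_phid datasets.log
```

Taking the last match follows the convention used later in the thread, where the last Ncbi-Phid in the debug output is the one reported.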

Hi AngieHinrichs,

We need to take a deeper look at the issue. We'll post here when we have a fix.

Nuala

olearyna avatar May 24 '24 22:05 olearyna

Thanks @olearyna!

AngieHinrichs avatar May 24 '24 22:05 AngieHinrichs

Hi,

Any good news on this? I have had the same error since Monday. I thought something was wrong with my code until I read this post.

carolinasisco avatar May 29 '24 21:05 carolinasisco

Hi carolinasisco,

We are actively working on a fix and aim to have it released within the week. We apologize for any inconvenience this may have caused. Thanks for the patience!

Nuala

olearyna avatar May 29 '24 22:05 olearyna

Hi carolinasisco and AngieHinrichs,

We have released a fix in the latest version (v16.18.1) of the command line tool that we believe addresses the reported issues. Please test this update and let us know if you encounter any further errors.

Thanks Nuala

olearyna avatar May 30 '24 18:05 olearyna

Thanks @olearyna, I'll try it out right away!

AngieHinrichs avatar May 30 '24 21:05 AngieHinrichs

It worked and it was much faster than before! Thanks again!

AngieHinrichs avatar May 30 '24 21:05 AngieHinrichs

Great! I'll close this issue.

olearyna avatar May 30 '24 23:05 olearyna

Hi, it did not work for me. Any suggestions? I got the same error.

carolinasisco avatar May 31 '24 00:05 carolinasisco

Thanks so much @olearyna and @ericcox1! I just upgraded to 16.18.1 and the first run looks promising: none of the 4 taxon downloads failed. 🎉

I will comment as soon as I see failures again.

@carolinasisco are you sure you're using version 16.18.1?

I think it would help the devs if you could run with --debug then and share the PHID 😀

corneliusroemer avatar May 31 '24 07:05 corneliusroemer
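
A quick way to confirm that the installed client is at least the fixed release is to compare version strings before running a download. The sketch below is an assumption-laden illustration: `version_at_least` is a hypothetical helper, it relies on `sort -V` (version sort, available in GNU coreutils), and the commented usage assumes `datasets --version` prints a line ending in the version number.

```shell
# Succeed if dotted version $1 is at least version $2.
# Relies on `sort -V`, which orders 16.9.0 before 16.18.1.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example (not executed here); assumes the version is the last field
# of the output, e.g. "datasets version: 16.18.1".
# installed=$(datasets --version | awk '{print $NF}')
# version_at_least "$installed" 16.18.1 || echo "client older than 16.18.1" >&2
```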

Hi @carolinasisco,

Yes, if you are still having issues with the latest version, can you run with --debug and share the PHID? Thanks for the suggestion corneliusroemer!

olearyna avatar May 31 '24 12:05 olearyna

Hi @olearyna

I updated through conda --update, and the version showing is 16.18.1. This is my command (I ran it with --debug as suggested):

datasets download gene accession --inputfile ~/Desktop/wp_1_50 --filename wp150 --include gene,protein --debug

The error is:

Error: Download error: http2: server sent GOAWAY and closed the connection; LastDownloading: ncbi_dataset.zip 4.62MB error

Find attached the screen capture with the phid.

phid

Thanks!

carolinasisco avatar Jun 01 '24 23:06 carolinasisco

Hi carolinasisco,

Thanks for the information! I think this is a separate issue from the virus genome download. We'll look into it tomorrow.

Nuala

olearyna avatar Jun 02 '24 23:06 olearyna

Hi, thank you. I'm trying to download a large set of sequences (nt and aa) from pseudomonas.

carolinasisco avatar Jun 03 '24 00:06 carolinasisco

Hi, I would like to add another example of this error, in hopes of it being helpful in finding a solution. I am using ncbi datasets version 16.31.0. I was trying to download Streptococcus genomic sequences using the following command:

datasets download genome taxon Streptococcus --include genome,gbff --reference

This results in the following outcome:

Collecting 125 genome records [================================================] 100% 125/125
Downloading: ncbi_dataset.zip 273MB done
Validating package files [==>---------------------------------------------] 9% 23/254
Error: Internal error (invalid zip archive). Please try again

Across several attempts, validation of the package files reached only 6-9% before failing.

I reran the command including only genome or only gbff. When downloading genomes only (--include genome), the process finished successfully. When downloading gbff only (--include gbff), the process failed with the same internal error as mentioned above.

mverce avatar Oct 17 '24 13:10 mverce
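
The isolation step described above, requesting one --include value at a time to see which one triggers the failure, can be scripted. This is only a sketch of that approach: `try_includes` and the `DATASETS_CMD` override are hypothetical names introduced here for illustration, and the output filenames are arbitrary. The `datasets` flags themselves are the ones used in this thread.

```shell
# Try each --include value on its own and report which ones fail.
# DATASETS_CMD is a hypothetical override so the loop can be dry-run
# against a stub; it defaults to the real `datasets` binary.
try_includes() {
  taxon=$1
  shift
  for inc in "$@"; do
    if ${DATASETS_CMD:-datasets} download genome taxon "$taxon" \
         --include "$inc" --reference --filename "${taxon}_${inc}.zip"; then
      echo "$inc: ok"
    else
      echo "$inc: FAILED"
    fi
  done
}

# Example (not executed here):
# try_includes Streptococcus genome gbff
```

Writing each attempt to its own file keeps a successful download from being overwritten by a later failing one.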

Hi @mverce,

Thanks for your report.

I wasn't able to reproduce this error and we think you may have encountered a temporary problem.

If you don't mind trying this one more time, please add the --debug flag and report the Ncbi-phid value here so we can investigate further.

datasets download genome taxon Streptococcus --include gbff --reference --filename strep.zip --debug

Best, Eric

ericcox1 avatar Oct 17 '24 17:10 ericcox1

Hi @ericcox1,

I have tried it again with the commands that were problematic yesterday, as well as with your exact command (incl. --filename strep.zip), but the problem persists. The last Ncbi-Phid from the debug output is: 1CA6C01E4134F3592F685054.6.1

Thanks and best regards, Marko

mverce avatar Oct 18 '24 10:10 mverce

I tried the same command as Eric listed and can't reproduce

corneliusroemer avatar Oct 18 '24 17:10 corneliusroemer