ncbi-genome-download
Connection Error, RemoteDisconnected
Hi,
When I try to download data for a relatively large number of genomes, e.g.:
ncbi-genome-download bacteria -t 562 -l complete -F assembly-report
I get the following error message:
ERROR: Download from NCBI failed: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))
I don't get this issue when downloading only one or a few genomes. Looking at similar issues, it seems that a connection error is usually caused by the user's own connection rather than by ncbi-genome-download. However, because the connection is closed by the remote end, I'm not sure that applies here.
If anyone could help me out that'd be greatly appreciated!
Best, Lisa
Same problem (04/10/2021):
ncbi-genome-download --formats fasta bacteria --parallel 4
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_011742285.2'
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_017815795.1'
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_017815575.1'
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_009498175.3'
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_017815655.1'
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_017815675.1'
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_017815835.1'
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_017869345.1'
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_017815595.1'
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_017815615.1'
ERROR: Download from NCBI failed: ConnectionError(ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')))
Similar problem:
ERROR: Download from NCBI failed: ConnectionError(ProtocolError('Connection aborted.', OSError(0, 'Error')))
I've also been having the same issue for a week:
ERROR: Download from NCBI failed: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))
Hm, it looks like NCBI might have introduced some kind of connection limit. I'm not aware of any documentation on this from the NCBI side of things, and unlike with the Entrez API, there's not really a way to provide e.g. an API key to get a less strict rate limit. I'll see if I can reproduce this and debug it a bit further.
Ok, looks like I'm getting this one myself here:
ERROR: Download from NCBI failed: ConnectionError(ProtocolError('Connection aborted.', OSError(0, 'Error')),)
I'll see if I can find out what's happening.
Now I got this one:
ERROR: Download from NCBI failed: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
Unfortunately there's really not much to find out about this, because the connection is closed not at the HTTP GET request level but one level below that, so there's really no communication of what the issue is.
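To illustrate what "one level below" means, here is a minimal sketch that digs the innermost exception out of a nested error like the ones pasted in this thread. The helper name `innermost_cause` is made up for this example and is not part of ncbi-genome-download; the example also uses Python's built-in `ConnectionError` and a plain `Exception` as stand-ins for the `requests` and `urllib3` wrapper classes.

```python
from http.client import RemoteDisconnected


def innermost_cause(exc):
    """Follow exceptions nested inside .args down to the innermost one.

    Hypothetical helper for illustration only.
    """
    current = exc
    while True:
        nested = [arg for arg in current.args if isinstance(arg, BaseException)]
        if not nested:
            return current
        # The wrapped exception is the last positional argument in the
        # errors shown above, e.g. ProtocolError('Connection aborted.', <cause>).
        current = nested[-1]


# Stand-in for ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected(...)))
error = ConnectionError(
    Exception(
        "Connection aborted.",
        RemoteDisconnected("Remote end closed connection without response"),
    )
)
cause = innermost_cause(error)
```

In the standard library, `RemoteDisconnected` is a subclass of `ConnectionResetError`, so the innermost cause is a dropped connection rather than an HTTP response carrying a status code or message that could explain the failure.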
I'm currently trying to add a rate limiting step to see if that fixes it, but this will slow down things considerably.
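For readers wondering what such a rate limiting step might look like, below is a minimal sketch, not the code that was actually tried in ncbi-genome-download. The `RateLimiter` class and its parameters are hypothetical; the clock and sleep functions are injectable only so the behaviour can be checked without really waiting.

```python
import time


class RateLimiter:
    """Space out calls so that at most `rate` of them happen per second.

    Hypothetical sketch for illustration only.
    """

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A download loop would call `limiter.wait()` before each request, for example with `limiter = RateLimiter(1)` for one request per second.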
I hit the same bug and am looking forward to your solution.
ERROR: Download from NCBI failed: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)),)
Nope, it still happens even at just 1 request per second; it just takes longer to get there. Because this already happens at the stage of downloading the checksum files, you can't even cache these and restart easily, so I'm also struggling to find a good workaround.
Having said that, I hear from a couple of colleagues that other connections to the NCBI FTP servers also die with the same issues, regardless of whether the HTTPS protocol is being used (like for ncbi-genome-download) or old-fashioned FTP is being used. So maybe there are just some networking issues on the NCBI side of things at the moment?
Thank you for looking into the issue! Let's hope it's only a temporary NCBI connection problem.
I can attest to even pure FTP downloads getting cut off more or less randomly, regardless of the protocol used, as of June 9th 2021. It looks like NCBI introduced some sort of arbitrary cutoff for shutting down connections. One would wonder if they can't just communicate with the research community directly on what's needed...
Hi, still an issue today
This is on the NCBI side of things, though. Not much we can do about this on the client side.
Same problem. It would be nice to have a resume command. I'm not sure whether everything starts from scratch when we relaunch, but the ability to resume would be perfect.
ncbi-genome-download doesn't re-download files that are correctly downloaded and current. But in order to check that, it does need to fetch all checksum files again on startup, and if you're downloading a lot of records that can also take a while.
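The kind of check described here can be sketched roughly as follows. This is an illustration under assumptions, not the tool's actual implementation: the function names are invented, and the checksum file is assumed to contain lines of the form `<md5>  ./<filename>`.

```python
import hashlib
import os


def parse_checksums(text):
    """Parse lines like '<md5>  ./<filename>' into {filename: md5}.

    The line format is an assumption made for this example.
    """
    checksums = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        md5, name = parts
        name = name.strip()
        if name.startswith("./"):
            name = name[2:]
        checksums[name] = md5
    return checksums


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_md5):
    """Return True if the local file is missing or does not match."""
    if not os.path.exists(path):
        return True
    return md5_of(path) != expected_md5
```

With a check like this, a relaunch only re-fetches files that are missing or whose checksum no longer matches, but the checksum listing itself still has to be fetched first, which is the step that takes a while for large record sets.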
Hitting this today
NLM had a bunch of website issues a couple of days ago, so maybe there's also something going on with the FTP.
ERROR: Download from NCBI failed: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))
Again, this is an issue on the NCBI side, nothing ncbi-genome-download can do about it.