Slow download speed v214

Open kafker opened this issue 2 years ago • 22 comments
GitHub issues are specifically for issues with the GTDB-Tk, please join us on the GTDB forum:

Dear devs,

I'm not sure whether this is caused by a network problem on my side or yours. I am trying to download the latest GTDB database:

wget https://data.gtdb.ecogenomic.org/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

However, after 10 minutes or so the download drops to about 100k per second, making it impossible to download the database in a reasonable amount of time.

I tried different network connections (HPC and home), but nothing seems to work.

Thank you! K

kafker avatar Jun 02 '23 18:06 kafker
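
For a sense of scale, here is a rough sketch of how long the archive would take at the rates reported in this thread. The archive size used below (about 85 GB) is an assumption, not a figure given in the thread, so substitute the size your client reports. The function names are illustrative only.

```python
def estimate_download_time(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the seconds needed to transfer size_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


def format_duration(seconds: float) -> str:
    """Format a duration in seconds as days, hours and minutes."""
    days, rem = divmod(int(seconds), 86_400)
    hours, rem = divmod(rem, 3_600)
    return f"{days}d {hours}h {rem // 60}m"


# Assumption: the archive is roughly 85 GB (not stated in the thread).
ASSUMED_SIZE_BYTES = 85e9

for label, rate in [("100 kB/s", 100e3), ("5 MB/s", 5e6)]:
    seconds = estimate_download_time(ASSUMED_SIZE_BYTES, rate)
    print(f"{label}: {format_duration(seconds)}")
# 100 kB/s: 9d 20h 6m
# 5 MB/s: 0d 4h 43m
```

Under that size assumption, a sustained 100 kB/s means well over a week of uninterrupted transfer, while a few MB/s brings it down to hours.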

Same issue here. I have relatively high speed internet at home. Still, the download rate of GTDB db barely exceeds 200 kb/s.

konstantin-demin avatar Jun 03 '23 10:06 konstantin-demin

Hello,

Thank you for raising this issue, I'll take a look into this.

Can you both let me know what speed you get when trying to download from the mirror? https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

I'm also in the process of applying for an additional quota to use Zenodo as a secondary mirror.

Cheers, Aaron

aaronmussig avatar Jun 04 '23 04:06 aaronmussig
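
One way to answer the speed question reproducibly is to time a fixed-size sample from each URL. The following is a minimal sketch using only the Python standard library; `sample_throughput` and its parameters are hypothetical names, not part of GTDB-Tk. Since the slowdown reported above only sets in after about 10 minutes, a short sample may look better than the sustained rate.

```python
import time
import urllib.request


def sample_throughput(url, sample_bytes=50 * 1024 * 1024,
                      chunk_size=1024 * 1024,
                      opener=urllib.request.urlopen):
    """Read up to sample_bytes from url and return the mean rate in MB/s.

    opener is injectable so the function can be exercised without a network.
    """
    start = time.monotonic()
    received = 0
    with opener(url) as response:
        while received < sample_bytes:
            chunk = response.read(min(chunk_size, sample_bytes - received))
            if not chunk:  # stream ended before the sample was complete
                break
            received += len(chunk)
    # Guard against a zero interval on very small or local reads.
    elapsed = max(time.monotonic() - start, 1e-9)
    return received / elapsed / 1e6
```

Calling `sample_throughput(url)` once for the primary URL and once for the mirror URL quoted in this thread gives two comparable numbers. Raising `sample_bytes` makes the measurement longer and more likely to catch a later drop in speed.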

> Can you both let me know what speed you get when trying to download from the mirror? https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

Hi Aaron,

The download from the mirror is much more stable.

The download speed was 4-7 MB/s.

Thank you! K

kafker avatar Jun 04 '23 17:06 kafker

> Can you both let me know what speed you get when trying to download from the mirror? https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

Hello Aaron. The link you provided reaches the same speed as before, ~200 kb/s. However, I managed to download the database by switching to Windows and downloading it directly from the latest link in the list of releases at https://ecogenomics.github.io/GTDBTk/installing/index.html. From Windows, the speed was 4-10 mb/s. I don't know whether the problem lies in the automatic download or in my Linux machine (which has no problems with any other downloads).

I think an additional mirror would be a good idea.

Thanks for the help!

konstantin-demin avatar Jun 04 '23 21:06 konstantin-demin

> Can you both let me know what speed you get when trying to download from the mirror? https://data.ace.uq.edu.au/public/gtdb/data/releases/release214/214.0/auxillary_files/gtdbtk_r214_data.tar.gz

I get 0.5-5 mb/s using the above link. That's about 10-20x faster than normal...

Sumsarium avatar Jun 15 '23 13:06 Sumsarium

Sorry to hear about the slow speeds, I am still waiting on Zenodo to get back to me about additional storage.

In the meantime, I've developed a small program that will download the GTDB-Tk R214 reference database from the unarchived data. It's fault tolerant and will allow you to download with multiple threads.

If anyone who is experiencing slow download speeds would like to give it a go, please see: https://github.com/Ecogenomics/gtdbtk-db-download

I have a few more involved ideas for speeding it up, namely downloading the FASTA files from NCBI, but I'll only do that if this is still unusable.

aaronmussig avatar Jun 16 '23 04:06 aaronmussig
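
To illustrate the general idea of a fault-tolerant, multi-threaded download of many separate files, here is a minimal sketch. It is not the implementation of gtdbtk-db-download, whose internals are not described in this thread; every function name, the `.part` suffix, and the retry and backoff defaults are assumptions made for the example.

```python
import http.client
import shutil
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fetch_to_file(url, dest):
    """Stream one URL to dest, writing under a temporary name first."""
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url, timeout=60) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)  # only a completed download gets the final name


def fetch_with_retries(url, dest, fetch=fetch_to_file, retries=5, backoff_s=2.0):
    """Call fetch(url, dest), retrying with exponential backoff on failure."""
    if dest.exists():  # finished in an earlier run, so skip it
        return dest
    for attempt in range(retries):
        try:
            fetch(url, dest)
            return dest
        except (OSError, http.client.HTTPException):
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(backoff_s * 2 ** attempt)


def download_all(jobs, workers=4, **kwargs):
    """Download (url, dest) pairs concurrently and return the finished paths."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_with_retries, url, Path(dest), **kwargs)
                   for url, dest in jobs]
        return [future.result() for future in futures]
```

Because finished files are skipped and partial files never carry the final name, rerunning `download_all` with the same job list after an interruption only fetches what is still missing.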

Impossible to download the R214 database; it's too slow (20 kb/s). Same for the mirror.

ValentinCledassou avatar Aug 25 '23 08:08 ValentinCledassou

I tested the download from Denmark and Australia, and the speed was ~7 MB/s in both cases. I rebooted NGINX anyway; did it help?

aaronmussig avatar Aug 25 '23 09:08 aaronmussig

With a VPN to Australia, I get the same speed as you. But without the VPN (in France), it's always ~20 kb/s.

ValentinCledassou avatar Aug 25 '23 10:08 ValentinCledassou

Mine starts at 8 mb/s but quickly drops down to around 300-500 kb/s. Generally seems to be a bit unstable wrt speed. I haven't tested it via VPN. Not a big issue (for me at least) as long as the databases aren't updated on a weekly basis...

Sumsarium avatar Aug 25 '23 12:08 Sumsarium

Hi @aaronmussig,

any news on this? The download speed from Germany is super slow, like 200 kb/s.

Cheers Bastian

bheimbu avatar Apr 05 '24 10:04 bheimbu

Hi,

Is there a solution for this issue when downloading r220? I've noticed that my download of the new release oscillates between 10 - 60 KB/s, and our IT department confirmed that it's not an issue from our side.

Thanks!

iwilkie avatar May 13 '24 07:05 iwilkie

This seems to be a persistent issue. It still takes me several days to download the databases (Denmark).

Sumsarium avatar May 13 '24 07:05 Sumsarium

> It still takes me several days to download the databases (Denmark)

I'm in Germany and my download has been going for 10 days now... Have you tried using the VPN to Australia? Unfortunately I cannot test this from my work setup, but I wanted to give it a try when I get back to my personal computer.

iwilkie avatar May 13 '24 07:05 iwilkie

Hello! I am based in Germany and facing the same issue when downloading R220. The mirror didn't improve anything, and the speed fluctuates between 20 and 300 KB/s (mostly in the lower range). Did anyone find a solution to this? (Downloading to Windows is not an option.)

marianamnoriega avatar Jun 10 '24 16:06 marianamnoriega

@marianamnoriega are you getting this with the mirror too? I just downloaded the other day (US) and it was pretty fast.

jolespin avatar Jun 19 '24 15:06 jolespin

Hi! Thanks for this resource. I'm facing the same issue.

> @marianamnoriega are you getting this with the mirror too? I just downloaded the other day (US) and it was pretty fast.

@jolespin, in Germany I get the same rate (~50 kb/s) with either the primary or the mirror URL. This is the case with both wget and a browser.

Let me know how I can help you help us! Best regards,

cpauvert avatar Jun 21 '24 14:06 cpauvert

heya folks!

I had trouble with this for a while too, especially because i'm often installing gtdbtk on multiple systems. In the US, the mirror has worked well for me, but i see from this thread here that's not always the case for all :/

I don't know how well google drive downloads work from different places around the world, but if anyone would like to have somewhere else to try to pull from, i've put R220 up there (as downloaded from here on 21-Jun-2024).

You can grab it from the google drive directly from here: https://drive.google.com/drive/u/0/folders/1YOtMHILvs3xS9cZ2CjW7n20myZYDYVh6

Or if you need to pull it directly to a remote machine, it's more difficult to download from google drive programmatically than it should be, but lately i've had luck with gdrive3 – I'm using the latest 3.9.1 at the time of posting this. After installing and setting that up, it could be grabbed with the following:

gdrive files download 16qqRgrlb0Xwip_fvhXQgXGlPcZoab3UC

AstrobioMike avatar Jun 24 '24 01:06 AstrobioMike

Thanks @AstrobioMike for this effort for the community, much appreciated! It is now downloading at ~10 MB/s on a remote machine (my own estimate, as gdrive currently displays no rate; see glotlabs/gdrive#44).

cpauvert avatar Jun 25 '24 09:06 cpauvert

Is there any mirror for China? Downloading the database in China is impossible for now (<1 kb/s).

bayegy avatar Aug 07 '24 08:08 bayegy