
Genbank downloading problems

Open · pandan74 opened this issue

I'm having difficulty downloading the GenBank databases. I was able to download GTDB and GenBank viral k31. Is there anywhere else I can find these files? Please suggest how I can download them. I use curl, and usually I get this error: curl: (33) HTTP server doesn't seem to support byte ranges, or the download just times out.

pandan74 · Aug 04 '22 15:08

Hi sourmash creators! Thanks a lot for your work! I am having exactly the same problem as described above. Do you have any idea how to solve it? It seems that the server we are downloading from times out after a certain amount of time.

Thanks in advance! Gabri

gabridinosauro · Aug 05 '22 20:08

hi! sorry about this, it's been difficult to find good places to store these files ;(.

these problems occur when using the dweb.link URLs at https://sourmash.readthedocs.io/en/latest/databases.html, right? If so, there are some options discussed here, but it is not simple at the moment... I'll see if I can document it more clearly today.

ctb · Aug 06 '22 10:08

yes, they've been plaguing me for days. another potential alternative: the OSF links are super fast. are you trying to move away from Google Drive to OSF for the large files? in the short term, can we update the documentation so that it downloads from the OSF links until we fix the dweb links?

taylorreiter · Aug 12 '22 17:08

some workarounds until we move everything to R2:

remove https://dweb.link/ipfs/ from the download URL to get the bare IPFS content ID (CID):

  • for genbank-2022.03-viral-k21.zip: https://dweb.link/ipfs/bafybeicjyx6qkhdtw6q4cxs6fyl46gqfhd4q5eqje5lkswf2npljnyytzi -> bafybeicjyx6qkhdtw6q4cxs6fyl46gqfhd4q5eqje5lkswf2npljnyytzi
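Stripping that prefix is plain string manipulation, so it can be scripted for every database URL. A minimal POSIX shell sketch, assuming each URL starts with exactly https://dweb.link/ipfs/; cid_from_url and gateway_url are hypothetical helper names, not part of sourmash or any IPFS tool:

```shell
#!/bin/sh
# Hypothetical helper: strip the dweb.link gateway prefix to recover the bare CID.
cid_from_url() {
    # ${1#pattern} removes the matching prefix; if the prefix is absent,
    # the argument is printed unchanged (so a bare CID passes through).
    printf '%s\n' "${1#https://dweb.link/ipfs/}"
}

# Hypothetical helper: rebuild a download URL for another gateway base.
gateway_url() {
    # $1 = gateway base, $2 = CID; ${1%/} drops one trailing slash if present
    printf '%s/ipfs/%s\n' "${1%/}" "$2"
}

url=https://dweb.link/ipfs/bafybeicjyx6qkhdtw6q4cxs6fyl46gqfhd4q5eqje5lkswf2npljnyytzi
cid=$(cid_from_url "$url")
echo "$cid"
gateway_url https://cloudflare-ipfs.com "$cid"
```

The first line printed is the bare CID; the second is the same content addressed through whichever gateway base is passed as the first argument.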

with the cloudflare gateway:

  • wget -O genbank-2022.03-viral-k21.zip https://cloudflare-ipfs.com/ipfs/bafybeicjyx6qkhdtw6q4cxs6fyl46gqfhd4q5eqje5lkswf2npljnyytzi

with ipget:

  • grab ipget from https://dist.ipfs.io/#ipget
  • ipget -o genbank-2022.03-viral-k21.zip bafybeicjyx6qkhdtw6q4cxs6fyl46gqfhd4q5eqje5lkswf2npljnyytzi
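The gateway and ipget options can be chained so that one stalled gateway does not end the download. A hedged POSIX shell sketch: fetch_cid and the GATEWAYS list are hypothetical, the --tries/--timeout values are arbitrary, and it assumes ipget takes -o for the output path:

```shell
#!/bin/sh
# Sketch: try each IPFS HTTP gateway in turn, then fall back to ipget.
# The gateway list is an assumption; edit it to whichever gateways work for you.
GATEWAYS="https://cloudflare-ipfs.com https://dweb.link"

# Hypothetical helper: fetch_cid OUTPUT_FILE CID
fetch_cid() {
    out=$1   # output file name, e.g. genbank-2022.03-viral-k21.zip
    cid=$2   # bare CID with the gateway prefix removed
    for gw in $GATEWAYS; do
        # --tries/--timeout keep a stalled gateway from hanging forever
        if wget --tries=3 --timeout=60 -O "$out" "$gw/ipfs/$cid"; then
            return 0
        fi
        # wget -O can leave an empty or partial file behind on failure
        rm -f "$out"
    done
    # Last resort: fetch over IPFS directly, if ipget is on the PATH
    if command -v ipget >/dev/null 2>&1; then
        ipget -o "$out" "$cid" && return 0
    fi
    return 1
}
```

For example, fetch_cid genbank-2022.03-viral-k21.zip bafybeicjyx6qkhdtw6q4cxs6fyl46gqfhd4q5eqje5lkswf2npljnyytzi returns 0 as soon as any source succeeds. Deleting the partial file between attempts avoids mistaking a truncated zip for a finished download.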

luizirber · Aug 13 '22 17:08

Hello! I am pleased to report that our databases may now be Robustly Available via the local UC Davis infrastructure of the dib-lab ;).

Please see #2255 for the PR; you can view the databases file directly here until that PR is merged, at which point it will show up here.

I will update this issue once the PR is merged (which should be fairly quickly).

🎉

ctb · Sep 03 '22 16:09

Merged, and docs updated: the prepared databases page here now has robustified farm links.

ctb · Sep 03 '22 16:09