
Cannot create databases

Open jabard89 opened this issue 3 years ago • 7 comments

Expected Behavior

`foldseek databases PDB pdb tmp` should set up the PDB database.

Current Behavior

Returns:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

The downloaded pdb.tar.gz is empty. It looks like the target URL (http://wwwuser.gwdg.de/~compbiol/foldseek/) no longer hosts the databases.
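To confirm that a downloaded archive really is empty or not gzip data before `tar` fails on it, one can check its size and its first two bytes. This is a minimal sketch, assuming a POSIX shell with `od` and `tr` available; the function name `is_gzip` is hypothetical and not part of foldseek.

```shell
#!/bin/sh
# Hypothetical helper (not part of foldseek): succeeds only if the file
# is non-empty and starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    file="$1"
    # -s is true only if the file exists and has a size greater than zero
    [ -s "$file" ] || return 1
    # Read the first two bytes as hex and strip the whitespace od adds
    magic=$(od -An -tx1 -N2 "$file" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}
```

For example, `is_gzip pdb.tar.gz || echo "pdb.tar.gz is empty or not a gzip archive"` would flag the failed download described above.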

Your Environment

  • Git commit used: 1c40553082f6aab77e17bc6a1a489ce439e3ae9a
  • Which foldseek version was used (Statically-compiled, self-compiled, Conda, etc.): foldseek-linux-sse41.tar.gz
  • Operating system and version: Ubuntu 18.04 run on WSL v1, Windows 10

jabard89 avatar Oct 27 '21 14:10 jabard89

@jabard89 we updated the alphabet size of foldseek from 16 to 21, so the old database is no longer compatible and I took it down. We are currently recreating the database, and I will let you know once it is online. To use it, you will need to update foldseek.

martin-steinegger avatar Oct 27 '21 15:10 martin-steinegger

We reuploaded all databases. Does this work now?

martin-steinegger avatar Mar 07 '22 00:03 martin-steinegger

@martin-steinegger would you be able to provide more information on how I can create the targetdb? I have a directory that contains a set of protein structures I predicted using AlphaFold2 and would like to use these structures to query against the PDB database.

Geraldene avatar Jun 13 '22 10:06 Geraldene

@Geraldene the following command should work.

foldseek easy-search queryFolder pdb aln tmp
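Put together with the database download from the original report, the workflow can be sketched as below. Both foldseek invocations are the ones quoted in this thread; the wrapper function `search_against_pdb` and the `DRY_RUN` variable are hypothetical additions for illustration, and the sketch assumes the database is set up in the current directory under the name `pdb`.

```shell
#!/bin/sh
# Sketch: set up the PDB database if needed, then search a folder of
# structures against it. The wrapper and DRY_RUN are hypothetical.
search_against_pdb() {
    query_dir="$1"   # folder containing the query structures
    out="$2"         # alignment output file
    tmp="$3"         # temporary working directory
    run=""
    # With DRY_RUN=1 the commands are printed instead of executed
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    # Download the database only if it is not already present
    if [ ! -e pdb ]; then
        $run foldseek databases PDB pdb "$tmp" || return 1
    fi
    $run foldseek easy-search "$query_dir" pdb "$out" "$tmp"
}
```

Setting `DRY_RUN=1` and calling `search_against_pdb queryFolder aln tmp` prints the two commands without running them, which is a cheap way to check the arguments first.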

martin-steinegger avatar Jun 13 '22 10:06 martin-steinegger

I ran into certificate issues while trying to download. Is there any way to bypass them?

$ foldseek databases PDB pdb tmp 
databases PDB pdb tmp 

MMseqs Version:              	1.3c64211
Force restart with latest tmp	false
Remove temporary files       	false
Compressed                   	0
Threads                      	12
Verbosity                    	3

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
--2022-07-01 15:13:54--  https://wwwuser.gwdg.de/~compbiol/foldseek/pdb.tar.gz
Resolving wwwuser.gwdg.de (wwwuser.gwdg.de)... 134.76.10.111
Connecting to wwwuser.gwdg.de (wwwuser.gwdg.de)|134.76.10.111|:443... connected.
ERROR: cannot verify wwwuser.gwdg.de's certificate, issued by ‘CN=Sectigo RSA Organization Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater Manchester,C=GB’:
  Unable to locally verify the issuer's authority.
To connect to wwwuser.gwdg.de insecurely, use `--no-check-certificate'.
Error: Could not download https://wwwuser.gwdg.de/~compbiol/foldseek/pdb.tar.gz to tmp/14286354622525620261/pdb.tar.gz

kthurimella avatar Jul 01 '22 15:07 kthurimella

We switched the hosting to Cloudflare. If you update foldseek, it should download the database from the new source. I hope that resolves it.

martin-steinegger avatar Jul 01 '22 15:07 martin-steinegger

Thanks for the fast response. I checked out the repo and compiled from the source this time. Now running into this error:

$ foldseek databases PDB pdb new_tmp
databases PDB pdb new_tmp

MMseqs Version:              	5285cd11c335e1a0133ffd3e32f55ad6ff82f3cb
Force restart with latest tmp	false
Remove temporary files       	false
Compressed                   	0
Threads                      	12
Verbosity                    	3

mv: cannot stat 'new_tmp/5610811273439075906/version': No such file or directory

I initially reused the same tmp folder and then created a new one. Is there a cache that I can clear, or a way to force a fresh download?

However, the AF databases seem to be downloading!

kthurimella avatar Jul 01 '22 17:07 kthurimella

Does this still persist?

martin-steinegger avatar Jan 24 '23 12:01 martin-steinegger

I've created a Debian source package that can be built using "simple sid backport" for most Ubuntu versions or Debian: http://sid.ethz.ch/debian/foldseek/

Unfortunately, Discussions are not enabled for this repository, but I was wondering whether it would make sense to have this as an official package.

Here's my output for the OP's command:

$ foldseek databases PDB pdb tmp
Create directory tmp
databases PDB pdb tmp 

MMseqs Version:              	GITDIR-NOTFOUND
Tsv                          	false
Force restart with latest tmp	false
Remove temporary files       	false
Compressed                   	0
Threads                      	16
Verbosity                    	3


04/04 10:50:55 [NOTICE] Downloading 1 item(s)
[#63483e 806MiB/872MiB(92%) CN:5 DL:74MiB]                                                          
04/04 10:51:14 [NOTICE] Download complete: tmp/1124933551536758242/pdb.tar.gz

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
63483e|OK  |    64MiB/s|tmp/1124933551536758242/pdb.tar.gz

Status Legend:
(OK):download completed.

04/04 10:51:14 [NOTICE] Downloading 1 item(s)

04/04 10:51:15 [NOTICE] Download complete: tmp/1124933551536758242/version

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
72ca56|OK  |   1.8KiB/s|tmp/1124933551536758242/version

Status Legend:
(OK):download completed.
pdb
pdb_ca
pdb_ca.dbtype
pdb_ca.index
pdb_h
pdb_h.dbtype
pdb_h.index
pdb_mapping
pdb_ss
pdb_ss.dbtype
pdb_ss.index
pdb_taxonomy
pdb.dbtype
pdb.index
pdb.lookup
pdb.md5sum
mvdb tmp/1124933551536758242/pdb pdb 

Time for processing: 0h 0m 0s 0ms
mvdb tmp/1124933551536758242/pdb_ss pdb_ss 

Time for processing: 0h 0m 0s 0ms
mvdb tmp/1124933551536758242/pdb_h pdb_h 

Time for processing: 0h 0m 0s 0ms
mvdb tmp/1124933551536758242/pdb_ca pdb_ca 

Time for processing: 0h 0m 0s 0ms

alexmyczko avatar Apr 04 '23 08:04 alexmyczko

Please open a new issue for this. If you want to make a Debian package, I would recommend referring to the MMseqs2 Debian package: https://salsa.debian.org/med-team/mmseqs2

The maintainers have done a lot of work to make MMseqs2 play well with Debian, and a Debian package for Foldseek should be very similar to the MMseqs2 one (just please don't try to separate Foldseek from its internal MMseqs2 dependency).

milot-mirdita avatar Apr 04 '23 11:04 milot-mirdita