foldseek
Cannot create databases
Expected Behavior
foldseek databases PDB pdb tmp
should set up the PDB database.
Current Behavior
Returns:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
The downloaded pdb.tar.gz is empty. It looks like the target URL (http://wwwuser.gwdg.de/~compbiol/foldseek/) no longer hosts the databases.
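The gzip error above is what tar reports when it is handed an empty or non-gzip file. A minimal sketch of a check that could be run on a downloaded archive before extracting it, assuming a POSIX shell with `gzip` available. The function name `check_archive` is made up for illustration and is not part of foldseek.

```shell
# Hypothetical helper (not part of foldseek): verify that a downloaded
# archive is non-empty and passes gzip's integrity test before extracting.
check_archive() {
    archive="$1"
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$archive" ]; then
        echo "empty or missing: $archive" >&2
        return 1
    fi
    # gzip -t tests the compressed data without writing any output.
    if ! gzip -t "$archive" 2>/dev/null; then
        echo "not a valid gzip file: $archive" >&2
        return 2
    fi
    return 0
}
```

For example, `check_archive pdb.tar.gz && tar xzf pdb.tar.gz` would only extract an archive that passed both checks.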
Your Environment
- Git commit used: 1c40553082f6aab77e17bc6a1a489ce439e3ae9a
- Which foldseek version was used (Statically-compiled, self-compiled, Conda, etc.): foldseek-linux-sse41.tar.gz
- Operating system and version: Ubuntu 18.04 run on WSL v1, Windows 10
@jabard89 we updated the alphabet size of foldseek from 16 to 21, so the old database is no longer compatible and I took it down. We are currently recreating the database, and I will let you know once it is online. To use it, you will need to update foldseek.
We reuploaded all databases. Does this work now?
@martin-steinegger would you be able to provide more information on how I can create the targetdb? I have a directory containing a set of protein structures I predicted with AlphaFold2, and I would like to use these structures as queries against the PDB database.
@Geraldene the following command should work.
foldseek easy-search queryFolder pdb aln tmp
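Putting the two commands from this thread together, a rough sketch of the full workflow might look like the following. It assumes an up-to-date foldseek is on `PATH`. The wrapper name `search_against_pdb` and its default arguments are hypothetical, and `queryFolder`, `pdb`, `aln` and `tmp` are just the placeholder names used above.

```shell
# Hypothetical wrapper (not part of foldseek) around the two commands
# from this thread. Assumes an up-to-date foldseek is on PATH.
search_against_pdb() {
    query_dir="$1"    # folder of query structures, e.g. AlphaFold2 predictions
    db="${2:-pdb}"    # prefix of the downloaded PDB database
    out="${3:-aln}"   # alignment output file
    tmp="${4:-tmp}"   # scratch directory

    if [ ! -d "$query_dir" ]; then
        echo "query folder not found: $query_dir" >&2
        return 1
    fi
    # Download the prebuilt PDB database only if it is not already present.
    if [ ! -f "$db.dbtype" ]; then
        foldseek databases PDB "$db" "$tmp" || return 1
    fi
    # Search every structure in the query folder against the database.
    foldseek easy-search "$query_dir" "$db" "$out" "$tmp"
}
```

Calling `search_against_pdb queryFolder` would then download the database on the first run and skip straight to the search on later runs.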
I ran into certificate issues while trying to download. Is there any way to bypass them?
$ foldseek databases PDB pdb tmp
databases PDB pdb tmp
MMseqs Version: 1.3c64211
Force restart with latest tmp false
Remove temporary files false
Compressed 0
Threads 12
Verbosity 3
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
--2022-07-01 15:13:54-- https://wwwuser.gwdg.de/~compbiol/foldseek/pdb.tar.gz
Resolving wwwuser.gwdg.de (wwwuser.gwdg.de)... 134.76.10.111
Connecting to wwwuser.gwdg.de (wwwuser.gwdg.de)|134.76.10.111|:443... connected.
ERROR: cannot verify wwwuser.gwdg.de's certificate, issued by ‘CN=Sectigo RSA Organization Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater Manchester,C=GB’:
Unable to locally verify the issuer's authority.
To connect to wwwuser.gwdg.de insecurely, use `--no-check-certificate'.
Error: Could not download https://wwwuser.gwdg.de/~compbiol/foldseek/pdb.tar.gz to tmp/14286354622525620261/pdb.tar.gz
We switched the host to Cloudflare. If you update foldseek, it should download the database from the new source. I hope that resolves it.
Thanks for the fast response. I checked out the repo and compiled from source this time. Now I'm running into this error:
`foldseek databases PDB pdb new_tmp
databases PDB pdb new_tmp
MMseqs Version: 5285cd11c335e1a0133ffd3e32f55ad6ff82f3cb
Force restart with latest tmp false
Remove temporary files false
Compressed 0
Threads 12
Verbosity 3
mv: cannot stat 'new_tmp/5610811273439075906/version': No such file or directory`
I initially reused the same tmp folder and then made a new one. Is there a cache I can clear, or a way to force the download?
However, the AF databases seem to be downloading!
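One way to rule out stale state from an earlier failed run is to give each attempt its own empty scratch directory. The sketch below assumes `mktemp` is available. The helper name `fresh_tmp` is hypothetical, and whether foldseek keeps any state outside the tmp directory is not established in this thread.

```shell
# Hypothetical helper (not part of foldseek): create a new, empty scratch
# directory for each attempt so nothing from a previous failed download
# can be picked up again.
fresh_tmp() {
    base="${1:-.}"
    mkdir -p "$base" || return 1
    # mktemp replaces the trailing X's with random characters and prints
    # the path of the directory it created.
    mktemp -d "$base/foldseek_tmp.XXXXXX"
}
```

A retry could then look like `tmp=$(fresh_tmp) && foldseek databases PDB pdb "$tmp"`.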
Does this still persist?
I've created a Debian source package that can be built using "simple sid backport" on most Ubuntu or Debian versions: http://sid.ethz.ch/debian/foldseek/
Unfortunately, discussions are not enabled, but I was wondering whether it would make sense to make this an official package.
Here's my output from the OP's command:
$ foldseek databases PDB pdb tmp
Create directory tmp
databases PDB pdb tmp
MMseqs Version: GITDIR-NOTFOUND
Tsv false
Force restart with latest tmp false
Remove temporary files false
Compressed 0
Threads 16
Verbosity 3
04/04 10:50:55 [NOTICE] Downloading 1 item(s)
[#63483e 806MiB/872MiB(92%) CN:5 DL:74MiB]
04/04 10:51:14 [NOTICE] Download complete: tmp/1124933551536758242/pdb.tar.gz
Download Results:
gid |stat|avg speed |path/URI
======+====+===========+=======================================================
63483e|OK | 64MiB/s|tmp/1124933551536758242/pdb.tar.gz
Status Legend:
(OK):download completed.
04/04 10:51:14 [NOTICE] Downloading 1 item(s)
04/04 10:51:15 [NOTICE] Download complete: tmp/1124933551536758242/version
Download Results:
gid |stat|avg speed |path/URI
======+====+===========+=======================================================
72ca56|OK | 1.8KiB/s|tmp/1124933551536758242/version
Status Legend:
(OK):download completed.
pdb
pdb_ca
pdb_ca.dbtype
pdb_ca.index
pdb_h
pdb_h.dbtype
pdb_h.index
pdb_mapping
pdb_ss
pdb_ss.dbtype
pdb_ss.index
pdb_taxonomy
pdb.dbtype
pdb.index
pdb.lookup
pdb.md5sum
mvdb tmp/1124933551536758242/pdb pdb
Time for processing: 0h 0m 0s 0ms
mvdb tmp/1124933551536758242/pdb_ss pdb_ss
Time for processing: 0h 0m 0s 0ms
mvdb tmp/1124933551536758242/pdb_h pdb_h
Time for processing: 0h 0m 0s 0ms
mvdb tmp/1124933551536758242/pdb_ca pdb_ca
Time for processing: 0h 0m 0s 0ms
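After a run like the one above, a quick sanity check is to confirm that the core database files exist under the chosen prefix. This is only a sketch: the helper name `check_db_files` is hypothetical, and it assumes the files end up under the prefix with the same suffixes that appear in the archive listing in the log.

```shell
# Hypothetical helper (not part of foldseek): confirm that the core files
# seen in the archive listing exist for a given database prefix.
check_db_files() {
    prefix="$1"
    missing=0
    # Main database plus the _ss, _ca and _h companions, each with its
    # .dbtype and .index files.
    for suffix in "" .dbtype .index \
                  _ss _ss.dbtype _ss.index \
                  _ca _ca.dbtype _ca.index \
                  _h _h.dbtype _h.index; do
        if [ ! -e "$prefix$suffix" ]; then
            echo "missing: $prefix$suffix" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_db_files pdb` in the directory where the database was created prints any missing file to stderr and returns a non-zero status if something is absent.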
Please open a new issue for this. If you want to make a Debian package, I would recommend referring to the MMseqs2 Debian package: https://salsa.debian.org/med-team/mmseqs2
The maintainers have done a lot of work to make MMseqs2 play well with Debian, and a Debian package for Foldseek should be very similar to the MMseqs2 one (just please don't try to separate Foldseek from its internal MMseqs2 dependency).