foldseek

How to create a database?

Open · tomato-cmyk opened this issue 2 years ago · 7 comments

Expected Behavior

For foldseek createdb example/ targetDB, what should "example" be? I have lots of PDB files in a folder and want to create a database containing them. Do I need to convert these PDB files into a list or some other file type first?

Current Behavior

Steps to Reproduce (for bugs)

Please make sure to execute the reproduction steps with newly recreated and empty tmp folders.

Foldseek Output (for bugs)

Please make sure to also post the complete output of Foldseek. You can use gist.github.com for large output.

Context

Providing context helps us come up with a solution and improve our documentation for the future.

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute foldseek without any parameters):
  • Which foldseek version was used (Statically-compiled, self-compiled, Conda, etc.):
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation:
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory):
  • Operating system and version:

tomato-cmyk · Dec 29 '23 16:12

Just replace "example" with the folder containing your PDB files. Foldseek will go through the folder, find all PDB/mmCIF files, and turn them into a DB.
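Before running foldseek createdb on a folder, it can help to check that the folder really contains files that look like structures; an empty result would be consistent with the "No structures found in given input" error reported later in this thread. The minimal Python sketch below assumes a hypothetical suffix list (.pdb, .ent, .cif, .mmcif, optionally gzipped) and searches subfolders too. Neither choice is taken from Foldseek, whose own input detection may differ.

```python
from pathlib import Path

# Hypothetical suffixes treated as structure files; Foldseek's own
# detection may accept more or fewer formats than this.
STRUCTURE_SUFFIXES = (".pdb", ".ent", ".cif", ".mmcif")


def find_structure_files(folder):
    """Return sorted paths under `folder` (including subfolders) that look
    like PDB/mmCIF files, also accepting gzip-compressed ones such as x.pdb.gz."""
    hits = []
    for path in Path(folder).rglob("*"):
        if not path.is_file():
            continue  # skip directories
        name = path.name.lower()
        if name.endswith(".gz"):
            name = name[:-3]  # check "x.pdb.gz" as "x.pdb"
        if name.endswith(STRUCTURE_SUFFIXES):
            hits.append(path)
    return sorted(hits)
```

For example, len(find_structure_files("example")) returning 0 suggests the folder path or the file extensions are the problem, not createdb itself.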

milot-mirdita · Dec 29 '23 16:12

Thanks a lot!

However, I ran into some new problems. The first one is “No structures found in given input”.

The second one is “No k-mer could be extracted for the database”.

The third one is “Missing arguments - -score-threshold”.


tomato-cmyk · Dec 30 '23 15:12

The second one in particular: it now shows "No k-mer could be extracted for the database.......Maybe the sequences length is less than 14 residues."

My command is: foldseek easy-cluster un9.pdb pdb tmp -c 0.9
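Since the error message itself points at sequences shorter than 14 residues, one quick check is to count residues per chain in the input file. The Python sketch below is a rough diagnostic, not Foldseek's actual k-mer logic: it assumes PDB-format text, counts only ATOM records (HETATM residues such as modified amino acids are ignored), reads only the first model, and reuses the 14-residue figure from the message as an approximate threshold.

```python
from collections import defaultdict

# Threshold taken from the error message; treat it as approximate, since the
# exact rule Foldseek applies is not given in this thread.
MIN_RESIDUES = 14


def residues_per_chain(pdb_text):
    """Count distinct residues per chain from ATOM records in PDB-format text."""
    residues = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only look at the first model
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain = line[21]      # chain identifier (column 22)
        res_id = line[22:27]  # residue number + insertion code (columns 23-27)
        residues[chain].add(res_id)
    return {chain: len(ids) for chain, ids in residues.items()}


def short_chains(pdb_text, min_residues=MIN_RESIDUES):
    """Return {chain: residue_count} for chains below `min_residues`."""
    counts = residues_per_chain(pdb_text)
    return {c: n for c, n in counts.items() if n < min_residues}
```

Usage: with open("un9.pdb") as fh: print(short_chains(fh.read())). An empty dict means no chain falls below the threshold by this count.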

tomato-cmyk · Dec 30 '23 16:12

I have a question relating to createdb.

When you create a database, the <o:sequenceDB> CLI parameter appears to be a prefix for all the output files, which are dumped in the current directory. This really bloats my directory, and I would prefer to have all the files in their own directory.

How do I get it to write all the database files into their own directory? When I tried making it a path, it failed.

For example: foldseek createdb pdb_dir struct_db_dir/DB_prefix

danny305 · Jan 11 '24 23:01

unpackdb is the module you want.

milot-mirdita · Jan 12 '24 05:01

Can you provide an example command showing how to use unpackdb?

Sorry, I'm still very new to MMseqs2 and Foldseek.

danny305 · Jan 12 '24 05:01

I think I misunderstood what you want. Disregard unpackdb.

Just make sure struct_db_dir exists before calling createdb, as createdb does not create parent directories. The command you posted should work if you run mkdir first.
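The mkdir-then-createdb step can be scripted as below. This is a minimal Python sketch; the helper name createdb_into_dir and its run flag are hypothetical. It only calls the plain foldseek createdb <input> <db prefix> form used in this thread and adds no other options.

```python
import os
import shutil
import subprocess


def createdb_into_dir(structure_dir, db_path, run=True):
    """Create the parent directory of `db_path`, then call foldseek createdb.

    `db_path` is a prefix such as "struct_db_dir/DB_prefix"; all database
    files are written next to it using that prefix.
    """
    parent = os.path.dirname(db_path)
    if parent:
        # createdb does not create parent directories itself
        os.makedirs(parent, exist_ok=True)
    cmd = ["foldseek", "createdb", structure_dir, db_path]
    if run:
        if shutil.which("foldseek") is None:
            raise FileNotFoundError("foldseek is not on PATH")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

Calling createdb_into_dir("pdb_dir", "struct_db_dir/DB_prefix") is equivalent to running mkdir -p struct_db_dir followed by the createdb command from the question. Passing run=False only creates the directory and returns the command.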

milot-mirdita · Jan 12 '24 06:01