Subsetting databases

Open patrickbryant1 opened this issue 10 months ago • 7 comments

Hi,

Thank you for the great resource!

I am having trouble subsetting databases and decompressing subsets of the databases you provide here: https://foldcomp.steineggerlab.workers.dev

According to the instructions, I should be able to decompress a subset of a database given an "id_list.txt".

This is how I do it for e.g. A. thaliana:

head -n 1 data/a_thaliana.lookup
0 AF-A0A178UFC4-F1-model_v4.pdb 0

As I understand it, the ID here is "AF-A0A178UFC4-F1-model_v4".
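To make the ID variants reproducible, here is a minimal sketch of pulling names out of the lookup file. It assumes the lookup is whitespace-separated with the entry name in the second column, which is only inferred from the single line shown above. `lookup_names` and `lookup_names_no_ext` are hypothetical helper names, not part of foldcomp or mmseqs, and which of the two variants (with or without ".pdb") is expected in the ID list is exactly what is unclear here.

```shell
# Hypothetical helper: print the name column (2nd field) of a lookup file.
# Assumes whitespace-separated fields: key, name, file number.
lookup_names() {
    awk '{ print $2 }' "$1"
}

# Hypothetical helper: same names with a trailing ".pdb" removed.
lookup_names_no_ext() {
    lookup_names "$1" | sed 's/\.pdb$//'
}

# Example usage (paths as in this post):
# lookup_names data/a_thaliana.lookup | head -n 1 > id_list.txt
# lookup_names_no_ext data/a_thaliana.lookup | head -n 1 > id_list_no_ext.txt
```

Writing both variants to separate files makes it easy to try each one against the same database.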

Now, I write this into a file called id_list.txt, then I run the command:

foldcomp decompress --id-list id_list.txt data/a_thaliana

with the response:

Decompressing files in data/a_thaliana using 1 threads
Output directory: data/a_thaliana_pdb/
[Warning] AF-A0A178UFC4-F1-model_v4 not found in database.

I have tried many different ways of naming the IDs based on what is in a_thaliana.lookup, but nothing seems to work. The same happens when I use mmseqs to subset the database:

"""
createsubdb --subdb-mode 0 --id-mode 1 id_list.txt a_thaliana test_sel/output_foldcomp_db

MMseqs Version: ad6dfc66d7bbc4fd626fc19adf10ba587bc137c4
Subdb mode 0
Database ID mode 1
Verbosity 3

Could not find name AF-A0A178UFC4-F1-model_v4 in lookup
Time for merging to output_foldcomp_db: 0h 0m 0s 1ms
Time for processing: 0h 0m 0s 34ms
"""

Can you please explain what I am doing wrong and how to properly specify the IDs?

Best,

Patrick

patrickbryant1, Aug 09 '23 07:08