Request: how to generate the extra MSA files produced by the public server when running `colabfold_search` locally
Hello ColabFold team!
First, thank you so much for creating, maintaining and updating ColabFold! I really appreciate your team's efforts!
I noticed that when I run colabfold_batch with the public server as the MSA source, the output contains three types of MSA files (see the tree below, copied from Ref. #580): heterodimer_2.a3m, pair.a3m and uniref.a3m.
However, when I run colabfold_search against my locally built database (created about 6 months ago), I only get the MSA file heterodimer_2.a3m. Passing heterodimer_2.a3m to colabfold_batch does not generate any of the additional MSA files listed above.
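One quick way to see what colabfold_search actually wrote is to inspect the first line of heterodimer_2.a3m. This is a minimal sketch that assumes ColabFold's combined complex format, where the first line is `#`, the comma-separated query chain lengths, a tab, and the comma-separated copy numbers of each chain; `read_complex_header` is a hypothetical helper, not part of ColabFold.

```python
from typing import List, Optional, Tuple


def read_complex_header(path: str) -> Optional[Tuple[List[int], List[int]]]:
    """Parse the first line of a combined complex a3m file.

    Assumption: the line looks like '#<len1>,<len2>\t<copies1>,<copies2>',
    i.e. the unique query chain lengths, a tab, then how many copies of
    each chain are in the complex. Returns None if there is no such line.
    """
    with open(path) as handle:
        first = handle.readline().rstrip("\n")
    if not first.startswith("#"):
        return None  # e.g. a plain single-chain a3m without the header
    lengths_field, copies_field = first[1:].split("\t")
    lengths = [int(x) for x in lengths_field.split(",")]
    copies = [int(x) for x in copies_field.split(",")]
    return lengths, copies
```

Calling `read_complex_header("heterodimer_2.a3m")` on a heterodimer would be expected to return two chain lengths and copy numbers of `[1, 1]`; a `None` result means the file does not start with that header.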
So my questions are:
- Are there scripts or options to get the pair.a3m and uniref.a3m files using colabfold_search on my locally created database?
- It would also be great if you could elaborate on what the pair.a3m and uniref.a3m files contain and how they affect structure prediction accuracy.
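To compare the server output (uniref.a3m, pair.a3m, bfd.mgnify30.metaeuk30.smag30.a3m) with the local heterodimer_2.a3m, a rough first check is how many sequences each file holds. This sketch relies on standard a3m conventions (lowercase letters are insertions relative to the query, `-` marks deletions) and on an unverified assumption that raw server files may contain NUL bytes between queries; `read_a3m`, `aligned` and `summarize` are hypothetical helpers, not ColabFold functions.

```python
from typing import Dict, List, Optional, Tuple


def read_a3m(path: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from an a3m file."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            # Assumption: raw ColabFold MSA files may separate queries with
            # NUL bytes; drop them so '>' lines are still recognised.
            line = line.replace("\x00", "").rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and the complex header line
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def aligned(seq: str) -> str:
    """Remove lowercase insertion states so the row matches the query length."""
    return "".join(c for c in seq if not c.islower())


def summarize(path: str) -> Dict[str, int]:
    """Count all records and report the aligned length of the first row."""
    records = read_a3m(path)
    query_length = len(aligned(records[0][1])) if records else 0
    return {"sequences": len(records), "query_length": query_length}
```

For example, `summarize("heterodimer_2_env/uniref.a3m")` and `summarize("heterodimer_2.a3m")` give a side-by-side view of MSA depth, though sequence count alone says nothing definitive about prediction accuracy.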
Results from colabfold_batch when the MSAs come from the public server:
.
├── cite.bibtex
├── config.json
├── log.txt
├── heterodimer_2.a3m
├── heterodimer_2_coverage.png
├── heterodimer_2.done.txt
├── heterodimer_2_env
│ ├── bfd.mgnify30.metaeuk30.smag30.a3m
│ ├── msa.sh
│ ├── out.tar.gz
│ ├── pdb70.m8
│ ├── templates_101
│ │ ├── 7x8v.cif
│ │ ├── pdb70_a3m.ffdata
│ │ ├── pdb70_a3m.ffindex
│ │ ├── pdb70_cs219.ffdata
│ │ └── pdb70_cs219.ffindex -> pdb70_a3m.ffindex
│ ├── templates_102
│ │ ├── 1t1h.cif
│ │ ├── 2c2l.cif
│ │ ├── 2c2v.cif
│ │ ├── 2f42.cif
│ │ ├── 2oxq.cif
│ │ ├── 5olm.cif
│ │ ├── 6fga.cif
│ │ ├── 6s53.cif
│ │ ├── 7bbd.cif
│ │ ├── 7c96.cif
│ │ ├── 8a58.cif
│ │ ├── pdb70_a3m.ffdata
│ │ ├── pdb70_a3m.ffindex
│ │ ├── pdb70_cs219.ffdata
│ │ └── pdb70_cs219.ffindex -> pdb70_a3m.ffindex
│ └── uniref.a3m
├── heterodimer_2_pae.png
├── heterodimer_2_pairgreedy
│ ├── out.tar.gz
│ ├── pair.a3m
│ └── pair.sh
├── heterodimer_2_plddt.png
├── heterodimer_2_predicted_aligned_error_v1.json
├── heterodimer_2_relaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
├── heterodimer_2_relaxed_rank_002_alphafold2_multimer_v3_model_3_seed_000.pdb
├── heterodimer_2_relaxed_rank_003_alphafold2_multimer_v3_model_5_seed_000.pdb
├── heterodimer_2_relaxed_rank_004_alphafold2_multimer_v3_model_2_seed_000.pdb
├── heterodimer_2_relaxed_rank_005_alphafold2_multimer_v3_model_4_seed_000.pdb
├── heterodimer_2_scores_rank_001_alphafold2_multimer_v3_model_1_seed_000.json
├── heterodimer_2_scores_rank_002_alphafold2_multimer_v3_model_3_seed_000.json
├── heterodimer_2_scores_rank_003_alphafold2_multimer_v3_model_5_seed_000.json
├── heterodimer_2_scores_rank_004_alphafold2_multimer_v3_model_2_seed_000.json
├── heterodimer_2_scores_rank_005_alphafold2_multimer_v3_model_4_seed_000.json
├── heterodimer_2_template_domain_names.json
├── heterodimer_2_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_002_alphafold2_multimer_v3_model_3_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_003_alphafold2_multimer_v3_model_5_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_004_alphafold2_multimer_v3_model_2_seed_000.pdb
└── heterodimer_2_unrelaxed_rank_005_alphafold2_multimer_v3_model_4_seed_000.pdb
Results from the MSA file heterodimer_2.a3m, generated with colabfold_search against the local database and then passed to colabfold_batch:
.
├── cite.bibtex
├── config.json
├── log.txt
├── heterodimer_2.a3m
├── heterodimer_2_coverage.png
├── heterodimer_2.done.txt
├── heterodimer_2_pae.png
├── heterodimer_2_plddt.png
├── heterodimer_2_predicted_aligned_error_v1.json
├── heterodimer_2_relaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
├── heterodimer_2_relaxed_rank_002_alphafold2_multimer_v3_model_5_seed_000.pdb
├── heterodimer_2_relaxed_rank_003_alphafold2_multimer_v3_model_3_seed_000.pdb
├── heterodimer_2_relaxed_rank_004_alphafold2_multimer_v3_model_2_seed_000.pdb
├── heterodimer_2_relaxed_rank_005_alphafold2_multimer_v3_model_4_seed_000.pdb
├── heterodimer_2_scores_rank_001_alphafold2_multimer_v3_model_1_seed_000.json
├── heterodimer_2_scores_rank_002_alphafold2_multimer_v3_model_5_seed_000.json
├── heterodimer_2_scores_rank_003_alphafold2_multimer_v3_model_3_seed_000.json
├── heterodimer_2_scores_rank_004_alphafold2_multimer_v3_model_2_seed_000.json
├── heterodimer_2_scores_rank_005_alphafold2_multimer_v3_model_4_seed_000.json
├── heterodimer_2_template_domain_names.json
├── heterodimer_2_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_002_alphafold2_multimer_v3_model_5_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_003_alphafold2_multimer_v3_model_3_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_004_alphafold2_multimer_v3_model_2_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_005_alphafold2_multimer_v3_model_4_seed_000.pdb
└── templates
    ├── 1t1h.cif
    ├── 2c2l.cif
    ├── 2c2v.cif
    ├── 2f42.cif
    ├── 2oxq.cif
    ├── 5olm.cif
    ├── 6fga.cif
    ├── 6s53.cif
    ├── 7bbd.cif
    ├── 7c96.cif
    ├── 7x8v.cif
    ├── pdb70_a3m.ffdata
    ├── pdb70_a3m.ffindex
    ├── pdb70_cs219.ffdata
    └── pdb70_cs219.ffindex
@martin-steinegger and team, any clarification would be much appreciated. Thanks again!
The script used on the server differs slightly from the one used locally with MMseqs2, but if you comment out the following code, then the paired MSA will not be deleted in the local version: https://github.com/sokrypton/ColabFold/blob/main/colabfold/mmseqs/search.py#L532C58-L532C68