
Request on how to generate extra MSA files available from public server vs locally using `colabfold_search`

Open punit-jha123 opened this issue 8 months ago • 2 comments

Hello ColabFold team!

First, thank you so much for maintaining, updating and creating ColabFold! I really appreciate your team's efforts!

I noticed that when running colabfold_batch with the public server as the MSA source, the output contains 3 types of MSA files (shown below, copied from Ref. #580): heterodimer_2.a3m, pair.a3m and uniref.a3m.

However, when I use colabfold_search on my locally created database (created about 6 months ago), I only get the MSA file heterodimer_2.a3m. Passing heterodimer_2.a3m to colabfold_batch doesn't generate any of the additional MSA files listed above.

So my questions are:

  • Are there scripts/ways to get the pair.a3m and uniref.a3m files using colabfold_search on my locally created database?
  • It would also be great if you could elaborate on the contents of the pair.a3m and uniref.a3m files and how they affect structure prediction accuracy.
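For anyone who wants to look inside these files while waiting for an answer, here is a minimal Python sketch that counts the sequence entries in an A3M file. It assumes the usual A3M layout in which every entry starts with a `>` header line; the stripping of NUL characters is my own precaution (I believe raw server files can use them to separate per-query blocks, but treat that as an assumption). The function names are hypothetical and not part of ColabFold.

```python
from pathlib import Path


def count_a3m_sequences(a3m_text: str) -> int:
    """Count entries in A3M text by counting header lines that start with '>'.

    Lines starting with '#' (or anything else) are not counted.
    NUL characters are removed first (assumption: they may separate blocks).
    """
    count = 0
    for line in a3m_text.splitlines():
        line = line.replace("\x00", "")
        if line.startswith(">"):
            count += 1
    return count


def count_a3m_file(path: str) -> int:
    """Read an A3M file from disk and count its entries."""
    return count_a3m_sequences(Path(path).read_text(errors="replace"))
```

Calling `count_a3m_file` on heterodimer_2.a3m, pair.a3m and uniref.a3m gives a quick idea of how many sequences each file contributes. It says nothing about how the files are used during prediction.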

Results from colabfold_batch with MSAs from the public server:

.
├── cite.bibtex
├── config.json
├── log.txt
├── heterodimer_2.a3m
├── heterodimer_2_coverage.png
├── heterodimer_2.done.txt
├── heterodimer_2_env
│   ├── bfd.mgnify30.metaeuk30.smag30.a3m
│   ├── msa.sh
│   ├── out.tar.gz
│   ├── pdb70.m8
│   ├── templates_101
│   │   ├── 7x8v.cif
│   │   ├── pdb70_a3m.ffdata
│   │   ├── pdb70_a3m.ffindex
│   │   ├── pdb70_cs219.ffdata
│   │   └── pdb70_cs219.ffindex -> pdb70_a3m.ffindex
│   ├── templates_102
│   │   ├── 1t1h.cif
│   │   ├── 2c2l.cif
│   │   ├── 2c2v.cif
│   │   ├── 2f42.cif
│   │   ├── 2oxq.cif
│   │   ├── 5olm.cif
│   │   ├── 6fga.cif
│   │   ├── 6s53.cif
│   │   ├── 7bbd.cif
│   │   ├── 7c96.cif
│   │   ├── 8a58.cif
│   │   ├── pdb70_a3m.ffdata
│   │   ├── pdb70_a3m.ffindex
│   │   ├── pdb70_cs219.ffdata
│   │   └── pdb70_cs219.ffindex -> pdb70_a3m.ffindex
│   └── uniref.a3m
├── heterodimer_2_pae.png
├── heterodimer_2_pairgreedy
│   ├── out.tar.gz
│   ├── pair.a3m
│   └── pair.sh
├── heterodimer_2_plddt.png
├── heterodimer_2_predicted_aligned_error_v1.json
├── heterodimer_2_relaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
├── heterodimer_2_relaxed_rank_002_alphafold2_multimer_v3_model_3_seed_000.pdb
├── heterodimer_2_relaxed_rank_003_alphafold2_multimer_v3_model_5_seed_000.pdb
├── heterodimer_2_relaxed_rank_004_alphafold2_multimer_v3_model_2_seed_000.pdb
├── heterodimer_2_relaxed_rank_005_alphafold2_multimer_v3_model_4_seed_000.pdb
├── heterodimer_2_scores_rank_001_alphafold2_multimer_v3_model_1_seed_000.json
├── heterodimer_2_scores_rank_002_alphafold2_multimer_v3_model_3_seed_000.json
├── heterodimer_2_scores_rank_003_alphafold2_multimer_v3_model_5_seed_000.json
├── heterodimer_2_scores_rank_004_alphafold2_multimer_v3_model_2_seed_000.json
├── heterodimer_2_scores_rank_005_alphafold2_multimer_v3_model_4_seed_000.json
├── heterodimer_2_template_domain_names.json
├── heterodimer_2_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_002_alphafold2_multimer_v3_model_3_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_003_alphafold2_multimer_v3_model_5_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_004_alphafold2_multimer_v3_model_2_seed_000.pdb
└── heterodimer_2_unrelaxed_rank_005_alphafold2_multimer_v3_model_4_seed_000.pdb

Results from colabfold_batch when given the MSA file heterodimer_2.a3m generated locally with colabfold_search against the local database:

.
├── cite.bibtex
├── config.json
├── log.txt
├── heterodimer_2.a3m
├── heterodimer_2_coverage.png
├── heterodimer_2.done.txt
├── heterodimer_2_pae.png
├── heterodimer_2_plddt.png
├── heterodimer_2_predicted_aligned_error_v1.json
├── heterodimer_2_relaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
├── heterodimer_2_relaxed_rank_002_alphafold2_multimer_v3_model_5_seed_000.pdb
├── heterodimer_2_relaxed_rank_003_alphafold2_multimer_v3_model_3_seed_000.pdb
├── heterodimer_2_relaxed_rank_004_alphafold2_multimer_v3_model_2_seed_000.pdb
├── heterodimer_2_relaxed_rank_005_alphafold2_multimer_v3_model_4_seed_000.pdb
├── heterodimer_2_scores_rank_001_alphafold2_multimer_v3_model_1_seed_000.json
├── heterodimer_2_scores_rank_002_alphafold2_multimer_v3_model_5_seed_000.json
├── heterodimer_2_scores_rank_003_alphafold2_multimer_v3_model_3_seed_000.json
├── heterodimer_2_scores_rank_004_alphafold2_multimer_v3_model_2_seed_000.json
├── heterodimer_2_scores_rank_005_alphafold2_multimer_v3_model_4_seed_000.json
├── heterodimer_2_template_domain_names.json
├── heterodimer_2_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_002_alphafold2_multimer_v3_model_5_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_003_alphafold2_multimer_v3_model_3_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_004_alphafold2_multimer_v3_model_2_seed_000.pdb
├── heterodimer_2_unrelaxed_rank_005_alphafold2_multimer_v3_model_4_seed_000.pdb
└── templates
    ├── 1t1h.cif
    ├── 2c2l.cif
    ├── 2c2v.cif
    ├── 2f42.cif
    ├── 2oxq.cif
    ├── 5olm.cif
    ├── 6fga.cif
    ├── 6s53.cif
    ├── 7bbd.cif
    ├── 7c96.cif
    ├── 7x8v.cif
    ├── pdb70_a3m.ffdata
    ├── pdb70_a3m.ffindex
    ├── pdb70_cs219.ffdata
    └── pdb70_cs219.ffindex
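
The difference between the two listings above can also be checked programmatically. This is a small sketch, assuming the two runs were written to two separate output directories; the helper names and the directory names in the usage comment are hypothetical.

```python
from pathlib import Path
from typing import List, Set


def relative_files(root: str, pattern: str = "*.a3m") -> Set[str]:
    """Return paths (relative to root) of all files under root matching pattern."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix()
        for p in base.rglob(pattern)
        if p.is_file()
    }


def only_in_first(first_dir: str, second_dir: str, pattern: str = "*.a3m") -> List[str]:
    """List matching files that exist in first_dir but not in second_dir."""
    return sorted(relative_files(first_dir, pattern) - relative_files(second_dir, pattern))


# Usage (hypothetical directory names):
# only_in_first("server_run", "local_run")
```

For the two trees shown above, the files present only in the server run would be heterodimer_2_env/bfd.mgnify30.metaeuk30.smag30.a3m, heterodimer_2_env/uniref.a3m and heterodimer_2_pairgreedy/pair.a3m.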

punit-jha123 • Apr 16 '25 18:04

@martin-steinegger and team, any clarification would be much appreciated. Thanks again.

punit-jha123 • May 11 '25 07:05

The script used on the server differs slightly from the one used locally with MMseqs2. However, if you comment out the code linked here, the local version will no longer delete the paired MSA: https://github.com/sokrypton/ColabFold/blob/main/colabfold/mmseqs/search.py#L532C58-L532C68

martin-steinegger • May 11 '25 07:05