Recreate MSA Server Outputs Locally
Hi ColabFold team,
I'm trying to reproduce ColabFold server MSA results locally for multimer/complex searches, but I'm getting different outputs despite attempting to match the server configuration. I'd appreciate clarification on the exact workflow and parameters used by the server.
What I'm trying to achieve: Reproduce the exact MSA search results from the ColabFold server locally for multimer protein complexes.
What I've tried:
- Used the specific MMseqs2 release 18 (8cc5ce3)
- Referenced the config.json
- Tested various colabfold_search parameter combinations: prefilter-mode, use-env-pairing (pointing to the uniref db), both on CPU and GPU
Specific questions:
- For multimer searches with "pair_mode": "unpaired_paired", which exact script is used on the server? Is it the pairing script (second one in the backend)? What are the values for PAIRING_FILTER and PAIRING_FILTER_PROX parameters?
- What databases are actually used for pairing? Does the server use the same colabfold_envdb_202108_db for both regular environmental search and then uniref for the pairing?
- Are there any undocumented parameters or preprocessing steps that differ between the server and local colabfold_search? Could you provide the exact command-line equivalent that would reproduce the server's multimer search behavior?
Would greatly appreciate any guidance on matching the server's exact workflow locally. This would help ensure reproducibility for our large upcoming runs. @milot-mirdita @martin-steinegger Thank you!
Hi, sorry for the multiple messages, but I have now gotten the MSA server (in MsaServer/) up and running. Even after setting it up as specified by setup-and-start-local.sh, I am still getting different results compared to API calls to your hosted service. I am using the CPU version of the MSA server; do I need to use the GPU version to match the MSAs completely? Any advice would be great. Thanks!
@KPHippe I have also noticed a difference between running ColabFold locally and through the API. Have you been able to identify the reason for this?
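To narrow down where a local run and the hosted service diverge, it may help to compare the two A3M files hit by hit rather than byte by byte. The snippet below is only a minimal sketch, not something taken from ColabFold itself: `read_a3m` and `compare_a3m` are hypothetical helper names, and it assumes plain A3M text where lines starting with `#` are metadata and the first token of each `>` line identifies the hit.

```python
def read_a3m(text):
    """Parse A3M text into a list of (hit_id, aligned_sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' lines carry no alignment rows.
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            fields = line[1:].split()
            header = fields[0] if fields else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def compare_a3m(local_text, server_text):
    """Report hits unique to each MSA and shared hits whose rows differ."""
    local = set(read_a3m(local_text))
    server = set(read_a3m(server_text))
    local_ids = {hit for hit, _ in local}
    server_ids = {hit for hit, _ in server}

    def rows(records, hit):
        # All aligned rows recorded under one hit id (ids may repeat).
        return {seq for rec_hit, seq in records if rec_hit == hit}

    return {
        "only_local": sorted(local_ids - server_ids),
        "only_server": sorted(server_ids - local_ids),
        "same_id_different_alignment": sorted(
            hit
            for hit in local_ids & server_ids
            if rows(local, hit) != rows(server, hit)
        ),
    }
```

It could be called as `compare_a3m(Path("local.a3m").read_text(), Path("server.a3m").read_text())` after `from pathlib import Path`, where both file names are placeholders. Hits that appear on only one side would point toward database or prefilter differences, while hits present in both files with different rows would point toward alignment or filtering parameters.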