proteinfold

Add Chai-1

Open jscgh opened this issue 10 months ago • 5 comments

Description of feature

Add Chai-1 to the pipeline.

Chai-1 is a multi-modal foundation model for molecular structure prediction that achieves state-of-the-art performance across a variety of benchmarks. Chai-1 enables unified prediction of proteins, small molecules, DNA, RNA, glycosylations, and more.

Paper

A proof-of-concept Nextflow implementation already exists as nf-chai.

Separately, we have a branch on our repo that was created in response to a request to test Chai-1. As of this post, it runs in a barebones state.

jscgh • Feb 28 '25 03:02

The version on our branch works, but writing the remaining boilerplate files is not currently a priority.

If anyone is interested in taking over, I have listed the pending tasks below:

Required

  • [x] Needs subworkflows/local/prepare_chai1_dbs.nf
  • [x] Needs conf/test_chai1.config
  • [x] Needs conf/modules_chai1.config
  • [ ] Complete the steps on the PR checklist

PR checklist

  • [ ] Usage Documentation in docs/usage.md is updated.
  • [ ] Output Documentation in docs/output.md is updated.
  • [ ] CHANGELOG.md is updated.
  • [ ] README.md is updated (including new tool citations and authors/contributors).
  • [x] Make sure your code lints (nf-core lint).
  • [x] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • [x] Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).

Optional

  • [ ] Currently incompatible with ARM64 due to the Gemmi package (only macOS arm64, no aarch64)

jscgh • Mar 13 '25 01:03

@jscgh I would love to work on this during the hackathon, since we have some GPU availability for its duration that I can use to make this work!

FloWuenne • Mar 14 '25 13:03

I have updated the Dockerfile for the Chai-1 image, reducing its size by ~25%. It is available on our fork here.

This won't change functionality at all and should slot right in with whatever other work has been done.

jscgh • Mar 25 '25 23:03

Amazing work @jscgh! I have to admit I didn't get enough time to finalize a module for this, but it is on my radar to continue this work in the coming weeks. Will ping here with any updates. Smaller containers are definitely a great start!

FloWuenne • Mar 26 '25 19:03

I've added prepare_chai1_dbs.nf; however, it's not able to download the models yet.

It needs to iterate over components = ["feature_embedding", "token_embedder", "trunk", "diffusion_module", "confidence_head"] and, for each one, download "${chai1_models_link}/${component}.pt".

I wasn't able to get that working in Nextflow.
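For reference, the iteration itself can be sketched outside Nextflow. The Python below is only an illustration of the URL pattern described above, not the pipeline's code: the function name `model_urls` and the `https://example.org/...` base URL are placeholders I made up, and the real value of `chai1_models_link` would need to be substituted.

```python
# Component names exactly as listed in the comment above.
COMPONENTS = [
    "feature_embedding",
    "token_embedder",
    "trunk",
    "diffusion_module",
    "confidence_head",
]


def model_urls(models_link, components=COMPONENTS):
    """Map each component name to its '<models_link>/<component>.pt' URL.

    `models_link` stands in for chai1_models_link; its real value is not
    given in this thread, so callers must supply it.
    """
    base = models_link.rstrip("/")  # avoid a double slash when joining
    return {name: f"{base}/{name}.pt" for name in components}


if __name__ == "__main__":
    # Placeholder base URL (assumption), used only to show the output shape.
    for name, url in model_urls("https://example.org/chai1/models").items():
        print(f"{name}\t{url}")
```

In Nextflow, the same list could presumably be fed into a channel (for example with `Channel.fromList`) so that each component becomes its own download task, but I have not tested that here.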

jscgh • May 08 '25 06:05