Add Chai-1
Description of feature
Add Chai-1 to the pipeline.
Chai-1 is a multi-modal foundation model for molecular structure prediction that performs at the state-of-the-art across a variety of benchmarks. Chai-1 enables unified prediction of proteins, small molecules, DNA, RNA, glycosylations, and more.
A proof-of-concept Nextflow implementation already exists as nf-chai.
Separately, we have a branch on our repo that was created in response to a request to test Chai-1. As of this post, it can run in a barebones state.
The version on our branch works, but writing out the rest of the boilerplate files is not currently a priority.
If anyone is interested in taking over, I have listed the pending tasks below:
Required
- [x] Needs `subworkflows/local/prepare_chai1_dbs.nf`
- [x] Needs `conf/test_chai1.config`
- [x] Needs `conf/modules_chai1.config`
- [ ] Complete the steps on the PR checklist
PR checklist
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).
- [x] Make sure your code lints (`nf-core lint`).
- [x] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
- [x] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir <OUTDIR>`).
Optional
- [ ] Currently incompatible with ARM64 because the Gemmi package is available only for macOS arm64, not aarch64
@jscgh I would love to work on this during the hackathon, since we have some GPU availability for its duration that I can use to make this work!
I have updated the Dockerfile for the Chai-1 image, reducing its size by ~25%. It is available on our fork here.
This won't change functionality at all and should slot right in with whatever other work has been done.
Amazing work @jscgh! I have to admit I didn't get enough time to finalize a module for this, but I have it on my radar to continue this work in the coming weeks. I will ping here with any updates. Smaller containers are definitely a great start!
I've added prepare_chai1_dbs.nf; however, it's not able to download the models yet.
It needs to iterate over `components = ["feature_embedding", "token_embedder", "trunk", "diffusion_module", "confidence_head"]` and download `${chai1_models_link}/${component}.pt` for each one.
I wasn't able to get that working in Nextflow.
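One possible way to handle the per-component download described above is to let Nextflow stage the remote files itself instead of looping inside a process. The following is a minimal, untested sketch. It assumes the base URL is exposed as a parameter named `params.chai1_models_link` (the thread only mentions `chai1_models_link`), and the workflow name and channel name are hypothetical.

```nextflow
// Sketch only: names below are assumptions, not the code on the branch.
workflow PREPARE_CHAI1_DBS {

    main:
    // Component names taken from the comment above
    def components = [
        "feature_embedding",
        "token_embedder",
        "trunk",
        "diffusion_module",
        "confidence_head"
    ]

    // Build one remote file reference per component, then gather them
    // into a single list so a downstream process receives all weights at once.
    // params.chai1_models_link is an assumed parameter holding the base URL.
    ch_chai1_models = Channel
        .fromList(components)
        .map { component -> file("${params.chai1_models_link}/${component}.pt") }
        .collect()

    emit:
    models = ch_chai1_models
}
```

If `models` is passed to a process input declared as `path`, Nextflow should download and stage each remote file into the task work directory, so no explicit `wget` or `curl` loop would be needed. Whether this fits the existing database-preparation layout on the branch would need checking.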