Improved support for non-protein entities
Description of feature
AlphaFold3, RoseTTAFold-All-Atom, Boltz and HelixFold3 all have the ability to model non-protein entities.
In proteinfold, the AlphaFold3 mode currently supports only protein monomers.
The Boltz, RoseTTAFold-All-Atom and HelixFold3 modes support non-protein entities via mode-specific input file formats, but because each format is tied to one mode, this does not allow multiple modes to be run simultaneously.
Currently, boltz is the only mode to support non-protein entities via FASTA format.
This works by designating the entity type (protein, rna, dna, smiles, ccd) in the FASTA header (e.g. >A|protein) and guessing the molecule type as a fallback when no type is given (implemented here).
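To make the header convention concrete, here is a minimal Python sketch of reading a typed header with type guessing as the fallback. It is not the proteinfold or Boltz code: the function names `guess_entity_type` and `parse_entities` are hypothetical, and the guessing rules (nucleotide alphabet first, then the 20 standard amino acids, otherwise SMILES) are an assumption made for illustration.

```python
# Hypothetical sketch: the names and guessing rules below are assumptions,
# not the actual proteinfold or Boltz implementation.
VALID_TYPES = {"protein", "rna", "dna", "smiles", "ccd"}
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def guess_entity_type(sequence):
    """Guess the entity type from the characters used in the sequence."""
    residues = set(sequence.upper())
    if residues <= set("ACGT"):
        return "dna"
    if residues <= set("ACGU"):
        return "rna"
    if residues <= AMINO_ACIDS:
        return "protein"
    # Anything else (brackets, digits, '=', ...) is treated as SMILES.
    return "smiles"


def parse_entities(fasta_text):
    """Return (chain_id, entity_type, sequence) tuples from FASTA text."""
    records = []  # each item is [header, sequence]
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line

    entities = []
    for header, sequence in records:
        fields = header.split("|")
        chain_id = fields[0].strip()
        declared = fields[1].strip().lower() if len(fields) > 1 else ""
        # Prefer the declared type; guess only when it is missing or unknown.
        if declared in VALID_TYPES:
            entity_type = declared
        else:
            entity_type = guess_entity_type(sequence)
        entities.append((chain_id, entity_type, sequence))
    return entities
```

Guessing is inherently ambiguous: a short peptide made only of A, C, G and T would be read as DNA, and a SMILES string such as CCC would too. That is one reason an explicit type in the header is the more reliable path.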
General support for non-protein entities could be provided by:
- Adopting the current boltz implementation:
  - overloading the FASTA header
  - falling back to type guessing
OR
- Modifying the samplesheet schema to contain entity-level fields, which could be assembled into module-specific formats using a proteinfold utility:
  - protein_fasta
  - rna_fasta
  - dna_fasta
  - smiles_fasta
  - ccd_fasta
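
As a rough illustration of the second option, the sketch below assembles entity-level samplesheet fields into a single typed FASTA of the >ID|type form described above. Everything here is hypothetical: the function names, the automatic chain lettering, and the choice to pass FASTA text directly in the row (a real samplesheet would more likely hold file paths) are assumptions made to keep the example self-contained.

```python
import string

# Hypothetical mapping from the proposed samplesheet fields to entity types.
ENTITY_FIELDS = {
    "protein_fasta": "protein",
    "rna_fasta": "rna",
    "dna_fasta": "dna",
    "smiles_fasta": "smiles",
    "ccd_fasta": "ccd",
}


def read_sequences(fasta_text):
    """Return the sequences in FASTA text, joining wrapped lines."""
    sequences = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            sequences.append("")
        elif sequences:
            sequences[-1] += line
    return sequences


def assemble_typed_fasta(row):
    """Merge the entity-level fields of one samplesheet row into typed FASTA.

    `row` maps field names to FASTA text (an assumption of this sketch).
    Chain IDs are assigned as A, B, C, ... so at most 26 entities are handled.
    """
    chain_ids = iter(string.ascii_uppercase)
    lines = []
    for field, entity_type in ENTITY_FIELDS.items():
        for sequence in read_sequences(row.get(field) or ""):
            lines.append(f">{next(chain_ids)}|{entity_type}")
            lines.append(sequence)
    return "\n".join(lines) + "\n"


row = {
    "protein_fasta": ">p1\nMKT\nAYI\n",
    "smiles_fasta": ">lig\nCC(=O)O\n",
}
typed_fasta = assemble_typed_fasta(row)
```

A similar utility could emit each mode's own input format from the same row, which is what would let one samplesheet drive several modes at once.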