Using Modelforge-RF3 to test protein folding of sequences containing D-residues in the main chain.
The "Training RF3" section of the preprint suggests that this new framework can handle chirality better.
Is it possible to provide a single-chain sequence that contains both L- and D-amino acids to predict folding? For a sequence containing multiple D-residues, would each D-amino acid need to be treated as an individual NCAA that must be specified with a corresponding SMILES, following the format in your NCAA example?
Thank you,
You should be able to use CCD codes in the seq: lines if you enclose them in parentheses -- see, for example, the 3en2_from_json_with_msa.json example. I believe that should work with the D-amino acid CCD codes.
For the example_with_ncaa.json, it's the (PBF) in the seq line that specifies the NCAA. The O=C1Nc2ccccc2[C@]34OCC[C@H]3C[C@H]14 in the smiles entry is a ligand, rather than the PBF (p-benzoyl-L-phenylalanine) NCAA.
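As a rough illustration of the parenthesized-CCD convention described above, here is a minimal Python sketch that builds such a seq string from a one-letter sequence. The helper name build_seq, the example sequence GSAYKL, and the position-to-code mapping are all hypothetical and not part of Modelforge; only the idea of writing a CCD code in parentheses inside the seq line comes from this thread, and whether a given D-residue code is accepted still depends on the installed version.

```python
def build_seq(one_letter_seq, substitutions):
    """Replace selected residues with parenthesized CCD codes.

    substitutions maps 1-based positions to CCD codes, e.g. {4: "DTY"}.
    This is a hypothetical helper, not a Modelforge function.
    """
    # Reject positions that fall outside the sequence instead of ignoring them.
    out_of_range = [p for p in substitutions if not 1 <= p <= len(one_letter_seq)]
    if out_of_range:
        raise ValueError(f"positions outside the sequence: {sorted(out_of_range)}")

    parts = []
    for pos, aa in enumerate(one_letter_seq, start=1):
        code = substitutions.get(pos)
        # Keep the one-letter code unless a CCD code was requested here.
        parts.append(f"({code})" if code else aa)
    return "".join(parts)


# Made-up sequence with the tyrosine at position 4 swapped for D-tyrosine (DTY).
print(build_seq("GSAYKL", {4: "DTY"}))  # GSA(DTY)KL
```

The resulting string would then go wherever the seq entry lives in your input file, in the same way the (PBF) entry appears in example_with_ncaa.json.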
So if a residue already has a defined CCD code, I should be good to go out of the box, without having to provide any additional stereochemical description of the residue?
That is correct. If it has a defined CCD code, we will load the appropriate stereochemistry.
Apologies for the naive question, but how do I initialize modelforge after installation? I was able to install and successfully run the 5vht example a few days ago. But now when I try to run "rf3 fold" from the modelforge directory it says command not found.
You'll have to re-activate the virtual environment. That's just the source .venv/bin/activate stage of the installation instructions.
I think that uv can also do the activation automatically if you use uv run (e.g. uv run rf3 fold ...) -- though if you're running a number of commands, the manual activation may be easier.
I was able to get uv run to work, but not source .venv/bin/activate. Thank you!
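If it is unclear whether the environment is active, a small check like the one below may help. It is only a diagnostic sketch: check_cli is a hypothetical helper, and it relies solely on the Python standard library (sys.prefix, sys.base_prefix, shutil.which) rather than on anything Modelforge provides. Run it with the same python you get in the shell where "rf3 fold" fails.

```python
import shutil
import sys


def check_cli(command="rf3"):
    """Report whether a virtual environment is active and where the command resolves.

    Hypothetical diagnostic helper; not part of Modelforge or uv.
    """
    return {
        # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        # shutil.which returns None when the command is not on PATH.
        "path": shutil.which(command),
    }


print(check_cli("rf3"))
```

If "path" comes back as None, the shell cannot find rf3 on PATH, which matches the "command not found" symptom and suggests the environment was not activated in that session.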
I got the error below when trying to run a sequence with a single D-amino acid (D-tyrosine). I used (DTY) as the three-letter code, https://www.ebi.ac.uk/pdbe-srv/pdbechem/chemicalCompound/show/DTY. I don't get the error when I use the single letter "Y" in my sequence. Is there a library/config file that I can look up to make sure I'm using the right names for the input?
InstantiationException: Error in call to target 'modelhub.inference_engines.rf3.RF3InferenceEngine': ValueError("Invalid chain_type=<ChainType.POLYPEPTIDE_L: 6> for chem_comp_types={np.str_('L-PEPTIDE LINKING'), np.str_('PEPTIDE LINKING'), np.str_('D-PEPTIDE LINKING')}. Valid are valid_chem_comp_types=frozenset({'L-PEPTIDE COOH CARBOXY TERMINUS', 'L-PEPTIDE NH3 AMINO TERMINUS', 'PEPTIDE LINKING', 'L-GAMMA-PEPTIDE, C-DELTA LINKING', 'L-BETA-PEPTIDE, C-GAMMA LINKING', 'L-PEPTIDE LINKING'})")
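To make the message easier to read, the sketch below reproduces the comparison it reports: the component types found in the chain against the types listed as valid for ChainType.POLYPEPTIDE_L. The two sets are copied from the error text; the function rejected_types is hypothetical and is not the library's actual validation code.

```python
# Types listed as valid for an L-polypeptide chain, copied from the error message.
VALID_FOR_POLYPEPTIDE_L = frozenset({
    "L-PEPTIDE COOH CARBOXY TERMINUS",
    "L-PEPTIDE NH3 AMINO TERMINUS",
    "PEPTIDE LINKING",
    "L-GAMMA-PEPTIDE, C-DELTA LINKING",
    "L-BETA-PEPTIDE, C-GAMMA LINKING",
    "L-PEPTIDE LINKING",
})

# Types the error reports for the residues in the submitted chain.
observed = {"L-PEPTIDE LINKING", "PEPTIDE LINKING", "D-PEPTIDE LINKING"}


def rejected_types(observed_types, valid_types):
    """Return the observed component types that are not in the valid set."""
    return sorted(set(observed_types) - set(valid_types))


print(rejected_types(observed, VALID_FOR_POLYPEPTIDE_L))  # ['D-PEPTIDE LINKING']
```

Read this way, the only type outside the valid set is D-PEPTIDE LINKING, presumably contributed by the DTY residue, while the chain as a whole is still being typed as an L-polypeptide.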