Constraint Ligand docking

Open Manjubn777 opened this issue 7 months ago • 1 comments

Hello Team,

I am currently performing large-scale docking of millions of ligands, with the goal of implementing constraint-based docking using Rosetta. However, I am facing two challenges:

Ligand Preparation: I need to convert millions of SMILES entries into 3D structures (SDF/PDB formats) and generate the corresponding parameter files (.params) required by Rosetta. Creating a .params file for each ligand by hand is not feasible at this scale. Are there more efficient or automated approaches available?

I would also like to know whether you have a specific protocol for preparing ligands from SMILES into the format required by Rosetta.

Constraint Implementation: I have read that Rosetta's constraint format relies on residue numbers and chain IDs. Since small molecules typically lack these identifiers, I am unsure how to define constraints for them within RosettaLigand. Could you provide more details on preparing a constraint file for protein-ligand docking?

If there are any tutorials that cover this complete workflow in a simple way, I would appreciate it if you could share them. Thanks!

May 21 '25 11:05 Manjubn777

There is currently no way to specify a ligand by its SMILES string. You can, however, supply an SDF and let Rosetta's internal parameterization effectively run molfile_to_params.py for you: simply use -extra_res_mol ligand.sdf instead of -extra_res_fa ligand.params. The caveat is that SDF files carry no atom name information, so the atoms in the generated parameters may not match those in the PDB file, resulting in a poor representation of the ligand. (That is why this route is not generally recommended at the moment.) But if you're directly redocking anyway, it may not make all that much difference. Note that this currently applies only to the default parameterization (e.g. for RosettaLigand docking). The parameterization for GenPot (for GALigandDocking) is not integrated with C++ Rosetta and must be done with the Python scripts.
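For a large library, the -extra_res_mol route could be scripted along these lines. This is only a rough sketch: the executable name rosetta_scripts, the protocol file dock.xml, the complexes directory and the <ligand>_complex.pdb naming scheme are all assumptions made for illustration, and the -s and -parser:protocol flags should be checked against your own Rosetta build. The function only assembles the command lines; it does not run anything.

```python
from pathlib import Path


def build_docking_commands(sdf_files, complex_dir, protocol_xml,
                           executable="rosetta_scripts"):
    """Build one argument list per ligand SDF.

    Assumptions (hypothetical, adapt to your pipeline):
    - one SDF file per ligand
    - a matching input PDB named <ligand>_complex.pdb in complex_dir,
      with the ligand already placed in the binding site
    """
    commands = []
    for sdf in map(Path, sdf_files):
        complex_pdb = Path(complex_dir) / f"{sdf.stem}_complex.pdb"
        commands.append([
            executable,
            "-s", str(complex_pdb),            # input structure (protein + ligand)
            "-extra_res_mol", str(sdf),        # let Rosetta parameterize the SDF
            "-parser:protocol", str(protocol_xml),
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_docking_commands(["lig_0001.sdf", "lig_0002.sdf"],
                                      "complexes", "dock.xml"):
        print(" ".join(cmd))
```

Each returned list can be handed to subprocess.run or written into a job array script for your cluster scheduler.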

Generally, with ligand docking you need to provide an input PDB with the ligand already present. As such, you can assign it whatever chain letter and residue number you like. For RosettaLigand docking, this is conventionally residue 1 on chain X. Assigning these identifiers is part of your input pipeline, and your constraints can then refer to the ligand by them.
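As a rough illustration of that pipeline step, the sketch below overwrites the chain ID and residue number in a fixed-column PDB line (chain ID in column 22, residue number in columns 23-26) and formats an AtomPair constraint that refers to the ligand as 1X. The atom names, the protein residue 57A and the distance values are made up, and the "residue number plus chain letter" notation and HARMONIC function reflect my understanding of the constraint file format, so please verify them against the constraint file documentation for your Rosetta version.

```python
def set_ligand_id(pdb_line, chain="X", resnum=1):
    """Return an ATOM/HETATM line with chain ID and residue number overwritten.

    PDB is a fixed-column format: the chain ID is column 22 and the
    residue number is columns 23-26 (right-justified).
    """
    line = pdb_line.rstrip("\n").ljust(27)
    return line[:21] + chain + f"{resnum:>4}" + line[26:]


def atom_pair_constraint(atom1, res1, atom2, res2, x0, sd):
    """Format one AtomPair constraint with a HARMONIC function.

    res1/res2 are given as PDB residue number plus chain letter, e.g. "1X"
    (assumed notation; check the constraint file documentation).
    """
    return f"AtomPair {atom1} {res1} {atom2} {res2} HARMONIC {x0:.2f} {sd:.2f}"


if __name__ == "__main__":
    # Hypothetical ligand atom line as it might come out of a conversion tool.
    raw = "HETATM    1  C1  LIG A 999      11.000  12.000  13.000  1.00  0.00"
    print(set_ligand_id(raw))
    # Hypothetical restraint: serine OG of residue 57, chain A, to ligand atom C1.
    print(atom_pair_constraint("OG", "57A", "C1", "1X", 3.0, 0.5))
```

One caveat: the ligand atom names used in the constraint have to match the names in the parameter set Rosetta actually uses, which is the same atom-naming issue mentioned for SDF input.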

If you're doing high-throughput docking, I'd take a look at the DeLuca, Khar & Meiler paper (https://doi.org/10.1371/journal.pone.0132508). There are some scripts associated with it. Some of the Meiler lab workshop tutorials (https://meilerlab.org/tutorials/) include those convenience scripts. I think the most recent is the 2018 protein-small molecule docking tutorial.

May 29 '25 21:05 roccomoretti