
Referencing multiple sources/datasets for ihm_derived_distance_restraints

Open • mtrellet opened this issue 5 years ago • 1 comment

Distance restraints that HADDOCK generates automatically from several sets of "active" (key) residues can currently reference only a single dataset as their source of information in the IHM dictionary (_ihm_derived_distance_restraint.dataset_list_id). However, HADDOCK does not distinguish whether two key residues come from the same source or from different ones (e.g. mutagenesis, DNA footprinting, FRET), so one restraint may need to point to more than one dataset.

We should try to find a way to reference multiple experiments when defining a distance restraint.

Two solutions have been proposed so far:

  1. Allow the restraint to reference the _ihm_dataset_group.group_id item, which is meant to group the datasets used for the same modeling protocol (and is referenced in _ihm_modeling_protocol.dataset_group_id).
  2. Allow a comma-separated list of values in the _ihm_derived_distance_restraint.dataset_list_id item.
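To make the difference between the two proposals concrete, here is a minimal Python sketch of how a reader might resolve the datasets behind one restraint under each option. Everything in it is an assumption for illustration: the `GROUP_LINKS` rows, the helper names `datasets_in_group` and `parse_dataset_ids`, and the sample ids are hypothetical and are not part of the IHM dictionary or of any existing parser.

```python
# Hypothetical rows linking a dataset group to its datasets:
# (group_id, dataset_list_id). The layout is an assumption for this sketch.
GROUP_LINKS = [(1, 1), (1, 2), (2, 3)]


def datasets_in_group(group_id, links):
    """Option 1: the restraint stores one group id; datasets are looked up."""
    return sorted(d for g, d in links if g == group_id)


def parse_dataset_ids(value):
    """Option 2: the restraint stores a comma-separated list of dataset ids."""
    ids = []
    for token in value.split(","):
        token = token.strip()
        if token:  # ignore empty tokens such as a trailing comma
            ids.append(int(token))
    return ids


# Both options can describe the same set of source datasets.
via_group = datasets_in_group(1, GROUP_LINKS)
via_list = parse_dataset_ids("1, 2")
print(via_group, via_list)
```

With option 1 the restraint item keeps a single integer value and the grouping lives in its own table; with option 2 every consumer of the item has to split and validate the string itself.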

mtrellet • Feb 18 '19 09:02

As a side note, following the latest discussions:

  1. We decided to create a category referencing the interface_residues that would link poly_residue_features to dataset_list_id and/or dataset_group_id.
  2. Together with Alexandre, we realised that, for HADDOCK to reproduce exactly the parameters of a given run from the mmCIF file alone, we should record the status of each interface residue (active or passive). This could take the form of a boolean flag, for instance. As a reminder: active residues are residues that should be at the interface (significant penalty if they are not), whereas passive residues are residues that can be at the interface (no penalty if they are not, but a favorable score if they are).
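As a rough illustration of what such an interface-residue record could carry, here is a small Python sketch. The class `InterfaceResidue`, its field names, and the helper `split_by_status` are hypothetical; they only mirror the items named above (a residue feature, a dataset and/or dataset group reference, and a boolean active/passive flag) and do not describe an agreed dictionary category.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterfaceResidue:
    """Hypothetical record for one interface residue (names are assumptions)."""
    feature_id: int                          # points to a poly-residue feature
    is_active: bool                          # True = active, False = passive
    dataset_list_id: Optional[int] = None    # single source dataset, if any
    dataset_group_id: Optional[int] = None   # group of source datasets, if any


def split_by_status(residues):
    """Return (active, passive) lists of feature ids, preserving input order."""
    active = [r.feature_id for r in residues if r.is_active]
    passive = [r.feature_id for r in residues if not r.is_active]
    return active, passive


residues = [
    InterfaceResidue(feature_id=1, is_active=True, dataset_list_id=1),
    InterfaceResidue(feature_id=2, is_active=False, dataset_group_id=1),
    InterfaceResidue(feature_id=3, is_active=True, dataset_list_id=2),
]
active, passive = split_by_status(residues)
print(active, passive)
```

A boolean is enough for the active/passive distinction described here; an enumerated value would be a natural alternative if further statuses were ever needed.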

mtrellet • Feb 25 '19 11:02