
Support mmCIF output when not using PMI

Open benmwebb opened this issue 8 years ago • 2 comments

IMP.pmi currently supports mmCIF output, but it is strongly tied to PMI1's representation. We would also like to be able to output files in mmCIF format using PMI2, or even not using PMI at all. In order to accomplish this, much of the metadata currently tracked by PMI or the mmCIF support code itself (ProtocolOutput class) will need to be stored in the IMP Model itself (as decorated particles, most likely). This includes:

  • Provenance information for hierarchies (e.g. "this fragment is derived from PDB 1abc")
  • Provenance for restraints (e.g. "this restraint uses EMDB entry EM-123")
  • Potentially information on the simulation protocol, clustering, etc. (depending on how multiple models are represented)

mmCIF files can contain multiple coordinate sets ('frames'), but do not support addition of frames to an existing file (since the data is grouped by category, not frame). Multi-frame output can be handled in a number of different ways:

  • Store all frames in memory and then write them all out together (this is what is currently done with PMI, but won't scale)
  • Write each individual frame to an intermediate file (either mmCIF or another format that can represent the same data), then combine all files at the end
  • Append each frame to an intermediate file that does support appending frames (such as RMF) then convert the RMF to mmCIF at the end
  • Write an mmCIF file containing just frame 1, then when writing frame 2 read the mmCIF file back in, add the frame, and then write out a new mmCIF
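As a rough illustration of the second option (one intermediate file per frame, combined at the end), here is a minimal stdlib-only sketch. It is not how IMP or PMI implement this: the helper names `write_frame` and `combine_frames`, the JSON intermediate format, and the reduced set of `_atom_site` columns are all assumptions made for the example. The point it shows is that only one frame needs to be held in memory at a time, while the final file still has a single `atom_site` loop with the frame recorded in a model-number column.

```python
import json
import os
import tempfile

# Hypothetical, simplified subset of atom_site columns, for illustration only;
# a real IHM mmCIF file needs many more columns and categories.
COLUMNS = ["pdbx_PDB_model_num", "id", "label_atom_id",
           "Cartn_x", "Cartn_y", "Cartn_z"]


def write_frame(directory, frame_num, atoms):
    """Dump one frame to its own intermediate JSON file.

    `atoms` is a list of (atom_name, x, y, z) tuples.
    """
    path = os.path.join(directory, "frame_%06d.json" % frame_num)
    with open(path, "w") as fh:
        json.dump(atoms, fh)
    return path


def combine_frames(frame_paths, out_path):
    """Merge per-frame files into a single atom_site loop.

    Frames are read back one at a time, so memory use is bounded by the
    size of the largest frame rather than by the whole trajectory.
    """
    with open(out_path, "w") as out:
        out.write("data_model\n#\nloop_\n")
        for col in COLUMNS:
            out.write("_atom_site.%s\n" % col)
        serial = 1  # atom serial number, unique across all frames
        for model_num, path in enumerate(frame_paths, start=1):
            with open(path) as fh:
                atoms = json.load(fh)
            for name, x, y, z in atoms:
                out.write("%d %d %s %.3f %.3f %.3f\n"
                          % (model_num, serial, name, x, y, z))
                serial += 1
        out.write("#\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [write_frame(tmpdir, 1, [("CA", 1.0, 2.0, 3.0)]),
                 write_frame(tmpdir, 2, [("CA", 1.5, 2.5, 3.5)])]
        out_file = os.path.join(tmpdir, "combined.cif")
        combine_frames(paths, out_file)
        with open(out_file) as fh:
            print(fh.read())
```

The same shape would apply to the RMF-based option: the sampling loop appends frames to the intermediate file, and a single conversion pass at the end writes the category-grouped mmCIF.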

benmwebb, Jan 10 '17 18:01

It is worth looking into MMTF as another format (https://mmtf.rcsb.org/): it is much more compact and on its way to wide adoption.

duhovka, Jan 11 '17 08:01

@duhovka @brindakv Thanks, yes, we've been looking at MMTF with the RCSB folks as a possible trajectory format. (MMTF doesn't currently support our coarse-grained models, but they are interested in adding such support.) Unfortunately, Chimera support for MMTF is still rather rudimentary (their reader is super-slow), but that can change (for example, by replacing the Python reader with a C++ one) if MMTF takes off.

benmwebb, Jan 11 '17 15:01

The IMP.mmcif module should now be able to generate a fully-compliant IHM mmCIF or BinaryCIF file from an IMP Model (or RMF file) that contains suitable provenance decorators. These do not require PMI.

benmwebb, Oct 05 '23 19:10