hPDB icon indicating copy to clipboard operation
hPDB copied to clipboard

PDB parser in Haskell

hPDB

Haskell PDB file format parser.

Build Status Hackage Hackage Dependencies Join the chat at https://gitter.im/BioHaskell/hPDB

Protein Data Bank file format is a most popular format for holding biomolecule data.

This is a very fast parser:

  • below 7s for the largest entry in PDB - 1HTQ which is over 70MB
  • as compared with 11s of RASMOL 2.7.5,
  • or 2m15s of BioPython with Python 2.6 interpreter.

It is aimed to not only deliver event-based interface, but also a high-level data structure for manipulating data in spirit of BioPython's PDB parser.

Details on official releases are on Hackage

This package is also a part of Stackage - a stable subset of Hackage.

Projects for the future:

Please let me know if you would be willing to push the project further.

In particular one may considering these features:

  • Implement basic spatial operations of RMS superposition (with SVD), affine transform on a substructure.
  • Use lens to facilitate access to the data structures.
    • torsion angles within protein/RNA chain.
  • Add Octree to the default data structure (with automatic update.)
  • Migrate out of text-format, since it gives portability trouble, and slows things down when printing.
  • Write a combinator library for generic fast parsing.
  • Checking whether GHC 7.8 improved efficiency of fixed point arithmetic, since PDB coordinates have dynamic range of just ~2^20 bits, with smallest step of 0.001.
  • Class-based wrappers showing Structure-Model-Chain-Residue-Atom interface with possible wrapping of Repa/Accelerate arrays for fast computation.

Please ask me any questions on Gitter.