[WIP] Fragdb proposal
This is a work-in-progress to replace the JSON-Lines fragment file with a SQLite-based file.
For full details see https://github.com/rdkit/mmpdb/issues/37 .
- The fragment filename (if not given) defaults to "input.fragdb"
- This applies to "mmpdb fragment" and "mmpdb index".
- The generated SQLite file is about the same size as the uncompressed fragments file (though the fragments file is very compressible).
- Fragment output time doesn't appear to be affected (profiling shows at most 1% of the time is spent in SQLite).
- I haven't tested indexing performance
- The "fragments" format is still available, for cross-comparison purposes.
- That should be removed before the pull request is accepted.
- It removes the (undocumented) fraginfo format.
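Since the new file is an ordinary SQLite database, it can be inspected with Python's standard sqlite3 module. The sketch below only lists whatever tables a file contains. It assumes nothing about the actual fragdb schema (see the linked issue for that), and the helper name list_tables is purely illustrative.

```python
import sqlite3


def list_tables(filename):
    """Return the names of the tables in a SQLite file, sorted by name."""
    # Note: sqlite3.connect() creates an empty database if the file is missing.
    conn = sqlite3.connect(filename)
    try:
        cursor = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        # Each row is a 1-tuple holding the table name.
        return [name for (name,) in cursor]
    finally:
        conn.close()
```

For example, list_tables("input.fragdb") would return the table names in the default output file.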
The current code needs another cleanup pass.
I will first investigate whether using SQLAlchemy simplifies the tedious manual ORM of this work-in-progress.
Just switched the implementation over to use dataclasses instead of the manual class definitions using __slots__, __init__ and __repr__.
After tuning, the overall performance is the same as the hand-written code.
I also cleaned up the SQL code to make better use of the dataclass information, which helps reduce the amount of typing to convert SQL column names to local variable names to class instance names.
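As a rough illustration of the idea, here is a minimal sketch of a dataclass whose field names drive the SQL statements. The class name FragmentRecord, its fields, the table name, and the build_sql helper are all hypothetical and are not taken from the mmpdb source. The point is only that dataclasses.fields() lets the column list be written once rather than retyped for each statement.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class FragmentRecord:
    # Hypothetical fields, not the actual mmpdb record layout.
    # @dataclass generates __init__, __repr__ and __eq__ from these.
    record_id: str
    input_smiles: str
    num_heavies: int


def build_sql(cls, table):
    """Derive CREATE/INSERT/SELECT statements from the dataclass field names."""
    names = [f.name for f in fields(cls)]
    column_list = ", ".join(names)
    placeholders = ", ".join("?" for _ in names)
    create = f"CREATE TABLE {table} ({column_list})"
    insert = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    select = f"SELECT {column_list} FROM {table}"
    return create, insert, select


CREATE_SQL, INSERT_SQL, SELECT_SQL = build_sql(FragmentRecord, "fragment_record")

conn = sqlite3.connect(":memory:")
conn.execute(CREATE_SQL)

# Instance -> row: astuple() yields the values in field order.
record = FragmentRecord("phenol", "c1ccccc1O", 7)
conn.execute(INSERT_SQL, astuple(record))

# Row -> instance: the SELECT column order matches the field order.
rows = [FragmentRecord(*row) for row in conn.execute(SELECT_SQL)]
conn.close()
```

Because the column order comes from the dataclass itself, adding a field in this sketch changes all three statements without further edits.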