rdf4j
Speed-up size() in LMDB Sail
Problem description
Our LMDB store holds about 50M triples, and size() takes over 60 s on average.
The current implementation counts the statement iterator.
@Override
protected long sizeInternal(Resource... contexts) throws SailException {
    try (Stream<? extends Statement> stream =
            getStatementsInternal(null, null, null, false, contexts).stream()) {
        return stream.count();
    }
}
Preferred solution
MDB_stat exposes ms_entries, which is the exact number of data items in a database.
Could we leverage this value instead of iterating over every statement to compute the store size?
If I'm reading the docs and code correctly, ms_entries should equal the entry count of any one of our indexes (e.g., spoc). Is that right?
typedef struct MDB_stat {
    unsigned int ms_psize;          /**< Size of a database page.
                                         This is currently the same for all databases. */
    unsigned int ms_depth;          /**< Depth (height) of the B-tree */
    mdb_size_t   ms_branch_pages;   /**< Number of internal (non-leaf) pages */
    mdb_size_t   ms_leaf_pages;     /**< Number of leaf pages */
    mdb_size_t   ms_overflow_pages; /**< Number of overflow pages */
    mdb_size_t   ms_entries;        /**< Number of data items */
} MDB_stat;
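To make the proposal concrete, here is a minimal, self-contained sketch of the dispatch I have in mind. It is not the actual LMDB Sail code. `SizeSketch`, `Quad`, and the `entryCount` supplier are hypothetical names; in the real store the supplier would presumably read `ms_entries` for one index database. The sketch assumes every index entry is a statement that size() should count, so the shortcut only applies when no context filter is given.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical illustration only; none of these names exist in rdf4j.
final class SizeSketch {

    // Simplified stand-in for a statement: subject, predicate, object, context.
    public static final class Quad {
        final String s, p, o, c;

        public Quad(String s, String p, String o, String c) {
            this.s = s;
            this.p = p;
            this.o = o;
            this.c = c;
        }
    }

    // Slow path: visit every statement and count those in one of the given contexts.
    public static long countByScan(List<Quad> quads, String... contexts) {
        List<String> wanted = Arrays.asList(contexts);
        return quads.stream()
                .filter(q -> wanted.isEmpty() || wanted.contains(q.c))
                .count();
    }

    // Fast path: with no context filter, trust the entry count reported by the index
    // (assumed to be backed by MDB_stat.ms_entries in a real implementation).
    // With a context filter, fall back to scanning.
    public static long size(List<Quad> quads, LongSupplier entryCount, String... contexts) {
        if (contexts.length == 0) {
            return entryCount.getAsLong();
        }
        return countByScan(quads, contexts);
    }
}
```

Calling `size(quads, supplier)` without contexts never touches the statement list, which is where the speed-up would come from. Calling it with one or more contexts keeps today's behavior.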
Are you interested in contributing a solution yourself?
Yes
Alternatives you've considered
No response
Anything else?
No response