Speed-up size() in LMDB Sail

Open odysa opened this issue 4 months ago • 6 comments

Problem description

Our LMDB store holds about 50M triples, and size() takes over 60 s on average. The current implementation computes the size by iterating over every statement and counting them:

@Override
protected long sizeInternal(Resource... contexts) throws SailException {
    try (Stream<? extends Statement> stream =
             getStatementsInternal(null, null, null, false, contexts).stream()) {
        return stream.count();
    }
}

Preferred solution

MDB_stat exposes ms_entries, which is the exact number of data items in a database. Could we leverage this value instead of iterating over every statement to compute the store size?

If I'm reading the docs and code correctly, ms_entries for any one of our indexes (e.g., spoc) should equal the number of statements in the store. Is that right?

typedef struct MDB_stat {
	unsigned int	ms_psize;			/**< Size of a database page.
											This is currently the same for all databases. */
	unsigned int	ms_depth;			/**< Depth (height) of the B-tree */
	mdb_size_t		ms_branch_pages;	/**< Number of internal (non-leaf) pages */
	mdb_size_t		ms_leaf_pages;		/**< Number of leaf pages */
	mdb_size_t		ms_overflow_pages;	/**< Number of overflow pages */
	mdb_size_t		ms_entries;			/**< Number of data items */
} MDB_stat;
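
To make the proposal concrete, here is a minimal sketch of the decision logic, not actual RDF4J code. The class `SizeSketch` and both suppliers are hypothetical: `entryCount` stands in for reading ms_entries via mdb_stat on one full index (e.g., spoc), and `scanCount` stands in for the current iterator-based count. The sketch assumes the fast path is only valid when no contexts are passed, since ms_entries covers the whole database and cannot be restricted to a named graph, and that every statement is stored as exactly one entry in the chosen index.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: choose between an O(1) entry count and a full scan. */
final class SizeSketch {

    private SizeSketch() {
    }

    /**
     * @param entryCount stand-in for reading ms_entries via mdb_stat on one full index
     * @param scanCount  stand-in for counting the statement iterator (current behaviour)
     * @param contexts   contexts to restrict the count to; none means the whole store
     */
    static long size(LongSupplier entryCount, LongSupplier scanCount, Object... contexts) {
        if (contexts.length == 0) {
            // Whole store requested: the B-tree entry count answers this directly.
            return entryCount.getAsLong();
        }
        // A context filter is present: ms_entries cannot express it, so fall back to scanning.
        return scanCount.getAsLong();
    }
}
```

With this shape, a plain size() call never touches the statement iterator, while size(ctx) keeps today's behaviour. Whether explicit and inferred statements live in the same database, and therefore whether one ms_entries value is enough, would need to be checked against the LMDB Sail code.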

Are you interested in contributing a solution yourself?

Yes

Alternatives you've considered

No response

Anything else?

No response

odysa, May 31 '25 04:05