rdf4j
LMDB: Implement extensible ID scheme
Problem description
The LmdbStore uses 64-bit IDs for values. The scheme is fixed and uses the lower two bits to encode the type of the referenced value:
- 00 => URI
- 01 => Literal
- 10 => BNode
- 11 => Namespace string (internal use only)
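For illustration, the current tagging can be sketched as below. `LegacyIdType` and its methods are hypothetical names, not rdf4j API, and the sketch assumes that the remaining upper 62 bits carry the payload; only the two-bit type tag is taken from the description above.

```java
// Hypothetical helper, not part of rdf4j: models the current two-bit type tag.
final class LegacyIdType {
    static final int URI = 0b00;
    static final int LITERAL = 0b01;
    static final int BNODE = 0b10;
    static final int NAMESPACE = 0b11; // internal use only

    private static final long TYPE_MASK = 0b11L;

    // Extracts the type tag from the lower two bits of an ID.
    static int typeOf(long id) {
        return (int) (id & TYPE_MASK);
    }

    // Assumption: the upper 62 bits hold the payload.
    static long payloadOf(long id) {
        return id >>> 2;
    }

    // Builds an ID from a payload and one of the four type tags.
    static long makeId(long payload, int type) {
        return (payload << 2) | (type & TYPE_MASK);
    }
}
```

With only two tag bits, all four codes are already taken, so a type for triples or inlined values cannot be added without changing the layout.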
To support RDF-star (#3723) and embedded values (#4774), a new scheme should be developed that is also extensible for future requirements.
Preferred solution
The following basic scheme could be used:
- bit 0..7 => 8 bits for type
- bit 8..63 => 56 bits for value
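A minimal sketch of this layout, assuming bit 0 is the least significant bit (consistent with the current scheme, which keeps the type in the lower bits). `TypedId` and its method names are hypothetical and only illustrate the packing.

```java
// Hypothetical sketch of the proposed layout: 8 type bits (low byte), 56 value bits.
final class TypedId {
    static final int TYPE_BITS = 8;
    static final long TYPE_MASK = 0xFFL;
    static final long MAX_VALUE = (1L << 56) - 1; // largest unsigned 56-bit value

    // Packs a type code (0..255) and an unsigned 56-bit value into one long.
    static long pack(int type, long value) {
        if (type < 0 || type > 0xFF) {
            throw new IllegalArgumentException("type must fit in 8 bits: " + type);
        }
        if (value < 0 || value > MAX_VALUE) {
            throw new IllegalArgumentException("value must fit in 56 bits: " + value);
        }
        return (value << TYPE_BITS) | type;
    }

    // Reads the type code from bits 0..7.
    static int typeOf(long id) {
        return (int) (id & TYPE_MASK);
    }

    // Reads the unsigned value from bits 8..63.
    static long valueOf(long id) {
        return id >>> TYPE_BITS;
    }
}
```

For example, `TypedId.pack(2, 42)` yields `10754`, from which `typeOf` returns `2` and `valueOf` returns `42`.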
Inspired by Jena, the following detailed encoding can be used:
- bit 0..7:
  - 0 => arbitrary pointer
  - 1 => URI
  - 2 => Literal
  - 3 => BNode
  - 4 => Triple
  - ... more non-inlined values
  // the following are inlined values
  - 16 => integer
  - 17 => decimal
  - ...
See also https://github.com/apache/jena/blob/02ecb71c7033dc09cd929474c9884045dfaa9dc1/jena-tdb2/src/main/java/org/apache/jena/tdb2/store/NodeIdType.java#L87
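As a rough sketch of how an inlined integer (type code 16) could be stored directly in the ID: the class and method names below are hypothetical, the type is assumed to sit in the low byte, and the integer is assumed to be stored as a signed 56-bit two's-complement value. Values outside that range cannot be inlined and would need a non-inlined ID instead, which is signalled here by an empty result.

```java
import java.util.OptionalLong;

// Hypothetical sketch: inlining small integers into the 56 value bits of an ID.
final class InlineIntegerId {
    static final int TYPE_INTEGER = 16;
    // Assumption: codes from 16 upward denote inlined values, as the list above suggests.
    static final int FIRST_INLINED_TYPE = 16;

    static final long MIN_INLINE = -(1L << 55);
    static final long MAX_INLINE = (1L << 55) - 1;

    // Returns the inlined ID, or empty if the value does not fit in 56 signed bits.
    static OptionalLong tryInline(long value) {
        if (value < MIN_INLINE || value > MAX_INLINE) {
            return OptionalLong.empty();
        }
        return OptionalLong.of((value << 8) | TYPE_INTEGER);
    }

    static int typeOf(long id) {
        return (int) (id & 0xFF);
    }

    static boolean isInlined(long id) {
        return typeOf(id) >= FIRST_INLINED_TYPE;
    }

    // Recovers the integer; the arithmetic shift restores the sign.
    static long integerValue(long id) {
        if (typeOf(id) != TYPE_INTEGER) {
            throw new IllegalArgumentException("not an inlined integer ID");
        }
        return id >> 8;
    }
}
```

Inlining avoids a value-store lookup for such literals, at the cost of a range limit that callers have to handle.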
Are you interested in contributing a solution yourself?
Yes
Alternatives you've considered
No response
Anything else?
No response