
LMDB: Implement extensible ID scheme

Open kenwenzel opened this issue 2 months ago • 0 comments

Problem description

The LmdbStore uses 64-bit IDs for values. The scheme is fixed and uses the two lowest bits to encode the type of the referenced value:

  • 00 => URI
  • 01 => Literal
  • 10 => BNode
  • 11 => Namespace string (internal use only)
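
For illustration, the current tagging could be sketched as follows. This is a minimal sketch, not the actual LmdbStore code: the class and method names are hypothetical, and it assumes that the remaining 62 bits hold the payload directly above the two tag bits.

```java
// Hypothetical sketch of the current 2-bit type tag (not the real LmdbStore classes).
final class LegacyIdScheme {
    static final long TYPE_MASK = 0b11L;

    static final int URI = 0b00;
    static final int LITERAL = 0b01;
    static final int BNODE = 0b10;
    static final int NAMESPACE = 0b11; // internal use only

    // Assumption: the payload occupies the upper 62 bits.
    static long encode(long payload, int type) {
        return (payload << 2) | (type & TYPE_MASK);
    }

    // The type is read from the two lowest bits.
    static int typeOf(long id) {
        return (int) (id & TYPE_MASK);
    }

    // Unsigned shift drops the tag bits and returns the payload.
    static long payloadOf(long id) {
        return id >>> 2;
    }
}
```

With only two tag bits all four codes are already taken, so there is no room for a triple or inlined-value type.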

To support RDF-star (#3723) and embedded values (#4774), a new scheme should be developed that is also extensible for future requirements.

Preferred solution

The following basic scheme could be used:

  • bit 0..7 => 8 bits for type
  • bit 8..63 => 56 bits for value
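
A minimal sketch of packing and unpacking such an ID is shown below. The names (`TypedId`, `pack`, `typeOf`, `valueOf`) are hypothetical, and the sketch assumes that bit 0 is the least significant bit, in line with the current scheme's use of the lowest bits for the type.

```java
// Hypothetical sketch of the proposed 8-bit type / 56-bit value layout.
final class TypedId {
    static final int TYPE_BITS = 8;
    static final long TYPE_MASK = 0xFFL;
    static final long MAX_VALUE = (1L << 56) - 1; // largest unsigned 56-bit value

    // Combines a type code (bits 0..7) and a value (bits 8..63) into one 64-bit ID.
    static long pack(int type, long value) {
        if (type < 0 || type > 0xFF) {
            throw new IllegalArgumentException("type must fit in 8 bits");
        }
        if (value < 0 || value > MAX_VALUE) {
            throw new IllegalArgumentException("value must fit in 56 bits");
        }
        return (value << TYPE_BITS) | type;
    }

    // Reads the type code from the lowest 8 bits.
    static int typeOf(long id) {
        return (int) (id & TYPE_MASK);
    }

    // Unsigned shift so that IDs with the top bit set still yield a non-negative value.
    static long valueOf(long id) {
        return id >>> TYPE_BITS;
    }
}
```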

Inspired by Jena, the following detailed encoding could be used:

  • bit 0..7:

    • 0 => arbitrary pointer
    • 1 => URI
    • 2 => Literal
    • 3 => BNode
    • 4 => Triple
    • ... more non-inlined values

    // the following values are inlined

    • 16 => integer
    • 17 => decimal
    • ...

See also https://github.com/apache/jena/blob/02ecb71c7033dc09cd929474c9884045dfaa9dc1/jena-tdb2/src/main/java/org/apache/jena/tdb2/store/NodeIdType.java#L87
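
As a rough sketch of how the type codes listed above might look in Java (all names are hypothetical and only the codes given in this issue are included): it assumes that every code from 16 upwards denotes an inlined value and that an inlined integer is stored as a 56-bit two's-complement number in bits 8..63. The layout of inlined decimals is left open here.

```java
// Hypothetical type codes taken from the list in this issue.
enum IdType {
    POINTER(0), URI(1), LITERAL(2), BNODE(3), TRIPLE(4),
    INTEGER(16), DECIMAL(17);

    final int code;

    IdType(int code) {
        this.code = code;
    }

    // Assumption: codes >= 16 mark values stored directly inside the ID.
    boolean isInlined() {
        return code >= 16;
    }

    // Looks up the type from the lowest 8 bits of an ID.
    static IdType fromId(long id) {
        int code = (int) (id & 0xFF);
        for (IdType t : values()) {
            if (t.code == code) {
                return t;
            }
        }
        throw new IllegalArgumentException("unknown type code " + code);
    }
}

// Assumption: inlined integers are signed 56-bit values.
final class InlineInteger {
    static final long MIN = -(1L << 55);
    static final long MAX = (1L << 55) - 1;

    // True if the value can be inlined; otherwise it would need a regular literal ID.
    static boolean fits(long v) {
        return v >= MIN && v <= MAX;
    }

    static long encode(long v) {
        if (!fits(v)) {
            throw new IllegalArgumentException("value does not fit in 56 bits");
        }
        return (v << 8) | IdType.INTEGER.code;
    }

    static long decode(long id) {
        if (IdType.fromId(id) != IdType.INTEGER) {
            throw new IllegalArgumentException("not an inlined integer");
        }
        return id >> 8; // arithmetic shift restores the sign
    }
}
```

Integers outside the 56-bit range cannot be inlined and would have to fall back to a non-inlined literal ID.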

Are you interested in contributing a solution yourself?

Yes

Alternatives you've considered

No response

Anything else?

No response

kenwenzel avatar Apr 12 '24 07:04 kenwenzel