neosemantics

The case for strong UUID support (Version 5 :: Named UUIDs)

Open acmeguy opened this issue 3 years ago • 0 comments

Hi all,

I'd like to make the case for native support of UUIDs and corresponding hash indexes, especially if they are stored and indexed in their native 128-bit format.

This is especially relevant when dealing with RDF.

Everyone who deals with the semantic web and RDF data knows that URIs play an important role as universal, cross-site entity IDs. When storing them, it is common to index these values for quick retrieval by RDF ID.

Fewer seem to know that these long strings (e.g. http://dbpedia.org/resource/Reykjavík) can be predictably and reliably converted into UUIDs using UUID Version 5 (sometimes called named UUIDs).

'http://dbpedia.org/resource/Reykjavík' is ALWAYS converted to 'e80aa274-794a-507d-bb70-10408bea8ed4'

Regarding UUID V5:

  • They take only 128 bits (16 bytes) of storage, vs. 38 bytes for the example above in UTF-8 and roughly double that in a two-byte encoding such as UTF-16
  • They are a perfect fit for native hash indexes, which are a lot smaller than the tree-based string indexes commonly used for RDF URLs
  • All systems will convert a given string to the same UUID if the URL namespace is used (deterministic)
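
For illustration, here is a minimal sketch of the conversion using Python's standard `uuid` module. The helper name `rdf_uri_to_uuid` is made up for this example. Note that Python encodes the name as UTF-8 before hashing, so other tools should only be expected to agree if they use the same encoding and the same URL namespace.

```python
import uuid


def rdf_uri_to_uuid(uri: str) -> uuid.UUID:
    """Derive a version 5 (SHA-1, name-based) UUID from a URI.

    Hypothetical helper for illustration. Uses the predefined URL
    namespace; Python hashes the UTF-8 encoding of the name.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, uri)


uri = "http://dbpedia.org/resource/Reykjavík"
key = rdf_uri_to_uuid(uri)

print(key)                        # canonical 36-character text form
print(key.version)                # 5
print(len(key.bytes))             # 16, i.e. the native 128-bit form
print(len(uri.encode("utf-8")))   # size of the original URI in UTF-8, for comparison

# Same input and namespace always give the same UUID.
assert key == rdf_uri_to_uuid(uri)
```

`key.bytes` is the 16-byte value that a native UUID type or hash index would store; the text form is only needed for display.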

These UUID values would never replace the RDF URL, as they cannot be converted back. This is a simple addition that speeds up retrieval of any node that is identified by its RDF ID.
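
As a rough sketch of that idea, the example below uses a plain Python `dict` as a stand-in for a hash index keyed by the 16-byte UUID, while the full URI stays on the node. The names `index`, `add_node` and `find_node` are hypothetical and do not correspond to any Neo4j or neosemantics API.

```python
import uuid
from typing import Dict, Optional

# Hypothetical in-memory stand-in for a hash index: 16-byte key -> node record.
index: Dict[bytes, dict] = {}


def index_key(uri: str) -> bytes:
    """Return the 16-byte UUID v5 (URL namespace) for a URI."""
    return uuid.uuid5(uuid.NAMESPACE_URL, uri).bytes


def add_node(uri: str, **properties) -> None:
    """Store a node under its UUID key, keeping the original URI on the node."""
    index[index_key(uri)] = {"uri": uri, **properties}


def find_node(uri: str) -> Optional[dict]:
    """Look a node up by URI via the fixed-size key."""
    node = index.get(index_key(uri))
    # The UUID cannot be reversed, so compare the stored URI to confirm the match.
    if node is not None and node["uri"] == uri:
        return node
    return None


add_node("http://dbpedia.org/resource/Reykjavík", label="Reykjavík")
print(find_node("http://dbpedia.org/resource/Reykjavík"))
print(find_node("http://dbpedia.org/resource/Akureyri"))  # None, not indexed
```

The URI is kept as a regular property because the UUID is one-way; the UUID only serves as a compact lookup key.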

This is an online converter: https://www.webtools.services/uuid-generator

  • Remember to use V5 and the URL namespace when converting strings to UUIDs.

acmeguy, Jul 22 '21 11:07