
Add UUID ExtDTypes

Open joseph-isaacs opened this issue 8 months ago • 3 comments

Should we support a UUID type? I assume we could split some of them into parts and compress each part separately. It would be one more thing to support, though.

joseph-isaacs avatar Apr 10 '25 16:04 joseph-isaacs

I was thinking about this recently. I think we want an ArchetypeArray (or similar): basically a big enum of kinds of values that are commonly stored badly, such as UUIDs, IP addresses, Y/N strings for booleans, etc.

The array detects these types, stores patches for things that fail to parse, and then pushes down into an optimal format (e.g. some fixed-width binary format).
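
As a rough illustration of the detect, patch, and push-down idea, here is a minimal Python sketch for the UUID case only. The function names and the dict-based patch layout are hypothetical and are not Vortex APIs; the sketch only assumes that values which fail to parse (or that are not in canonical form, so they would not round-trip exactly) are kept as patches next to a 16-byte fixed-width buffer.

```python
import uuid


def encode_uuid_column(values):
    """Pack canonical UUID strings into 16 bytes each; keep the rest as patches."""
    packed = bytearray()
    patches = {}  # row index -> original string that could not be packed losslessly
    for i, text in enumerate(values):
        try:
            parsed = uuid.UUID(text)
        except ValueError:
            parsed = None
        # Only canonical lowercase hyphenated strings round-trip exactly.
        if parsed is not None and str(parsed) == text:
            packed += parsed.bytes
        else:
            packed += bytes(16)  # placeholder slot, overridden by the patch
            patches[i] = text
    return bytes(packed), patches


def decode_uuid_column(packed, patches):
    """Rebuild the original strings from the packed buffer and the patches."""
    out = []
    for i in range(len(packed) // 16):
        if i in patches:
            out.append(patches[i])
        else:
            out.append(str(uuid.UUID(bytes=packed[16 * i:16 * i + 16])))
    return out
```

A canonical 36-character string shrinks to 16 bytes, and the fixed-width buffer can then be handed to whatever downstream encoding suits it.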

gatesn avatar Apr 10 '25 16:04 gatesn

For UUID specifically, v4 has a hashed namespace (which is often presumably constant in a column), and then the rest is random bytes (incompressible, probably, ish)

v7 is better because it embeds a timestamp! So we can split it into a timestamp part and a random-bytes part.

lwwmanning avatar Apr 10 '25 17:04 lwwmanning

UUIDv4 is all random (except for the type marker); UUIDv7 is part timestamp and part random, arranged so a left-justified lexicographic sort ends up sorting by time. :)
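
For reference, a small Python sketch of that layout, assuming the RFC 9562 arrangement: a 48-bit big-endian Unix timestamp in milliseconds, followed by the version and variant bits and the random payload. `make_uuid7` and `split_uuid7` are hypothetical helper names; the UUID is built by hand because the standard library only gained `uuid.uuid7` in recent Python versions.

```python
import os
import uuid


def make_uuid7(unix_ms, rand=None):
    """Build a UUIDv7 from a millisecond timestamp and 10 bytes of randomness."""
    rand = os.urandom(10) if rand is None else rand
    raw = bytearray(unix_ms.to_bytes(6, "big") + rand)
    raw[6] = (raw[6] & 0x0F) | 0x70  # version nibble = 7
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC variant bits = 10
    return uuid.UUID(bytes=bytes(raw))


def split_uuid7(value):
    """Split a UUIDv7 into (timestamp in ms, remaining 10 bytes)."""
    if value.version != 7:
        raise ValueError("not a UUIDv7")
    return int.from_bytes(value.bytes[:6], "big"), value.bytes[6:]
```

The timestamp column is typically monotone or close to it, so it should respond well to delta or frame-of-reference style encodings, while the 10-byte tail stays essentially random apart from the six fixed version and variant bits.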

IMO UUID support is always good in databases, and ordinarily I would suggest treating them as opaque types. But I think UUIDv7 deserves a special case: if the user opts in via some kind of schema annotation, the system can treat the time component of the UUIDv7 value as a timestamp for predicate pushdown purposes. ;)
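
One way such pushdown could work, sketched in Python under the assumption that each chunk keeps min/max statistics over its UUIDv7 column: a time-range predicate is rewritten into a UUID range, and chunks whose statistics fall outside it are skipped. All names here are hypothetical and nothing below reflects an existing Vortex API.

```python
import uuid


def uuid7_floor(unix_ms):
    """Smallest UUIDv7 carrying the given millisecond timestamp."""
    raw = bytearray(unix_ms.to_bytes(6, "big") + bytes(10))
    raw[6] = 0x70  # version 7, rand_a all zero
    raw[8] = 0x80  # RFC variant, rand_b all zero
    return uuid.UUID(bytes=bytes(raw))


def time_range_to_uuid_range(start_ms, end_ms):
    """Rewrite start_ms <= ts < end_ms as lo <= id < hi on the UUID column."""
    return uuid7_floor(start_ms), uuid7_floor(end_ms)


def chunk_may_match(chunk_min, chunk_max, start_ms, end_ms):
    """Return False only if the chunk's min/max stats rule out every row."""
    lo, hi = time_range_to_uuid_range(start_ms, end_ms)
    return chunk_max >= lo and chunk_min < hi
```

This relies on the ordering property mentioned above: comparing UUIDv7 values as big-endian bytes orders them by timestamp first.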

acruise avatar Apr 10 '25 23:04 acruise