
Epic: Extension DTypes, Scalars, and Arrays

Open gatesn opened this issue 8 months ago • 0 comments

We expect extension dtypes, scalars, and arrays to become far more common than they currently are. Keeping timestamps as extension dtypes is part of the forcing function to ensure better usability. We'll track the work here.

Prior to V0, we should agree on a heuristic for when something should be a native DType vs an ExtDType. My current idea is:

If built-in compute functions give correct results when applied directly to the storage type (with the result wrapped up again as an ExtDType), then the type should be an ExtDType. If the type overrides the semantics of built-ins, requiring additional expressions for custom functions, then it should be a built-in DType.

By this heuristic:

  • BF16 is a DType (add/subtract cannot be pushed down over primitive u16 or f16 storage type)
  • Decimal is a DType (add/subtract cannot be pushed down over primitive u64 or u128 storage type)
  • DateTime is an ExtDType(u32/u64) (add/subtract can be pushed down over underlying u32/u64 storage)
  • UUID is an ExtDType([u8; N]) (comparisons / equality can be pushed down over underlying fixed length binary storage)
  • [x] #3064
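
A rough illustration of the heuristic, as a sketch rather than anything resembling the Vortex API: `ExtArray`, `add_scalar`, `bf16_bits`, and `bf16_value` are hypothetical names invented for this example, and the millisecond timestamp unit is an assumption. It shows a timestamp add pushed down over integer storage and re-wrapped, a BF16 add that goes wrong when done on the raw u16 bit patterns, and UUID equality evaluated on the underlying bytes.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtArray:
    """Hypothetical stand-in: an extension id plus plain storage values."""
    ext_id: str
    storage: tuple


def add_scalar(arr: ExtArray, delta: int) -> ExtArray:
    """Run the built-in add over storage, then re-wrap in the same extension type."""
    return ExtArray(arr.ext_id, tuple(v + delta for v in arr.storage))


def bf16_bits(x: float) -> int:
    """Top 16 bits of the float32 encoding (truncating, exact for the values used here)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def bf16_value(bits: int) -> float:
    """Decode a bf16 bit pattern by widening it back to float32."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


# DateTime over integer storage (assumed unit: milliseconds):
# adding 1_000 to the storage is the same as adding one second to the timestamps.
ts = ExtArray("datetime_ms", (1_700_000_000_000, 1_700_000_060_000))
shifted = add_scalar(ts, 1_000)

# BF16 over u16 storage: integer add of the bit patterns is not float add.
one = bf16_bits(1.0)
storage_sum = one + one          # what a pushed-down u16 add would produce
semantic_sum = bf16_bits(2.0)    # what 1.0 + 1.0 should encode to

# UUID over fixed-length binary storage: equality on the bytes is equality on the UUIDs.
a = bytes.fromhex("00112233445566778899aabbccddeeff")
b = bytes.fromhex("00112233445566778899aabbccddeeff")
uuids_equal = a == b
```

Here `storage_sum` and `semantic_sum` differ, which is why BF16 lands on the DType side, while the timestamp and UUID operations are already correct on storage and only need re-wrapping.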

gatesn, Apr 14 '25 08:04