Write down a plan for scalar datatype generics
We are now not too far from being able to support generic bit-sizes (e.g. a 64bit `Point3D` or a 16bit `Scalar`) for all things backed by primitive types or arrays of primitive types, for two reasons:
- Chunks allow it, and therefore the `ChunkStore` also (mostly) allows it.
- Space views now mostly work with primitive-casted data, e.g.:
```rust
let data = re_query2::range_zip_1x5(
    all_positions_indexed,
    all_colors.primitive::<u32>(),
    all_radii.primitive::<f32>(),
    all_labels.string(),
    all_class_ids.primitive::<u16>(),
    all_keypoint_ids.primitive::<u16>(),
);
```
I.e. one can easily imagine casting to different native types based on the current arrow datatype (likely tedious, but probably straightforward in most cases).
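Below is a minimal sketch of what such a cast could look like with arrow-rs; the helper name and the narrowing policy are made up for illustration, not actual Rerun code:

```rust
use arrow::array::{Array, ArrayRef, Float32Array, Float64Array};
use arrow::datatypes::DataType;

/// Hypothetical helper: produce f32 positions regardless of whether the data
/// was logged as f32 or f64.
fn positions_as_f32(array: &ArrayRef) -> Option<Vec<f32>> {
    match array.data_type() {
        DataType::Float32 => {
            let arr = array.as_any().downcast_ref::<Float32Array>()?;
            Some(arr.values().to_vec())
        }
        DataType::Float64 => {
            let arr = array.as_any().downcast_ref::<Float64Array>()?;
            // Lossy narrowing: fine for rendering, not for round-tripping.
            Some(arr.values().iter().map(|&v| v as f32).collect())
        }
        _ => None,
    }
}
```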
Note that I'm only talking about bit-size transformations here, nothing beyond that, and definitely nothing that involves changes in semantics (so e.g. `Color` is likely out).
Still, even that kind of limited change will likely have ripple effects across the entire stack, and those need to be planned for appropriately:
- What's the impact on our data model (interactions with all things tensor-like in particular, and probably also with transforms)?
- What's the impact on our logging APIs: C++, Python & Rust?
- What's the impact on our query APIs?
- What's the impact on the different space view implementations (e.g. I imagine a 64bit vertex would not make the renderer happy, so some transformation might be required)?
- What other impacts, e.g. wrt blueprint defaults/overrides?
Looking further into the future, what about generic bit-sizes for timeline values?
> e.g. a 64bit `Point3D`

I'm confused. You mean 3x f64 or the whole `Point3D` being 64 bit?
3x64, i.e. 3x f64.
For our logging SDKs I think there are two ways to approach this:
A) runtime generics
B) compile-time generics
Tensor example
Let's use a hypothetical tensor as an example:
```
struct TensorData {
    shape: [uint64],
    /// datatype generics - supports any integer or float (for instance)
    buffer: [Scalar],
}
```
Runtime generics
If we go with runtime generics this may be translated to something like:
```rust
pub struct TensorData {
    pub shape: arrow::buffer::ScalarBuffer<u64>,
    pub buffer: arrow::array::ArrayRef,
}
```
If we go with this approach, we will have to change the `Loggable` trait, which currently looks like this:
```rust
pub trait Loggable: … {
    fn arrow_datatype() -> arrow::datatypes::DataType;
    …
}
```
to
```rust
pub trait Loggable: … {
    fn arrow_datatype(&self) -> arrow::datatypes::DataType;
    …
}
```
The `to_arrow_opt` function would also need to check that all instances it is given have the same datatype.
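A minimal sketch of that check (the function name and error type are placeholders, and the real to_arrow_opt signature differs):

```rust
use arrow::array::Array as _;

// Sketch: verify that every instance in a batch shares one runtime datatype.
fn check_same_datatype(instances: &[TensorData]) -> Result<(), String> {
    let mut datatypes = instances.iter().map(|tensor| tensor.buffer.data_type());
    if let Some(first) = datatypes.next() {
        if datatypes.any(|datatype| datatype != first) {
            return Err("all instances in a batch must share the same datatype".to_owned());
        }
    }
    Ok(())
}
```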
In some places we also instantiate an empty arrow array of a component. With runtime generics, we would need to select one "canonical" datatype to use in these cases (e.g. `f64`).
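arrow makes instantiating such an empty array straightforward; a sketch assuming f64 as the canonical choice:

```rust
use arrow::array::{new_empty_array, ArrayRef};
use arrow::datatypes::DataType;

// Sketch: fall back to an (assumed) canonical f64 datatype whenever we need
// an empty component array but have no runtime datatype to go by.
fn empty_tensor_buffer() -> ArrayRef {
    new_empty_array(&DataType::Float64)
}
```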
Presumably all `TensorData` variants (no matter the datatype) will share the same `ComponentDescriptor`.
Compile-time generics
With this approach, any use of datatype generics in our `.fbs` files would result in generic types in our generated code:
```rust
pub struct TensorData<T> {
    pub shape: arrow::buffer::ScalarBuffer<u64>,
    pub buffer: arrow::buffer::ScalarBuffer<T>,
}
```
This approach has the advantage that it better fits our current code base. We can still keep the same `Loggable` trait interface, for instance.
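Keeping the static arrow_datatype() does require a compile-time mapping from each native type to its arrow datatype; a sketch using a hypothetical helper trait (not existing Rerun code):

```rust
use arrow::datatypes::DataType;

/// Hypothetical helper trait: map each supported native type to its arrow
/// datatype at compile time.
pub trait ScalarDatatype {
    const DATATYPE: DataType;
}

impl ScalarDatatype for f32 {
    const DATATYPE: DataType = DataType::Float32;
}

impl ScalarDatatype for f64 {
    const DATATYPE: DataType = DataType::Float64;
}

// With that in place, the generated arrow_datatype() can stay static and
// derive the buffer's element type from <T as ScalarDatatype>::DATATYPE.
```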
However, it will quickly become unwieldy to support multiple datatype generics in the same struct (e.g. supporting something other than `u64` for the shape above).
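To make that concrete, here is a hypothetical two-generic variant (names invented); every combination of Shape and T would need generated code and bindings across all SDK languages:

```rust
use arrow::buffer::ScalarBuffer;
use arrow::datatypes::ArrowNativeType;

// Sketch: two independent datatype generics on one struct quickly multiply
// the number of monomorphizations to generate, document and test.
pub struct TensorData2<Shape: ArrowNativeType, T: ArrowNativeType> {
    pub shape: ScalarBuffer<Shape>,
    pub buffer: ScalarBuffer<T>,
}
```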
I did some experimentation with adding a `rerun.builtins.Scalar` type: https://github.com/rerun-io/rerun/compare/481ebe434f967e7ac07222e8ca1504215e2c1588...emilk/datatype-generics-experiment