rerun Split Tensor component into several archetypes

Split out of https://github.com/rerun-io/rerun/issues/6388

Related to:

https://github.com/rerun-io/rerun/issues/2341

We generate archetypes and components for all tensor variants (TensorF32, TensorU8, etc.) and make sure they all share the same visualizer:

```
archetype TensorU8 {
    buffer: BufferU8,

    // One of these
    shape: TensorShape,
    shape: Vec<TensorDimension>,
}

component BufferU8 {
    data: [u8],
}

archetype TensorF32 {
    buffer: BufferF32,

    // One of these
    shape: TensorShape,
    shape: Vec<TensorDimension>,
}

component BufferF32 {
    data: [f32],
}
```

The mechanics of sharing one visualizer are a bit unclear. Should the visualizer just listen to several indicators/archetypes? That breaks the 1:1 relationship between archetype and visualizer that we were striving for. Can we revisit this later?
This will break some "use this tensor like an image" cases that we allow today. Mitigate only as far as is meaningful.
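To make the shared-visualizer question more concrete, here is a minimal Rust sketch of one possible mechanism: each typed archetype implements a common trait, and a single visualizer function accepts anything that implements it. `TensorView`, `TensorU8`, `TensorF32`, and `value_range` are hypothetical names for illustration only, not Rerun's actual types or API, and widening every element to `f64` is an assumption made for simplicity.

```rust
// Hypothetical sketch: one visualizer serving several typed tensor archetypes.
// None of these names are Rerun's actual API.

/// Common view that every typed tensor archetype exposes to the visualizer.
trait TensorView {
    fn shape(&self) -> &[u64];
    /// Element at flat index `i`, widened to f64 (a simplifying assumption).
    fn get_f64(&self, i: usize) -> Option<f64>;
}

struct TensorU8 {
    shape: Vec<u64>,
    buffer: Vec<u8>,
}

struct TensorF32 {
    shape: Vec<u64>,
    buffer: Vec<f32>,
}

impl TensorView for TensorU8 {
    fn shape(&self) -> &[u64] {
        &self.shape
    }
    fn get_f64(&self, i: usize) -> Option<f64> {
        self.buffer.get(i).map(|&v| f64::from(v))
    }
}

impl TensorView for TensorF32 {
    fn shape(&self) -> &[u64] {
        &self.shape
    }
    fn get_f64(&self, i: usize) -> Option<f64> {
        self.buffer.get(i).map(|&v| f64::from(v))
    }
}

/// The single shared "visualizer": here it only computes the value range,
/// which a viewer might use for color mapping. Returns None if the buffer
/// holds fewer elements than the shape implies.
fn value_range(tensor: &dyn TensorView) -> Option<(f64, f64)> {
    let len: u64 = tensor.shape().iter().product();
    let mut range: Option<(f64, f64)> = None;
    for i in 0..len as usize {
        let v = tensor.get_f64(i)?;
        range = Some(match range {
            None => (v, v),
            Some((lo, hi)) => (lo.min(v), hi.max(v)),
        });
    }
    range
}
```

In this sketch the visualizer is coupled to a trait rather than to a single archetype, which is exactly the departure from the 1:1 relationship discussed above.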

Impact on the mesh texture: log an Image archetype at the same entity path instead.

Detailed rationale (via @jleibs on https://github.com/rerun-io/rerun/issues/6388#issuecomment-2134003885):

Most of the choices for working with tensors fall into one of 4 categories.

Typed buffer, multiple datatypes (the proposal)

Pros:

When processing a chunk, the raw Arrow data is much easier to work with.
Opportunity to align with the official Arrow spec for tensor representation.
Aligns with our long-term direction of supporting multi-typed components and datatype conversions.

Cons:

Multi-datatype representation means we must either proliferate typed components or introduce datatype conversions.

The current hypothesis is that proliferating types is a known challenge that can mostly be automated with a mix of code generation and some helper code, whereas datatype conversions are an unknown challenge.
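As a rough illustration of how proliferating types could be mostly automated, the Rust sketch below uses a `macro_rules!` macro as a stand-in for code generation: one definition yields one buffer component per element type. The `Buffer` trait, its `ELEMENT_TYPE` constant, and the macro itself are hypothetical; this is only a stand-in for whatever code generator is actually used.

```rust
// Hypothetical sketch: a macro standing in for code-gen, producing one
// buffer component per element type from a single definition.

/// Behaviour shared by all generated buffer components.
trait Buffer {
    /// Name of the element type, e.g. "u8".
    const ELEMENT_TYPE: &'static str;
    fn num_elements(&self) -> usize;
}

macro_rules! buffer_component {
    ($name:ident, $elem:ty) => {
        #[derive(Debug, Clone, PartialEq)]
        pub struct $name {
            pub data: Vec<$elem>,
        }

        impl Buffer for $name {
            const ELEMENT_TYPE: &'static str = stringify!($elem);
            fn num_elements(&self) -> usize {
                self.data.len()
            }
        }
    };
}

// Adding a new element type is one line here (plus whatever helper code
// the visualizer needs for it).
buffer_component!(BufferU8, u8);
buffer_component!(BufferU16, u16);
buffer_component!(BufferF32, f32);
```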

Still, this puts us on a path where, once we support multi-typed components, we mostly delete a bunch of code and everything gets simpler. Any type conversions move from visualizer space to data-query space, but the types and Arrow representations we work with don't actually need to change.

Untyped buffer with type-id

Pros:

Avoids Arrow unions while maintaining a single datatype.

Cons:

Forces Arrow users to do annoying datatype casting in user space.
Doesn't align with our long-term goals.
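To show the kind of user-space casting this option forces, here is a minimal Rust sketch of an untyped buffer: raw bytes plus an element-type tag. The names and the little-endian byte layout are assumptions for illustration, not an existing Rerun or Arrow API.

```rust
// Hypothetical sketch of the "untyped buffer with type-id" option:
// raw bytes plus a tag, so every consumer must reinterpret the bytes itself.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ElementType {
    U8,
    F32,
}

struct UntypedBuffer {
    element_type: ElementType,
    /// Little-endian encoded elements (an assumption made for this sketch).
    bytes: Vec<u8>,
}

impl UntypedBuffer {
    /// The user-space cast that this option pushes onto every consumer.
    fn as_f32(&self) -> Result<Vec<f32>, String> {
        if self.element_type != ElementType::F32 {
            return Err(format!("expected F32, got {:?}", self.element_type));
        }
        if self.bytes.len() % 4 != 0 {
            return Err("byte length is not a multiple of 4".to_owned());
        }
        Ok(self
            .bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect())
    }
}
```

Every reader needs an equivalent of `as_f32` for each element type it supports, which is the casting burden listed under Cons.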

Typed buffer with union

Pros:

Status quo. Already works.

Cons:

Forces Arrow users to perform annoying, poorly supported union operations when loading or reading tensors.
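For comparison, a union-typed buffer looks roughly like the simplified Rust enum below, where each variant is one arm of the union. This is a hypothetical stand-in with only three element types, not Rerun's actual definition; the point is that every consumer has to dispatch on the active variant before it can touch the data.

```rust
// Hypothetical, simplified stand-in for a union-typed tensor buffer.
// Each enum variant corresponds to one arm of the union.
enum TensorBuffer {
    U8(Vec<u8>),
    U16(Vec<u16>),
    F32(Vec<f32>),
}

impl TensorBuffer {
    /// Every reader must first match on the active variant; in Arrow this
    /// corresponds to inspecting the union's type ids.
    fn sum_as_f64(&self) -> f64 {
        match self {
            TensorBuffer::U8(v) => v.iter().map(|&x| f64::from(x)).sum(),
            TensorBuffer::U16(v) => v.iter().map(|&x| f64::from(x)).sum(),
            TensorBuffer::F32(v) => v.iter().map(|&x| f64::from(x)).sum(),
        }
    }
}
```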

Jul 09 '24 13:07 Wumpf

An alternative is to have many Buffer components (BufferU8, BufferU16, …) but only one Tensor archetype:

```
archetype Tensor {
    shape: TensorShape,
    dimension_names: Option<DimensionNames>,

    // Set exactly one of these:
    buffer_u8: Option<BufferU8>,
    buffer_u16: Option<BufferU16>,
    buffer_u32: Option<BufferU32>,
    …

    color_model: Option<ColorModel>, // to interpret this tensor as an image
}
```

I believe this will lead to a lot less duplicated code, since there is only one archetype (and thus one visualizer) rather than one per element type.
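A minimal Rust sketch of the "set exactly one of these" rule for this single-archetype alternative, assuming plain `Vec`s in place of the real components and only three buffer types. The `validate` method is hypothetical: it checks that exactly one buffer is populated and that its length matches the product of the shape.

```rust
// Hypothetical sketch of the single Tensor archetype with optional buffers.
// Plain Vecs stand in for the real components; not Rerun's implementation.

#[derive(Default)]
struct Tensor {
    shape: Vec<u64>,
    buffer_u8: Option<Vec<u8>>,
    buffer_u16: Option<Vec<u16>>,
    buffer_f32: Option<Vec<f32>>,
}

impl Tensor {
    /// Returns the element count of the single populated buffer.
    fn validate(&self) -> Result<usize, String> {
        let lens = [
            self.buffer_u8.as_ref().map(Vec::len),
            self.buffer_u16.as_ref().map(Vec::len),
            self.buffer_f32.as_ref().map(Vec::len),
        ];
        // Keep only the buffers that are actually set.
        let set: Vec<usize> = lens.iter().flatten().copied().collect();
        let len = match set.as_slice() {
            [len] => *len,
            [] => return Err("no buffer set".to_owned()),
            _ => return Err(format!("{} buffers set, expected exactly one", set.len())),
        };
        // The shape must account for every element in the buffer.
        let expected: u64 = self.shape.iter().product();
        if expected != len as u64 {
            return Err(format!("shape implies {} elements, buffer has {}", expected, len));
        }
        Ok(len)
    }
}
```

Note that in this layout the "exactly one" constraint is only enforced at runtime; nothing in the type itself prevents setting two buffers.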

Jul 15 '24 12:07 emilk

Most of this is done; the rest is covered by

https://github.com/rerun-io/rerun/issues/9119

Feb 24 '25 16:02 emilk