`send_columns` has false-positive when promoting batch to column for list-types

Open jleibs opened this issue 1 year ago • 1 comments

Our check for whether a batch is already a column is currently just whether it is a list array: https://github.com/rerun-io/rerun/blob/5a2d5dd8089038a33894a32dc6d801d8d509d0e6/rerun_py/src/arrow.rs#L167-L169

However, for component types whose batches are already list arrays, such as ImageBuffer, this check misidentifies the batch as a column. We then pass the batch through without wrapping it in the outer list, which causes downstream errors.

This came up in the context of a proof-of-concept for logging image batches:

import numpy as np
import pyarrow as pa
import rerun as rr

rr.init("rerun_example_send_columns", spawn=True)

COUNT = 64
WIDTH = 100
HEIGHT = 50
CHANNELS = 3

# Create our time
times = np.arange(0, COUNT)

# Create a batch of images
rng = np.random.default_rng(12345)
image_batch = rng.uniform(0, 255, size=[COUNT, HEIGHT, WIDTH, CHANNELS]).astype(dtype=np.uint8)

# Log the ImageFormat once, as static
format_static = rr.components.ImageFormat(width=WIDTH, height=HEIGHT, color_model="RGB", channel_datatype="U8")
rr.log("image", [format_static], static=True)

# Manually create an ImageBufferBatch
image_buffers = (row.tobytes() for row in image_batch.reshape(COUNT, -1))
raw_arrow = pa.array(image_buffers, type=rr.components.ImageBufferType())
buffers_column = rr.components.ImageBufferBatch(raw_arrow)

# Uncomment this to work around the problem
# buffers_column = buffers_column.partition([1] * COUNT)


rr.send_columns(
    "image",
    times=[rr.TimeSequenceColumn("step", times)],
    components=[rr.Image.indicator(), buffers_column],
)

This ends up with a fairly cryptic error:

/home/jleibs/rerun/docs/snippets/all/tutorials/send_columns.py:37: RerunWarning: send_columns: RuntimeError(Detected malformed Chunk: The outer array in chunked component batch must be a sparse list, got List(Field { name: "item", data_type: UInt8, is_nullable: false, metadata: {} }))
  rr.send_columns(

jleibs • Aug 09 '24 18:08

Even though it's a bit less performant, this could be a good argument for moving column promotion from Rust back to Python, where we still have the full object context.

jleibs • Aug 09 '24 18:08

Noticed this issue only now. Moving column promotion to Python is essentially what I did in

  • https://github.com/rerun-io/rerun/pull/7155

Wumpf • Aug 13 '24 11:08