rerun Tracking issue: Migrate from `re_arrow2` to `arrow`

Blockers

Soft-blocked on https://github.com/apache/arrow-rs/issues/6360 (for lowering memory use)
Semi-blocked on https://github.com/apache/arrow-rs/issues/4472 (we use DataType::Extension for Tuid)

Multiple end-goals:

Use same arrow lib as the rest of the ecosystem, which is where all the bug & perf fixes are actually happening
Use infinitely less space to store Arrow metadata (schema deduplication)
- #1809
Make it possible to send raw Arrow data to Rerun and have it just work (RERUN:component_name)
- #3360
- Also frees up usage of Arrow extensions for actual native extensions (e.g. #3004)
Native integration with half for f16
Etc etc
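To illustrate the `RERUN:component_name` goal, here is a minimal sketch of tagging a field through string key/value metadata instead of an Arrow extension type. It assumes `RERUN:component_name` is used as a metadata key. The `Field` struct is a std-only stand-in loosely modeled on an Arrow field, not the real type, and the component name in the usage note is only illustrative.

```rust
use std::collections::HashMap;

/// Assumed metadata key; the issue only gives the tag `RERUN:component_name`.
const COMPONENT_NAME_KEY: &str = "RERUN:component_name";

/// Hypothetical stand-in for an Arrow field: a name plus string metadata.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(name: &str) -> Self {
        Self {
            name: name.to_owned(),
            metadata: HashMap::new(),
        }
    }

    /// Tags the field with a component name without touching its datatype.
    fn with_component_name(mut self, component: &str) -> Self {
        self.metadata
            .insert(COMPONENT_NAME_KEY.to_owned(), component.to_owned());
        self
    }

    /// Returns the component name if the field carries the tag.
    fn component_name(&self) -> Option<&str> {
        self.metadata.get(COMPONENT_NAME_KEY).map(String::as_str)
    }
}
```

With this shape, `Field::new("positions").with_component_name("rerun.components.Position3D")` reports its component name, while an untagged field returns `None` and can still be treated as plain Arrow data.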

TODO (split into sub-issues as needed):

[ ] Remove all direct uses of arrow2 (codegen, data{cell,row,table}, ArrowBuffer, etc)
- Related: #2978
[x] Migrate serde-based components (i.e. blueprint stuff) to arrow1
- https://docs.rs/arrow-json/47.0.0/arrow_json/reader/struct.Decoder.html#method.serialize might be all we need
[ ] Get rid of Arrow extensions everywhere, introduce RERUN:component_name (#3360)
[ ] Runtime schema registry / dedupe datatypes (#1809)
[ ] Remove DataCell::component_name
[ ] Replace TransportChunk with RecordBatch?
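As a rough illustration of the "runtime schema registry / dedupe datatypes" item, the sketch below interns datatypes and hands out small ids, so each distinct datatype is stored once. All names here (`SchemaRegistry`, `SchemaId`, the toy `DataType` enum) are hypothetical and std-only. The actual design is whatever #1809 settles on.

```rust
use std::collections::HashMap;

/// Toy stand-in for an Arrow datatype; real schemas are far richer.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum DataType {
    Float32,
    UInt64,
    List(Box<DataType>),
}

/// Compact handle that can travel with the data instead of the full datatype.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct SchemaId(u32);

/// Hypothetical registry: each distinct datatype is stored exactly once.
#[derive(Default)]
struct SchemaRegistry {
    ids: HashMap<DataType, SchemaId>,
    types: Vec<DataType>,
}

impl SchemaRegistry {
    /// Returns the existing id for `datatype`, registering it on first sight.
    fn intern(&mut self, datatype: &DataType) -> SchemaId {
        if let Some(id) = self.ids.get(datatype) {
            return *id;
        }
        let id = SchemaId(self.types.len() as u32);
        self.types.push(datatype.clone());
        self.ids.insert(datatype.clone(), id);
        id
    }

    /// Looks up the full datatype behind an id.
    fn resolve(&self, id: SchemaId) -> Option<&DataType> {
        self.types.get(id.0 as usize)
    }

    /// Number of distinct datatypes seen so far.
    fn len(&self) -> usize {
        self.types.len()
    }
}
```

Interning the same datatype twice yields the same id, so the cost of storing the datatype is paid once per distinct type rather than once per cell.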

On the way there we might hit a few bumps because we have a lot of redundant ad-hoc code that integrates with polars (which is built on top of arrow2).

The solution to this is to make sure we only integrate with polars in one single place: the Data{Cell,Row,Table} layer (https://github.com/rerun-io/rerun/issues/1692). Once that's done, we can remove all ad-hoc polars code everywhere and just build a Data{Row,Cell,Table} anytime we want a polars::Series/polars::DataFrame (#1759).

Internally, the conversion from DataTable to polars::DataFrame will require a zero-copy tri-stage conversion from arrow1->arrow2->polars.
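The zero-copy idea behind that three-stage conversion can be sketched with std-only stand-ins: each stage takes ownership of the same reference-counted buffer rather than copying it. `Arrow1Array`, `Arrow2Array` and `PolarsSeries` below are hypothetical wrappers, not the real arrow, arrow2 or polars types, and `Arc<[f32]>` stands in for Arrow's shared buffers.

```rust
use std::sync::Arc;

/// Hypothetical stand-ins for the three array types; each wraps a shared buffer.
struct Arrow1Array {
    values: Arc<[f32]>,
}

struct Arrow2Array {
    values: Arc<[f32]>,
}

struct PolarsSeries {
    name: String,
    values: Arc<[f32]>,
}

/// Stage 1: arrow1 -> arrow2, moving the buffer handle instead of copying data.
impl From<Arrow1Array> for Arrow2Array {
    fn from(array: Arrow1Array) -> Self {
        Self {
            values: array.values,
        }
    }
}

impl Arrow2Array {
    /// Stage 2: arrow2 -> polars, again handing over the same buffer.
    fn into_series(self, name: &str) -> PolarsSeries {
        PolarsSeries {
            name: name.to_owned(),
            values: self.values,
        }
    }
}

/// The full chain: arrow1 -> arrow2 -> polars.
fn to_series(array: Arrow1Array, name: &str) -> PolarsSeries {
    Arrow2Array::from(array).into_series(name)
}
```

Comparing the buffer's pointer before and after `to_series` is one way to check that no copy happened along the way.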

Supersedes https://github.com/rerun-io/rerun/issues/1805
Supersedes #2354

Oct 09 '23 10:10 teh-cmc

re_arrow2 has an arrow feature, with glue for converting data between arrow and re_arrow2: https://docs.rs/re_arrow2/0.17.4/re_arrow2/array/trait.Arrow2Arrow.html

Using that, we can do this migration piecewise. We would carry both sets of dependencies for a transitional period, leading to longer compilation times and a bigger .wasm binary, but I think that is an ok tradeoff.

Potential roadmap:

[x] Verify that Arrow2Arrow is zero-copy
- https://github.com/rerun-io/re_arrow2/pull/6
- https://github.com/rerun-io/rerun/issues/6819
[ ] Move SizeBytes to own crate, with separate arrow and arrow2 feature flags
[x] Rename to_arrow/from_arrow/… to to_arrow2/from_arrow2/…
[x] Add poly-filled to_arrow/from_arrow using the glue
[ ] Migrate codegenned serialization
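The rename-then-polyfill steps above could look roughly like the sketch below: the existing method keeps its arrow2 implementation under a new name, and `to_arrow` becomes a default method that goes through the glue. Everything here is a std-only stand-in. `LoggableSketch`, `arrow2_to_arrow` and the two array structs are hypothetical, and the real glue is the `Arrow2Arrow` trait linked above.

```rust
use std::sync::Arc;

/// Hypothetical stand-ins for the arrow2 and arrow array types.
struct Arrow2Array {
    values: Arc<[u64]>,
}

struct ArrowArray {
    values: Arc<[u64]>,
}

/// Stand-in for the glue: hands the same buffer over without copying.
fn arrow2_to_arrow(array: Arrow2Array) -> ArrowArray {
    ArrowArray {
        values: array.values,
    }
}

/// Hypothetical trait showing the rename plus polyfill pattern.
trait LoggableSketch {
    /// Formerly `to_arrow`; the arrow2-based implementation is unchanged.
    fn to_arrow2(&self) -> Arrow2Array;

    /// Polyfilled arrow version: implementors get it for free via the glue.
    fn to_arrow(&self) -> ArrowArray {
        arrow2_to_arrow(self.to_arrow2())
    }
}

/// Example implementor that only knows how to produce arrow2 data.
struct Timestamps(Vec<u64>);

impl LoggableSketch for Timestamps {
    fn to_arrow2(&self) -> Arrow2Array {
        Arrow2Array {
            values: Arc::from(self.0.as_slice()),
        }
    }
}
```

Callers can switch to `to_arrow` immediately, and each implementor can later replace the default with a native arrow implementation without changing any call sites.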

After de-chunkification:

[ ] Migrate codegenned deserialization
[ ] Migrate everything else

As of 2024-07-08, there are only around 300 lines of Rust referencing the string arrow2 directly, when one ignores generated code.

ignored paths

crates/re_types/**, crates/re_types_core/src/archetypes/**, crates/re_types_core/src/datatypes/**, crates/re_types_core/src/components/**, crates/re_types_blueprint/src/blueprint/components/**, crates/re_types_blueprint/src/blueprint/archetypes/**
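For anyone wanting to reproduce a count like this, here is a small sketch of the counting logic. It assumes the `/**` globs above can be approximated by path prefixes, and it works on in-memory `(path, content)` pairs rather than walking the file system. `count_arrow2_lines` and `is_ignored` are hypothetical helpers, not part of the repository.

```rust
/// The ignored paths listed above, with each `/**` glob simplified to a prefix.
const IGNORED_PREFIXES: &[&str] = &[
    "crates/re_types/",
    "crates/re_types_core/src/archetypes/",
    "crates/re_types_core/src/datatypes/",
    "crates/re_types_core/src/components/",
    "crates/re_types_blueprint/src/blueprint/components/",
    "crates/re_types_blueprint/src/blueprint/archetypes/",
];

/// True if `path` falls under one of the generated-code directories.
fn is_ignored(path: &str) -> bool {
    IGNORED_PREFIXES.iter().any(|prefix| path.starts_with(prefix))
}

/// Counts lines containing the string `arrow2` across all non-ignored files.
fn count_arrow2_lines<'a>(files: impl IntoIterator<Item = (&'a str, &'a str)>) -> usize {
    files
        .into_iter()
        .filter(|(path, _)| !is_ignored(path))
        .map(|(_, content)| {
            content
                .lines()
                .filter(|line| line.contains("arrow2"))
                .count()
        })
        .sum()
}
```

Note that the trailing slash in `crates/re_types/` matters: without it, the prefix would also swallow `crates/re_types_core/` and `crates/re_types_blueprint/`.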

Jul 08 '24 12:07 emilk

I believe https://github.com/rerun-io/rerun/issues/6807 also requires bringing in a dependency on arrow

Jul 10 '24 17:07 jleibs

Blocked on:

https://github.com/apache/arrow-rs/pull/6300

Aug 31 '24 18:08 teh-cmc

New blocker:

https://github.com/apache/arrow-rs/issues/6360

Sep 05 '24 13:09 teh-cmc

This is almost done now. What remains is:

Porting re_types_builder (the generated code is all arrow-rs, but the builder itself uses arrow2's DataType)
IPC serialization (we're hitting a bug in arrow-rs). Look for SERIALIZE_WITH_ARROW_1 in the code for details
FFI communication in rerun_c

Jan 21 '25 08:01 emilk

I have added https://github.com/apache/arrow-rs/issues/7315 which I think is the root cause of the serialization error.

Mar 20 '25 18:03 timsaucer

What's left now is mainly porting re_types_builder, which should be pretty straightforward.

Mar 27 '25 14:03 emilk

Tracking issue: Migrate from `re_arrow2` to `arrow`