Send dataframe API
Describe the solution you'd like
I'd like to send dataframes (e.g. pandas and/or arrow) at once. They share the same timeline but have multiple columns (e.g. time, x, y, z), and most often the index is the time, either in µs, in seconds, or as a pd.TimedeltaIndex. Something like this would be great:
send_dataframe(
    base_entity_path='mydataframe',
    timeline='mytimeline',
    data=df,
    time_column='index',  # Union[None, str]; None would always select the index
    columns=['x', 'y'],   # Union[None, List[str]]; None would select all columns
)
Describe alternatives you've considered
Sending each column in separate calls. This works but might generate more overhead than necessary.
If I understand correctly, your proposed API would result in the following data being logged:
- entity mydataframe/x with index timestamps and a component with df["x"] as content,
- entity mydataframe/y with index timestamps and a component with df["y"] as content,
both on the mytimeline timeline.
Is that correct?
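To make the question concrete, the layout described above can be sketched in plain Python. This is only an illustration of the mapping, not Rerun code: `ColumnBatch` and `plan_sub_entity_batches` are hypothetical names, the table is a plain dict of equal-length columns standing in for a dataframe, and nothing is actually sent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class ColumnBatch:
    """One batch per sub-entity (hypothetical container, not a Rerun type)."""

    entity_path: str
    timeline: str
    times: List[float]
    values: List[float]


def plan_sub_entity_batches(
    base_entity_path: str,
    timeline: str,
    table: Dict[str, Sequence[float]],
    time_column: str,
    columns: Optional[List[str]] = None,
) -> List[ColumnBatch]:
    """Split a column table into one batch per value column.

    Every batch shares the same timestamps, taken from `time_column`.
    `columns=None` selects every column except the time column.
    """
    if time_column not in table:
        raise KeyError(f"time column {time_column!r} not found")
    times = list(table[time_column])
    if columns is None:
        columns = [name for name in table if name != time_column]

    batches = []
    for name in columns:
        values = list(table[name])
        if len(values) != len(times):
            raise ValueError(f"column {name!r} does not match the time column length")
        batches.append(ColumnBatch(f"{base_entity_path}/{name}", timeline, times, values))
    return batches


# Stand-in for df: a time column plus x, y, z.
table = {
    "time": [0.0, 0.1, 0.2],
    "x": [1.0, 2.0, 3.0],
    "y": [4.0, 5.0, 6.0],
    "z": [7.0, 8.0, 9.0],
}
batches = plan_sub_entity_batches("mydataframe", "mytimeline", table, "time", ["x", "y"])
print([b.entity_path for b in batches])
```

Each `ColumnBatch` corresponds to one of the entities listed above. With pandas, a comparable table could presumably be built with `df.reset_index().to_dict("list")`.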
In general, a dataframe-based API is a very good fit for our new columnar stuff. I see at least two points here:
- If the send_dataframe API ends up logging to multiple "sub-entities" (as I think you suggest here), there would be little performance gain w.r.t. separate send_columns calls. Chunks (our new fundamental data structure) always apply to a single entity, so multiple chunks would need to be emitted here in any case. (This is not to say that a convenience API wouldn't be useful.)
- If the send_dataframe API logs columns to a single entity, but different components, then we'd need to figure out a mapping from Python-side column dtype/label to component type (with the restriction that each component of a single entity must have a unique type). In particular, your example seems ambiguous as to what component type should be used.
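The ambiguity in the second point can be illustrated with a small sketch. It assumes a hypothetical lookup table from column dtype to a component type name; `assign_component_types`, `DTYPE_TO_COMPONENT`, and the component strings are placeholders for illustration, not Rerun's actual mapping.

```python
from typing import Dict

# Hypothetical dtype -> component type lookup (placeholder names).
DTYPE_TO_COMPONENT: Dict[str, str] = {
    "float64": "Scalar",
    "str": "Text",
}


def assign_component_types(column_dtypes: Dict[str, str]) -> Dict[str, str]:
    """Map each column label to a component type for a single entity.

    Raises ValueError when two columns would need the same component type,
    since each component of one entity must have a unique type.
    """
    assigned: Dict[str, str] = {}
    used: Dict[str, str] = {}  # component type -> column that claimed it
    for label, dtype in column_dtypes.items():
        if dtype not in DTYPE_TO_COMPONENT:
            raise ValueError(f"no component type known for dtype {dtype!r}")
        component = DTYPE_TO_COMPONENT[dtype]
        if component in used:
            raise ValueError(
                f"columns {used[component]!r} and {label!r} both map to {component!r}"
            )
        used[component] = label
        assigned[label] = component
    return assigned


# One float column and one string column can share an entity.
print(assign_component_types({"x": "float64", "label": "str"}))

# Two float columns collide, as in the x/y example from the proposal.
try:
    assign_component_types({"x": "float64", "y": "float64"})
except ValueError as err:
    print(f"ambiguous: {err}")
```

Under this assumed dtype-only mapping, the x and y columns from the proposal cannot both live on one entity, which is why some extra rule (for example per-column labels) would be needed.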
Creating sub-entities seems to be the easiest way.
I can't see how the second option would work; I don't know enough about the inner workings of Rerun.
But maybe there could be a third option if there were a dataframe entity type? Or is that against the design principles?
Related: https://github.com/rerun-io/rerun/issues/8619