BentoML
feature: Arrow table input/output
Feature request
I think it would be great to add Arrow Tables as an IO type for BentoML endpoints. This would be particularly beneficial for the gRPC server, where the Arrow IPC format (not Parquet) could be used directly by placing the data in the serialized_bytes field of the Protobuf message.
Motivation
Parquet is currently used to move Pandas DataFrames around in BentoML. It is a great storage format, but because it is designed as an on-disk format it doesn't maintain all of the great properties of the in-memory Arrow format, such as strict register alignment. It may reduce on-the-wire data size, but it will almost certainly increase serialization/deserialization time.
I believe that this addition would:
- reduce serialization/deserialization latency
- allow easy use of other tools within the Arrow ecosystem (Polars, DataFusion, DuckDB, etc.)
Other
No response
Hi @judahrand - we are working on a new iteration of IO Descriptor in BentoML and it will come with Arrow support! cc @frostming
Does the code that's in development exist somewhere? I'd be interested in having a read.
Sure, #4240
@parano Did Arrow support ever get added?