greptimedb
Integrate Arrow Flight into our grpc service
Currently, the gRPC service between Datanode and Frontend is somewhat hard to use and not very efficient. Record batches in Datanode must pass through a conversion layer that maps each vector's data and schema into our homemade gRPC protocol, and back again on the receiving side. The encoding and decoding steps along the way are also cumbersome.
Apache Arrow Flight looks very promising for solving this problem. The "Motivation" section of its introduction states:
Our design goal for Flight is to create a new protocol for data services that uses the Arrow columnar format as both the over-the-wire data representation as well as the public API presented to developers. In doing so, we reduce or remove the serialization costs associated with data transport and increase the overall efficiency of distributed data systems. Additionally, two systems that are already using Apache Arrow for other purposes can communicate data to each other with extreme efficiency.
This seems a natural fit for our gRPC service between Datanode and Frontend, so I think Flight is well worth looking into further.
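To make the conversion cost concrete, here is a minimal, stdlib-only Rust sketch. It does not use the arrow-flight crate. `Value`, `to_homemade`, `to_columnar_bytes` and `from_columnar_bytes` are hypothetical names, and `Value` only loosely stands in for our homemade protocol. The sketch contrasts wrapping every value in a tagged message with shipping a column's values buffer as contiguous little-endian bytes, which is roughly what Arrow IPC (and therefore Flight) puts on the wire.

```rust
use std::convert::TryInto; // in the 2021 prelude already; kept for older editions

/// Hypothetical per-value message, loosely mimicking a homemade protocol
/// where each cell becomes its own tagged value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int64(i64),
    Null,
}

/// Homemade path: walk the column and convert every cell into a `Value`.
/// Validity is a plain bool slice here for simplicity; Arrow uses a bitmap.
fn to_homemade(values: &[i64], validity: &[bool]) -> Vec<Value> {
    values
        .iter()
        .zip(validity)
        .map(|(v, ok)| if *ok { Value::Int64(*v) } else { Value::Null })
        .collect()
}

/// Columnar path: emit the values buffer as contiguous little-endian bytes,
/// roughly how an Arrow IPC body carries a primitive column's data buffer
/// (validity bitmap and padding omitted in this sketch).
fn to_columnar_bytes(values: &[i64]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Receiver side. A little-endian host could reinterpret the buffer in place;
/// the explicit decode here only keeps the sketch in safe Rust.
fn from_columnar_bytes(bytes: &[u8]) -> Vec<i64> {
    bytes
        .chunks_exact(8)
        .map(|c| i64::from_le_bytes(c.try_into().unwrap()))
        .collect()
}
```

The homemade path branches and builds a new value for every cell on both ends. The columnar path moves one buffer per column, so the on-wire layout and the in-memory layout are the same, which is the property the Flight motivation above relies on.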
refs:
- Rust binding: https://crates.io/crates/arrow-flight
- Flight RPC protocol: https://arrow.apache.org/docs/format/Flight.html
- InfluxDB IOx implementation: https://github.com/influxdata/influxdb_iox/tree/main/service_grpc_flight
- Dremio's practice: https://www.dremio.com/subsurface/an-introduction-to-apache-arrow-flight-sql/
Waiting for Arrow upgrade #555