BentoML
feat: Support gRPC in BentoML API Server
Support using gRPC instead of the HTTP API for sending prediction requests. When an API model server is deployed as a backend service, many teams prefer gRPC over HTTP. See the related discussion here
One blocker for this is asyncio support on the gRPC side, which is currently under development: https://github.com/grpc/grpc/projects/16
Also, for users waiting for this feature, note that using gRPC does not necessarily bring better performance. Protobuf is faster when comparing only data serialization time, but when building a model server we also need to account for the computation required to turn deserialized Protobuf objects into a format the user's model can consume. In most ML frameworks, a trained model expects a pandas.DataFrame, np.array, tf.Tensor, or PIL.Image:
JSON Request => pandas.DataFrame => Model
Protobuf Msg => Protobuf Object => pandas.DataFrame => Model
It is this extra step of converting the in-memory Protobuf message object into a pandas.DataFrame that makes it less efficient than the JSON/HTTP approach, although gRPC does have some edge over a REST API thanks to HTTP/2 and compression.
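To make the extra step concrete, here is a minimal, self-contained Python sketch. The `PredictRequest` class and its field names are hypothetical stand-ins for a generated Protobuf message class (not BentoML's actual API); the point is only that the Protobuf path needs one more traversal before the data is in the shape a model expects:

```python
import json

# Hypothetical stand-in for a deserialized Protobuf message. Attribute-style
# access mirrors generated protobuf classes; the names are illustrative only.
class PredictRequest:
    def __init__(self, rows):
        self.rows = rows  # plays the role of a repeated message field

# JSON path: one step from wire format to the plain records that
# pandas.DataFrame(records) or similar constructors accept directly.
json_payload = '[{"age": 30, "income": 52000}, {"age": 41, "income": 61000}]'
records = json.loads(json_payload)

# Protobuf path: deserialization yields message objects, which still require
# an extra conversion pass to become plain records.
msg = PredictRequest(rows=[{"age": 30, "income": 52000},
                           {"age": 41, "income": 61000}])
records_from_pb = [dict(r) for r in msg.rows]  # the extra conversion step

# Both paths end at the same structure, but the Protobuf path paid twice:
# once for message deserialization and once for this conversion.
print(records == records_from_pb)
```

With real generated stubs the conversion is typically heavier than a `dict()` copy (field-by-field extraction into columns or arrays), which is where the overhead discussed above comes from.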
Thus, adding gRPC support is not about better performance; it is mostly about convenience for teams that already use gRPC for their backend services.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hey, are we implementing this?
Hi @harshitsinghai77, I believe this is P1, and will be addressed after BentoML 1.0
Got it. Let me know if you need me to work on it.
Work in progress, per the current MLH batch. Progress can be tracked on the https://github.com/bentoml/BentoML/tree/grpc branch and in this PR: https://github.com/bentoml/BentoML/pull/2808
Hi all, I'm happy to say that gRPC support has been merged into main via PR #2808. We will include this in the next release as an experimental feature.
gRPC streaming is the way to go next
This is tracked in an adjacent ticket, #4170