BentoML
feat(EXPERIMENTAL): grpc
What does this PR address?
Adding experimental gRPC implementation for BentoServer. Creating a draft PR to better track progress.
Before submitting:
- [x] Does the Pull Request follow the Conventional Commits specification for naming? Here is GitHub's guide on how to create a pull request.
- [x] Does the code follow BentoML's code style? Have both the `make format` and `make lint` scripts passed (instructions)?
- [x] Did you read through the contribution guidelines and follow the development guidelines?
- [ ] Did your changes require updates to the documentation? Have you updated those accordingly? Here are documentation guidelines and tips on writing docs.
- [ ] Did you write tests to cover your changes?
Who can help review?
Feel free to tag members/contributors who can help review your PR.
Codecov Report
Merging #2808 (e291039) into main (e291039) will not change coverage. The diff coverage is `n/a`.

:exclamation: Current head e291039 differs from pull request most recent head f0ae7b3. Consider uploading reports for the commit f0ae7b3 to get more accurate results.

```diff
@@           Coverage Diff           @@
##             main    #2808   +/-   ##
=======================================
  Coverage   69.00%   69.00%
=======================================
  Files         122      122
  Lines       10162    10162
=======================================
  Hits         7012     7012
  Misses       3150     3150
```
Hello @aarnphm, Thanks for updating this PR.
There are currently no PEP 8 issues detected in this PR. Cheers! :beers:
Comment last updated at 2022-09-13 06:38:30 UTC
Thanks for compiling this PR, @aarnphm.
On CLI, should we use the `--grpc` option or a `serve-grpc` sub-command? Using `--grpc` makes sense if most of the other options are also applicable to gRPC settings. If gRPC demands a mostly different set of options, a `serve-grpc` sub-command might make more sense.
On configuration, should we separate the configuration under `api_server` into `http` and `grpc` sections? We would need to retroactively adapt existing configs to keys under `http`. Alternatively, we can have a whole new section for `grpc_api_server`. The reasoning for supporting one way or the other is similar to the reasoning behind the CLI options.
```yaml
api_server:
  grpc:
    enabled: true
    ...
  http:
    ...
```
On the Prometheus metrics port, `9090` is the Prometheus UI port btw. We may want to default to some other port to prevent a collision if users run the Prometheus server on the host. The `PrometheusInterceptor` is currently optional. Should it be optional?
On dependency, what is the difference between `grpc` and `grpcio`? Is there a minimum version we want to enforce?
cc @ssheng @sauyon @parano
> On CLI, should we use the `--grpc` option or a `serve-grpc` sub-command? Using `--grpc` makes sense if most of the other options are also applicable to gRPC settings. If gRPC demands a mostly different set of options, a `serve-grpc` sub-command might make more sense.
On the topic of having either `--grpc` or `serve-grpc`: this is our decision on whether we want to optimize for user experience or for maintainability.

- `--grpc` provides a pretty nice way for users to run the `serve` command both locally and inside a container.
  - For the Docker case, people can just do `docker run -it --rm -p 3000:3000 -p 9090:9090 iris_classifier:latest --grpc --enable-reflection`.
  - Since gRPC has a different set of configuration values, if we decide to go with this, then our job is to document and implement mutually exclusive `click.Option`s (see the sketch after this list). I think documentation alone would not be enough, since most people would skip it tbh.
  - `--grpc` currently brings a bit more complexity into our `serve_development` and `serve_production`, but it is not too bad in terms of maintainability.
- `serve-grpc` is in a way more explicit in terms of readability.
  - This would also make our code a lot easier to maintain, as `serve-grpc` would just have a different set of functions and logic (which follows separation of concerns).
  - I would like to go this way as well, but the main crux of the problem is that generating the Dockerfile would be harder: how do we determine when to use `serve` or `serve-grpc`?
  - Obviously we can read it via configuration, but then the workaround for running the Docker container would be pretty ugly. Considering the current state of containerization, with little to no test setup (I will set this up after this PR), I prefer the first option.
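For illustration, a minimal sketch of what the mutually exclusive option handling could look like; the option names, callback, and command body here are hypothetical, not the actual BentoML CLI:

```python
import click


# Hypothetical sketch: reject gRPC-only flags when --grpc is not set.
def grpc_only(ctx: click.Context, param: click.Parameter, value):
    # --grpc is declared eager below, so it is already in ctx.params here.
    if value and not ctx.params.get("grpc", False):
        raise click.BadParameter(
            f"--{param.name.replace('_', '-')} requires --grpc."
        )
    return value


@click.command()
@click.option("--grpc", is_flag=True, is_eager=True, help="Serve over gRPC.")
@click.option("--enable-reflection", is_flag=True, callback=grpc_only,
              help="[gRPC only] Enable server reflection.")
def serve(grpc: bool, enable_reflection: bool) -> None:
    click.echo(f"grpc={grpc}, reflection={enable_reflection}")


if __name__ == "__main__":
    serve()
```

With this shape, `serve --enable-reflection` fails fast with a clear error instead of silently ignoring a gRPC-only flag, which addresses the "documentation alone is not enough" concern.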
Note that for both cases, we will also provide either a `--http` flag or a `serve-http` sub-command for symmetry's sake. `serve` should still be the public entrypoint, whereas `serve-http` would be an alias to `serve`.
> On Prometheus metrics port, `9090` is the Prometheus UI port btw.

I'm aware of this; I think we should just use something like 50050.
The same goes for the host and ports for gRPC. I have seen a lot of systems that do something similar, where they separate the HTTP and gRPC ports into two different fields. This would mean `api_server.[host|port]` would also become `api_server.http.[host|port]`, and the same applies to gRPC.
This is nice because we can support the case where users want both gRPC and HTTP without any port collision. My proposal is to have something such as:
```yaml
api_server:
  http:
    port: 3000
    host: 0.0.0.0
  grpc:
    port: 50051
    host: 0.0.0.0
```
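As a quick sanity check of this shape, a sketch of how the separated addresses could be consumed; the file name and config layout are assumed from the proposal above:

```python
import yaml

# Load the proposed config; the file name here is hypothetical.
with open("bentoml_configuration.yaml") as f:
    api_server = yaml.safe_load(f)["api_server"]

# Each protocol gets its own bind address, so serving both at once
# cannot collide on a single port.
http_addr = f"{api_server['http']['host']}:{api_server['http']['port']}"  # 0.0.0.0:3000
grpc_addr = f"{api_server['grpc']['host']}:{api_server['grpc']['port']}"  # 0.0.0.0:50051
assert http_addr != grpc_addr, "HTTP and gRPC must not share a port"
```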
> The `PrometheusInterceptor` is currently optional. Should it be optional?

This should be optional and follow the `metrics` options from the configuration.
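A rough sketch of what gating the interceptor behind the `metrics` config could look like; the interceptor class is passed in here because its actual import path in the PR is not shown:

```python
import concurrent.futures

import grpc


def build_grpc_server(config: dict, prometheus_interceptor_cls=None) -> grpc.Server:
    # Only register the metrics interceptor when api_server.metrics.enabled
    # is true, instead of attaching it unconditionally.
    interceptors = []
    if config["api_server"]["metrics"]["enabled"] and prometheus_interceptor_cls:
        interceptors.append(prometheus_interceptor_cls())
    return grpc.server(
        concurrent.futures.ThreadPoolExecutor(max_workers=1),
        interceptors=tuple(interceptors),
    )
```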
> On dependency, what is the difference between `grpc` and `grpcio`? Is there a minimum version we want to enforce?

Oh, `grpc` is the actual Python import, whereas `grpcio` is the PyPI package name 😄 The minimum version is 1.41, since that is the first release that provides support for Python 3.10.
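To make the naming concrete, a small illustration (not project code):

```python
# Installed with: pip install "grpcio>=1.41.0"
# The PyPI distribution is "grpcio"; the importable module is "grpc".
from packaging.version import Version

import grpc

# 1.41 is the first release with Python 3.10 support, per the discussion above.
assert Version(grpc.__version__) >= Version("1.41.0")
```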
Quick breakdown of the current `api_server` config:
```yaml
api_server:
  # metrics and logging are currently used by both HTTP and gRPC
  metrics:
    enabled: true
    namespace: BENTOML
  logging:
    access:
      enabled: true
      request_content_length: true
      request_content_type: true
      response_content_length: true
      response_content_type: true
  # port and host are also currently used by both HTTP and gRPC (subject to separation)
  port: 3000
  host: 0.0.0.0
  workers: 1 # currently only used for HTTP; ideally we can also use this for the thread pools we have for running the gRPC server
  timeout: 60 # HTTP-specific, not currently used inside gRPC; one thing we could use this for in gRPC is as the timeout for the health check probe
  http: # HTTP-specific
    max_request_size: 20971520
    backlog: 2048
    cors:
      enabled: false
      access_control_allow_origin: ~
      access_control_allow_credentials: ~
      access_control_allow_methods: ~
      access_control_allow_headers: ~
      access_control_max_age: ~
      access_control_expose_headers: ~
  grpc: # gRPC-specific
    enabled: false
    max_concurrent_streams: ~ # https://github.com/grpc/grpc/blob/ebfb028a29ef5f0f4be6f33c131c740ae0a32c84/src/core/ext/transport/chttp2/transport/http2_settings.cc#L51
    maximum_concurrent_rpcs: ~
    max_message_length: -1
    reflection:
      enabled: false
    metrics:
      port: 9090 # will change to 50050
      host: 0.0.0.0
```
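For reference, a sketch of how the gRPC-specific keys above could map onto `grpc.server` arguments and gRPC core channel options; the mapping is an assumption based on the key names, not the PR's actual wiring:

```python
import concurrent.futures

import grpc

# Channel-arg names are the standard gRPC core ones; the values mirror
# the config keys above.
options = [
    ("grpc.max_concurrent_streams", 128),      # <- max_concurrent_streams
    ("grpc.max_receive_message_length", -1),   # <- max_message_length (-1 = unlimited)
    ("grpc.max_send_message_length", -1),
]
server = grpc.server(
    concurrent.futures.ThreadPoolExecutor(max_workers=1),
    options=options,
    maximum_concurrent_rpcs=None,  # "~" in the config means no limit
)
server.add_insecure_port("0.0.0.0:50051")
```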
I'm in favor of having a separate gRPC port configuration value.

I think `--grpc` is probably fine here; most of the options are shared, right?
> I'm in favor of having a separate gRPC port configuration value.
>
> I think `--grpc` is probably fine here; most of the options are shared, right?

Well, yes, except that a few of the gRPC options are gRPC-only.