
feat(EXPERIMENTAL): grpc

Open aarnphm opened this pull request 2 years ago • 2 comments

What does this PR address?

Adding experimental gRPC implementation for BentoServer. Creating a draft PR to better track progress.


Who can help review?

Feel free to tag members/contributors who can help review your PR.

aarnphm avatar Jul 26 '22 00:07 aarnphm

Codecov Report

Merging #2808 (e291039) into main (e291039) will not change coverage. The diff coverage is n/a.

:exclamation: Current head e291039 differs from pull request most recent head f0ae7b3. Consider uploading reports for the commit f0ae7b3 to get more accurate results

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2808   +/-   ##
=======================================
  Coverage   69.00%   69.00%           
=======================================
  Files         122      122           
  Lines       10162    10162           
=======================================
  Hits         7012     7012           
  Misses       3150     3150           

codecov[bot] avatar Jul 26 '22 00:07 codecov[bot]

Hello @aarnphm, Thanks for updating this PR.

There are currently no PEP 8 issues detected in this PR. Cheers! :beers:

Comment last updated at 2022-09-13 06:38:30 UTC

pep8speaks avatar Jul 27 '22 20:07 pep8speaks

Thanks for putting together this PR, @aarnphm.

On CLI, should we use the --grpc option or serve-grpc sub-command? Using --grpc makes sense if most of the other options are also applicable to gRPC settings. If gRPC demands a mostly different set of options, serve-grpc sub-command might make more sense.

On configuration, should we separate the configuration under api_server into http and grpc? We would need to retroactively adapt existing configs to keys under http. Alternatively, we could have a whole new grpc_api_server section. The reasoning for one option over the other is similar to the reasoning behind the CLI options.

api_server:
  grpc:
    enabled: true
    ...
  http:
    ...

On the Prometheus metrics port, 9090 is the Prometheus UI port, btw. We may want to default to some other port to prevent a collision if users run the Prometheus server on the host. The PrometheusInterceptor is currently optional. Should it be optional?

On dependencies, what is the difference between grpc and grpcio? Is there a minimum version we want to enforce?

ssheng avatar Aug 22 '22 01:08 ssheng

cc @ssheng @sauyon @parano

On CLI, should we use the --grpc option or serve-grpc sub-command? Using --grpc makes sense if most of the other options are also applicable to gRPC settings. If gRPC demands a mostly different set of options, serve-grpc sub-command might make more sense.

On the topic of having either --grpc or serve-grpc, this comes down to whether we want to optimize for user experience or for maintainability:

  • --grpc provides a pretty nice way for users to run the serve command both locally and inside a container.

    • For the Docker case, people can just do docker run -it --rm -p 3000:3000 -p 9090:9090 iris_classifier:latest --grpc --enable-reflection.
    • Since gRPC has a different set of configuration values, if we decide to go with this, our job is to document it and implement a mutually exclusive click.Option (see the sketch after this list). I think documentation alone would not be enough, since most people would skip it, tbh.
    • --grpc currently brings a bit more complexity into our serve_development and serve_production, but it is not too bad in terms of maintainability.
  • serve-grpc is, in a way, more explicit and readable.

    • This would also make our code a lot easier to maintain, as serve-grpc would just have a different set of functions and logic (which follows separation of concerns).
    • I would like to do this as well, but the main crux of the problem is that generating the Dockerfile would be harder. How do we determine when to use serve or serve-grpc?
    • Obviously we can read it via configuration, but then the workaround for running the Docker container would be pretty ugly; considering the current state of containerization, with little to no test setup (I will set this up after this PR), I prefer the first option.
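
For the mutually exclusive option mentioned above, here is a minimal sketch using a custom click.Option subclass (the MutuallyExclusiveOption class and the exact flag wiring are illustrative assumptions, not the implementation in this PR):

import click

class MutuallyExclusiveOption(click.Option):
    """A click.Option that errors out when combined with any option named in `mutually_exclusive`."""

    def __init__(self, *args, **kwargs):
        self.mutually_exclusive = set(kwargs.pop("mutually_exclusive", []))
        super().__init__(*args, **kwargs)

    def handle_parse_result(self, ctx, opts, args):
        if self.name in opts and self.mutually_exclusive.intersection(opts):
            raise click.UsageError(
                f"--{self.name} cannot be used together with "
                + ", ".join(f"--{name}" for name in sorted(self.mutually_exclusive))
            )
        return super().handle_parse_result(ctx, opts, args)

@click.command()
@click.option("--grpc", is_flag=True, cls=MutuallyExclusiveOption, mutually_exclusive=["http"])
@click.option("--http", is_flag=True, cls=MutuallyExclusiveOption, mutually_exclusive=["grpc"])
def serve(grpc: bool, http: bool) -> None:
    click.echo("serving over gRPC" if grpc else "serving over HTTP")

With something like this, serve --grpc --http fails fast with a usage error instead of silently picking one protocol.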

Note that in both cases, we will also provide either --http or serve-http for symmetry's sake. serve should still be the public entrypoint, with serve-http as an alias for serve.

On Prometheus metrics port, 9090 is the Prometheus UI port btw.

I'm aware of this; I think we should just use something like 50050.

The same goes for the gRPC host and port. I have seen a lot of systems do something similar, separating the HTTP and gRPC ports into two different fields. This would mean api_server.[host|port] becomes api_server.http.[host|port], and the same applies to gRPC.

This is nice because we can support the case where users want both gRPC and HTTP, with no port collision. My proposal is to have something like:

api_server:
  http:
    port: 3000
    host: 0.0.0.0
  grpc:
    port: 50051
    host: 0.0.0.0
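
As a rough illustration (not the PR's actual server code) of why the split keeps the two protocols from colliding, each server simply binds its own host:port pair from the config:

from concurrent import futures

import grpc

config = {
    "api_server": {
        "http": {"port": 3000, "host": "0.0.0.0"},
        "grpc": {"port": 50051, "host": "0.0.0.0"},
    }
}

grpc_cfg = config["api_server"]["grpc"]
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
server.add_insecure_port(f"{grpc_cfg['host']}:{grpc_cfg['port']}")
server.start()
# The HTTP app is served separately on api_server.http.[host|port],
# so both protocols can run at the same time without a port collision.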

The PrometheusInterceptor is currently optional. Should it be optional?

This should be optional and follow the metrics options from the configuration.
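
A sketch of what that could look like; the import path for PrometheusInterceptor is an assumption, and the actual location in this PR may differ:

from concurrent import futures

import grpc

metrics_enabled = True  # read from api_server.metrics.enabled in the config

interceptors = []
if metrics_enabled:
    # hypothetical import path for the interceptor added in this PR
    from bentoml._internal.server.grpc.interceptors import PrometheusInterceptor

    interceptors.append(PrometheusInterceptor())

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), interceptors=interceptors)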

On dependencies, what is the difference between grpc and grpcio? Is there a minimum version we want to enforce?

Oh, grpc is the actual Python import, whereas grpcio is the package name 😄 The minimum version is 1.41, since that is the first release that supports Python 3.10.
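
In other words (a quick illustration):

# requirements pin the PyPI distribution:
#   grpcio>=1.41.0
# while the code imports the module that the distribution provides:
import grpc

print(grpc.__version__)  # e.g. "1.41.0"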


Quick breakdown of the current api_server config:

api_server:
  # metrics and logging are currently being used by both HTTP and gRPC
  metrics:
    enabled: true
    namespace: BENTOML
  logging:
    access:
      enabled: true
      request_content_length: true
      request_content_type: true
      response_content_length: true
      response_content_type: true
  
  # currently port and host are also being used by both HTTP and gRPC (subject to separation)
  port: 3000
  host: 0.0.0.0

  workers: 1  # Currently workers is only used for HTTP. Ideally we could also use it to size the thread pool we use to run the gRPC server.
  timeout: 60  # HTTP-specific; not currently used by gRPC. One thing we could use it for in gRPC is as the timeout for the health check probe.
  
  http:  # HTTP-specific
    max_request_size: 20971520
    backlog: 2048
    cors:
      enabled: false
      access_control_allow_origin: ~
      access_control_allow_credentials: ~
      access_control_allow_methods: ~
      access_control_allow_headers: ~
      access_control_max_age: ~
      access_control_expose_headers: ~
  
  grpc:  # grpc-specific
    enabled: false
    max_concurrent_streams: ~   # https://github.com/grpc/grpc/blob/ebfb028a29ef5f0f4be6f33c131c740ae0a32c84/src/core/ext/transport/chttp2/transport/http2_settings.cc#L51
    maximum_concurrent_rpcs: ~
    max_message_length: -1
    reflection:
      enabled: false
    metrics:
      port: 9090  # will change to 50050
      host: 0.0.0.0
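
For reference, a sketch of how the grpc.reflection.enabled key could be wired up, assuming the grpcio-reflection package and a hypothetical service name:

from concurrent import futures

import grpc
from grpc_reflection.v1alpha import reflection

server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))

reflection_enabled = True  # read from api_server.grpc.reflection.enabled
if reflection_enabled:
    service_names = (
        "bentoml.grpc.v1.BentoService",  # hypothetical fully qualified service name
        reflection.SERVICE_NAME,
    )
    reflection.enable_server_reflection(service_names, server)

This would let tools like grpcurl discover the service without needing the .proto files.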

aarnphm avatar Aug 23 '22 02:08 aarnphm

I'm in favor of having a separate gRPC port configuration value.

I think --grpc is probably fine here; most of the options are shared, right?

sauyon avatar Aug 23 '22 05:08 sauyon

I'm in favor of having a separate gRPC port configuration value.

I think --grpc is probably fine here; most of the options are shared, right?

Well, yes, except that a few of the gRPC options are gRPC-only.

aarnphm avatar Aug 23 '22 16:08 aarnphm