RESTful API returns an error when using etcd gRPC Proxy with gRPC Gateway

Open elricli opened this issue 2 years ago • 2 comments

What happened?

Watch requests return an error:

{
    "error": {
        "grpc_code": 2,
        "http_code": 500,
        "message": "context canceled",
        "http_status": "Internal Server Error"
    }
}

but put/range requests work correctly.

What did you expect to happen?

Watch requests work correctly.

How can we reproduce it (as minimally and precisely as possible)?

  1. Start etcd.
    • Start Command: etcd
  2. Start etcd gRPC Proxy.
    • Start Command: etcd grpc-proxy start --debug --endpoints=http://localhost:2379
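  3. Start etcd gRPC Gateway.
    • Go code:
      package main

      // Import paths are assumed (the original snippet omitted them):
      // grpc-gateway v1 and the etcd v3 API module.
      import (
          "context"
          "log"
          "net/http"
          "os"

          "github.com/grpc-ecosystem/grpc-gateway/runtime"
          etcdservergw "go.etcd.io/etcd/api/v3/etcdserverpb/gw"
          "google.golang.org/grpc"
          "google.golang.org/grpc/grpclog"
      )

      func main() {
          ctx := context.Background()
          grpclog.SetLoggerV2(grpclog.NewLoggerV2(os.Stdout, os.Stderr, os.Stderr))
          // Dial the etcd gRPC Proxy (not etcd itself) without TLS.
          opts := []grpc.DialOption{grpc.WithInsecure()}
          conn, err := grpc.DialContext(ctx, "127.0.0.1:23790", opts...)
          if err != nil {
              panic(err)
          }
          defer conn.Close()
          gwmux := runtime.NewServeMux()

          // Register the KV and Watch REST handlers on the gateway mux.
          type registerHandlerFunc func(context.Context, *runtime.ServeMux, *grpc.ClientConn) error
          handlers := []registerHandlerFunc{
              etcdservergw.RegisterKVHandler,
              etcdservergw.RegisterWatchHandler,
          }
          for i := range handlers {
              if err = handlers[i](ctx, gwmux, conn); err != nil {
                  panic(err)
              }
          }
          log.Fatalln(http.ListenAndServe(":8080", gwmux))
      }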
      

Anything else we need to know?

We use the etcd gRPC Proxy to coalesce the many watchers.

The etcd gRPC Proxy does not support the HTTP RESTful API (see #13850), so we need to add a gRPC Gateway to convert HTTP requests into gRPC requests.

Error debugging screenshot: image

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.6.0-alpha.0 
Git SHA: fb5591050
Go Version: go1.17.6
Go OS/Arch: windows/amd64

Same on 3.5.2.

Etcd configuration (command line flags or environment variables)

No response

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

etcd:
$ etcdctl member list -w table
+------------------+---------+---------+-----------------------+-----------------------+------------+
|        ID        | STATUS  |  NAME   |      PEER ADDRS       |     CLIENT ADDRS      | IS LEARNER |
+------------------+---------+---------+-----------------------+-----------------------+------------+
| 8e9e05c52164694d | started | default | http://localhost:2380 | http://localhost:2379 |      false |
+------------------+---------+---------+-----------------------+-----------------------+------------+
$ etcdctl --endpoints=localhost:2379 endpoint status -w table
+----------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT    |        ID        |    VERSION    | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+
| localhost:2379 | 8e9e05c52164694d | 3.6.0-alpha.0 |   25 kB |      true |      false |         8 |         23 |                 23 |        |
+----------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+

etcd gRPC Proxy:

$ etcdctl member list -w table
+----+---------+-------------------+------------+-----------------+------------+
| ID | STATUS  |       NAME        | PEER ADDRS |  CLIENT ADDRS   | IS LEARNER |
+----+---------+-------------------+------------+-----------------+------------+
|  0 | started | LAPTOP-E1J2021187 |            | 127.0.0.1:23790 |      false |
+----+---------+-------------------+------------+-----------------+------------+
$ etcdctl --endpoints=localhost:23790 endpoint status -w table
+-----------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+
|    ENDPOINT     |        ID        |    VERSION    | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+
| localhost:23790 | 8e9e05c52164694d | 3.6.0-alpha.0 |   25 kB |      true |      false |         8 |         23 |                 23 |        |
+-----------------+------------------+---------------+---------+-----------+------------+-----------+------------+--------------------+--------+

Relevant log output

No response

elricli avatar Apr 08 '22 08:04 elricli

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 10 '22 12:07 stale[bot]

Keep

elricli avatar Jul 11 '22 02:07 elricli

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 15 '22 21:10 stale[bot]

remove stale

elricli avatar Oct 16 '22 02:10 elricli

do I get this deployment diagram right?

graph TD;
    Gateway-->|grpc|gRPCProxy;
    gRPCProxy-->|http|etcd;

Is the error you are seeing in your gateway code that it can't connect to the gRPC proxy?

tjungblu avatar Oct 17 '22 09:10 tjungblu

okay after debugging this, the implementation of the watch endpoint is a bit odd: https://github.com/etcd-io/etcd/blob/main/api/etcdserverpb/gw/rpc.pb.gw.go#L205-L228

On the grpc proxy the type couldn't be parsed correctly to either Create/Cancel: https://github.com/etcd-io/etcd/blob/main/server/proxy/grpcproxy/watch.go#L233

So on the proxy you get:

{"level":"error","ts":"2022-10-17T11:42:36.099+0200","caller":"grpcproxy/watch.go:275","msg":"not supported request type by gRPC proxy","request":"","stacktrace":"go.etcd.io/etcd/server/v3/proxy/grpcproxy.(*watchProxyStream).recvLoop\n\tetcd/server/proxy/grpcproxy/watch.go:275\ngo.etcd.io/etcd/server/v3/proxy/grpcproxy.(*watchProxy).Watch.func1\n\tetcd/server/proxy/grpcproxy/watch.go:128"}

Which makes sense; I don't really get how it would create a WatchRequest_CreateRequest in the gateway endpoint. Since this is autogenerated code, I'll check whether this is just a limitation of the grpc-gateway generator.

edit: I should RTFM: https://etcd.io/docs/v3.5/dev-guide/api_grpc_gateway/#watch-keys testing with the correct structure now :)

tjungblu avatar Oct 17 '22 09:10 tjungblu

do I get this deployment diagram right?

graph TD;
    gateway1[openresty+lua] -->|http watch| grpcGateway[gRPC gateway];
    gateway2[openresty+lua] --> |http watch| grpcGateway;
    grpcGateway-->|grpc watch|grpcProxy[etcd gRPC proxy];
    grpcProxy -->|grpc watch|etcd;

elricli avatar Oct 17 '22 09:10 elricli

Thanks @elricli, I think I can reproduce this with a simple curl:

curl -X POST http://localhost:8080/v3/watch -d '{"create_request": {"key":"Lw=="} }'
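For reference, the gateway's JSON mapping carries byte fields such as key as base64, and "Lw==" is the encoding of "/". A minimal Go sketch that builds the same body; the names watchRequest, createRequest and watchBody are made up for illustration and are not etcd types:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createRequest and watchRequest are hypothetical stand-ins that mirror only
// the JSON shape used in the curl call above.
type createRequest struct {
	Key []byte `json:"key"` // encoding/json writes []byte as base64
}

type watchRequest struct {
	CreateRequest createRequest `json:"create_request"`
}

// watchBody returns the JSON body for a watch on the given key.
func watchBody(key string) (string, error) {
	b, err := json.Marshal(watchRequest{CreateRequest: createRequest{Key: []byte(key)}})
	return string(b), err
}

func main() {
	body, err := watchBody("/")
	if err != nil {
		panic(err)
	}
	fmt.Println(body) // {"create_request":{"key":"Lw=="}}
}
```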

The EOF error comes from the fact that it tries to parse the watch request continuously in: https://github.com/etcd-io/etcd/blob/main/api/etcdserverpb/gw/rpc.pb.gw.go#L212-L215

Which fails after the first time as the decoder runs out of body to parse. If I move the decoder out of the loop it creates new watches continuously. I assume this was supposed to listen to the request body indefinitely and forward the requests as they arrive and are decoded.

It also makes me wonder why this is defined as a request stream: https://github.com/etcd-io/etcd/blob/main/api/etcdserverpb/rpc.proto#L72

The generated code doesn't seem to have changed significantly over the years, so the cause must be in the muxing or the decoder.

tjungblu avatar Oct 17 '22 10:10 tjungblu

So I tried different languages and bidi streaming requests; I can only really make this work by removing the break on EOF in the send loop:

diff --git a/api/etcdserverpb/gw/rpc.pb.gw.go b/api/etcdserverpb/gw/rpc.pb.gw.go
index 2fca126af..321e966ce 100644
--- a/api/etcdserverpb/gw/rpc.pb.gw.go
+++ b/api/etcdserverpb/gw/rpc.pb.gw.go
@@ -238,12 +240,13 @@ func request_Watch_Watch_0(ctx context.Context, marshaler runtime.Marshaler, cli
        go func() {
                for {
                        if err := handleSend(); err != nil {
-                               break
+                               //break
                        }
                }

Then it works correctly, even with curl. It will just busy loop decode the stream. Since this is indeed in the generated code, let me create a new issue in https://github.com/grpc-ecosystem/grpc-gateway

edit: created https://github.com/grpc-ecosystem/grpc-gateway/issues/2954

tjungblu avatar Oct 17 '22 14:10 tjungblu

Cleaning up all the threads today, this requires somewhat of a decision for etcd.

The grpc-gateway does not support bi-di streaming, nor is a correctly working version easy to add. The patch in https://github.com/grpc-ecosystem/grpc-gateway/pull/2972 works functionally, but it's a rather stupid idea to busy loop and will significantly hamper the scalability of the server.

Now I'm wondering what that means: should this REST gateway watch API be dropped/deprecated? I tried to find some references of this ever working, but didn't find any, and there's no integration/e2e test to cover this either.

@serathius @ahrtr @ptabor @spzala wdyt?

tjungblu avatar Nov 15 '22 10:11 tjungblu

Hello tjungblu, thanks for your explanation of this issue. We have the same problem as the author of this issue. I want to know why the etcd server can provide the watch API through the grpc gateway. Are there any differences in the grpc gateway HTTP handler between the etcd server and a grpc gateway service implemented by ourselves?

AlexAi27 avatar Nov 24 '22 03:11 AlexAi27

etcd already supports grpc gateway via the --enable-grpc-gateway flag, and it's enabled by default.

If you really want to support similar functionality in your own grpc gateway, I guess you need to get it wrapped in a websocket-proxy; please see the example in serve.go#L304-L319.

This isn't a priority for etcd, and I do not have extra time to dig into it. I don't think other maintainers have the bandwidth to take care of it either.

ahrtr avatar Nov 24 '22 05:11 ahrtr

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 18 '23 09:03 stale[bot]