opentelemetry-go-instrumentation icon indicating copy to clipboard operation
opentelemetry-go-instrumentation copied to clipboard

grpc call cause process panic sometime

Open quanbisen opened this issue 1 year ago • 1 comments
trafficstars

Describe the bug

I using docker deploy, my process will panic sometime after deploy opentelemetry-go-instrumentation. And my process using docker restart always policy and no trace data collected after panic.

  • opentelemetry-go-instrumentation output log:
{"time":"2024-09-26T10:21:57.679678538+08:00","level":"INFO","source":{"function":"main.main","file":"/app/cli/main.go","line":103},"msg":"building OpenTelemetry Go instrumentation ...","globalImpl":false,"version":{"Release":"v0.14.0-alpha","Revision":"be50b73b9826999bb31de1ff02d3bc9e674f8855","Go":{"Version":"go1.23.1","OS":"linux","Arch":"amd64"}}}
{"time":"2024-09-26T10:21:59.682574195+08:00","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/process.(*Analyzer).DiscoverProcessID","file":"/app/internal/pkg/process/discover.go","line":64},"msg":"found process","pid":4460}
{"time":"2024-09-26T10:21:59.749486148+08:00","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/process.remoteAllocate.func1","file":"/app/internal/pkg/process/allocate.go","line":70},"msg":"Detaching from process","pid":4460}
{"time":"2024-09-26T10:21:59.749586806+08:00","level":"INFO","source":{"function":"go.opentelemetry.io/auto.NewInstrumentation","file":"/app/instrumentation.go","line":113},"msg":"target process analysis completed","pid":4460,"go_version":"1.22.4","dependencies":{"github.com/Knetic/govaluate":"3.0.1-0.20171022003610-9aa49832a739+incompatible","github.com/KyleBanks/depth":"1.2.1","github.com/antonfisher/nested-logrus-formatter":"1.3.1","github.com/casbin/casbin/v2":"2.77.2","github.com/coreos/go-semver":"0.3.0","github.com/coreos/go-systemd/v22":"22.3.2","github.com/fsnotify/fsnotify":"1.6.0","github.com/gabriel-vasile/mimetype":"1.4.3","github.com/gin-contrib/sse":"0.1.0","github.com/gin-gonic/gin":"1.9.1","github.com/go-openapi/jsonpointer":"0.20.0","github.com/go-openapi/jsonreference":"0.20.2","github.com/go-openapi/spec":"0.20.9","github.com/go-openapi/swag":"0.22.4","github.com/go-playground/locales":"0.14.1","github.com/go-playground/universal-translator":"0.18.1","github.com/go-playground/validator/v10":"10.16.0","github.com/gogo/protobuf":"1.3.2","github.com/golang-jwt/jwt/v4":"4.5.0","github.com/golang/protobuf":"1.5.3","github.com/hashicorp/hcl":"1.0.0","github.com/jinzhu/inflection":"1.0.0","github.com/jinzhu/now":"1.1.5","github.com/josharian/intern":"1.0.0","github.com/leodido/go-urn":"1.2.4","github.com/magiconair/properties":"1.8.7","github.com/mailru/easyjson":"0.7.7","github.com/mattn/go-isatty":"0.0.20","github.com/mitchellh/mapstructure":"1.5.0","github.com/pelletier/go-toml/v2":"2.1.0","github.com/pkg/errors":"0.9.1","github.com/sagikazarmark/slog-shim":"0.1.0","github.com/siongui/gojianfan":"0.0.0-20210926212422-2f175ac615de","github.com/sirupsen/logrus":"1.9.3","github.com/spf13/afero":"1.10.0","github.com/spf13/cast":"1.5.1","github.com/spf13/cobra":"1.7.0","github.com/spf13/pflag":"1.0.5","github.com/spf13/viper":"1.17.0","github.com/subosito/gotenv":"1.6.0","github.com/swaggo/files":"1.0.1","github.com/swaggo/gin-swagger":"1.6.0","github.com/swaggo/swag":"1.16.2","github.com/tencentcloud/tencentcloud-sdk-go/tencentcloud/common":"1.0.993","github.com/ugorji/go/codec":"1.2.12","github.com/zput/zxcTool":"1.3.10","go.etcd.io/etcd/api/v3":"3.5.9","go.etcd.io/etcd/client/pkg/v3":"3.5.9","go.etcd.io/etcd/client/v3":"3.5.9","go.uber.org/atomic":"1.9.0","go.uber.org/multierr":"1.9.0","go.uber.org/zap":"1.21.0","golang.org/x/crypto":"0.22.0","golang.org/x/net":"0.21.0","golang.org/x/sys":"0.19.0","golang.org/x/text":"0.14.0","golang.org/x/tools":"0.16.0","google.golang.org/genproto/googleapis/api":"0.0.0-20230913181813-007df8e322eb","google.golang.org/genproto/googleapis/rpc":"0.0.0-20230920204549-e6e6cdab5c13","google.golang.org/grpc":"1.58.3","google.golang.org/protobuf":"1.31.0","gopkg.in/ini.v1":"1.67.0","gopkg.in/yaml.v3":"3.0.1","gorm.io/gorm":"1.25.5","std":"1.22.4"},"total_functions_found":9}
{"time":"2024-09-26T10:21:59.749862473+08:00","level":"INFO","source":{"function":"main.main","file":"/app/cli/main.go","line":134},"msg":"starting instrumentation..."}
{"time":"2024-09-26T10:21:59.750077678+08:00","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation.(*Manager).load","file":"/app/internal/pkg/instrumentation/manager.go","line":324},"msg":"loading probe","name":{"SpanKind":3,"InstrumentedPkg":"database/sql"}}
{"time":"2024-09-26T10:21:59.845479481+08:00","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation.(*Manager).load","file":"/app/internal/pkg/instrumentation/manager.go","line":324},"msg":"loading probe","name":{"SpanKind":3,"InstrumentedPkg":"google.golang.org/grpc"}}
{"time":"2024-09-26T10:21:59.883387251+08:00","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation.(*Manager).load","file":"/app/internal/pkg/instrumentation/manager.go","line":324},"msg":"loading probe","name":{"SpanKind":2,"InstrumentedPkg":"google.golang.org/grpc"}}
{"time":"2024-09-26T10:21:59.913460267+08:00","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation.(*Manager).load","file":"/app/internal/pkg/instrumentation/manager.go","line":324},"msg":"loading probe","name":{"SpanKind":2,"InstrumentedPkg":"net/http"}}
{"time":"2024-09-26T10:22:00.036404698+08:00","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation.(*Manager).ConfigLoop","file":"/app/internal/pkg/instrumentation/manager.go","line":238},"msg":"Configuration provider closed, configuration updates will no longer be received"}
{"time":"2024-09-26T10:22:00.036738308+08:00","level":"INFO","source":{"function":"main.main.func3","file":"/app/cli/main.go","line":130},"msg":"instrumentation loaded successfully"}
  • my process(using gin->grpc) output log:
2024/09/26 10:45:14 isTrue: true role: normal_&&^_cicd_&&^_serviceGovernance_&&^_quality_ro_&&^_order_sys_&&^_database method: GET path: /api/v1/cicd/deploy-record
2024-09-26 10:45:14 [logger.go:31] [INFO] [200] [31.145992ms] [GET] [58.250.17.2] /api/v1/cicd/deploy-record?appId=23&pageSize=4&pageIndex=1
2024-09-26 10:45:14 [logger.go:31] [INFO] [200] [13.645976ms] [GET] [58.250.17.2] /api/v1/cicd/deploy-record?appId=23&pageSize=4&pageIndex=1
panic: id (57) <= evictCount (91)

goroutine 66 [running]:
golang.org/x/net/http2/hpack.(*headerFieldTable).idToIndex(0xc000e95900, 0xc000fea570?)
        /go/pkg/mod/golang.org/x/[email protected]/http2/hpack/tables.go:118 +0xbd
golang.org/x/net/http2/hpack.(*headerFieldTable).search(0xc000e95900, {{0x7fc6db0602e0, 0xb}, {0x7fc6db060300, 0x37}, 0x0})
        /go/pkg/mod/golang.org/x/[email protected]/http2/hpack/tables.go:109 +0xc5
golang.org/x/net/http2/hpack.(*Encoder).searchTable(0xc000e95900, {{0x7fc6db0602e0, 0xb}, {0x7fc6db060300, 0x37}, 0x0})
        /go/pkg/mod/golang.org/x/[email protected]/http2/hpack/encode.go:97 +0x85
golang.org/x/net/http2/hpack.(*Encoder).WriteField(0xc000e95900, {{0x7fc6db0602e0, 0xb}, {0x7fc6db060300, 0x37}, 0x0})
        /go/pkg/mod/golang.org/x/[email protected]/http2/hpack/encode.go:62 +0x145
google.golang.org/grpc/internal/transport.(*loopyWriter).writeHeader(0xc000462fc0, 0x399, 0x0, {0xc0020ca008?, 0x10?, 0xf90820?}, 0xc000270101?)
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:697 +0x11b
google.golang.org/grpc/internal/transport.(*loopyWriter).originateStream(0xc000462fc0, 0xc00228b6c0, 0xc001aa3a40)
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:684 +0x6a
google.golang.org/grpc/internal/transport.(*loopyWriter).headerHandler(0xc000462fc0, 0xc001aa3a40)
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:670 +0x289
google.golang.org/grpc/internal/transport.(*loopyWriter).handle(0xc000fda3c0?, {0xef3800?, 0xc001aa3a40?})
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:844 +0x211
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc000462fc0)
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:556 +0x9f
google.golang.org/grpc/internal/transport.newHTTP2Client.func6()
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:451 +0x85
created by google.golang.org/grpc/internal/transport.newHTTP2Client in goroutine 28
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:449 +0x22d3
2024-09-26 10:45:18 [rpc.go:90] [INFO] load balance enabled for admin
2024-09-26 10:45:18 [rpc.go:90] [INFO] load balance enabled for cmdb
2024-09-26 10:45:18 [rpc.go:90] [INFO] load balance enabled for job
2024-09-26 10:45:18 [rpc.go:90] [INFO] load balance enabled for cicd
2024-09-26 10:45:18 [rpc.go:90] [INFO] load balance enabled for message
2024-09-26 10:45:18 [rpc.go:90] [INFO] load balance enabled for security
2024-09-26 10:45:18 [rpc.go:90] [INFO] load balance enabled for monitor
2024-09-26 10:45:18 [rpc.go:90] [INFO] load balance enabled for workflow
2024-09-26 10:45:18 [rpc.go:90] [INFO] load balance enabled for database
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

2024-09-26 10:45:23 [api.go:68] [INFO] gateway listen on 0.0.0.0:8000

2024-09-26 10:45:23 [logger.go:31] [INFO] [200] [9.363931ms] [GET] [58.250.17.2] /api/v1/admin/user/info
2024-09-26 10:45:23 [logger.go:31] [INFO] [200] [6.437714ms] [GET] [58.250.17.2] /api/v1/admin/user/permission
2024/09/26 10:45:23 isTrue: true role: normal_&&^_cicd_&&^_quality_ro_&&^_order_sys_&&^_database method: GET path: /api/v1/database/mixed-authorized-instance
2024-09-26 10:45:23 [logger.go:31] [INFO] [200] [24.280036ms] [GET] [58.250.17.2] /api/v1/database/mixed-authorized-instance?pageIndex=1&pageSize=12&dbType=
2024-09-26 10:45:23 [logger.go:31] [INFO] [200] [24.658434ms] [GET] [58.250.17.2] /api/v1/admin/user/menu
2024-09-26 10:45:23 [logger.go:31] [INFO] [200] [17.490217ms] [GET] [58.250.17.2] /api/v1/cmdb/business/3
2024-09-26 10:45:23 [logger.go:31] [INFO] [200] [14.321949ms] [GET] [58.250.17.2] /api/v1/cmdb/service?pageIndex=1&pageSize=10&name=&businessId=3
2024-09-26 10:45:23 [logger.go:31] [INFO] [200] [33.394806ms] [GET] [58.250.17.2] /api/v1/admin/user/notify?pageIndex=1&pageSize=4
2024-09-26 10:45:24 [logger.go:31] [INFO] [200] [4.299699ms] [GET] [58.250.17.2] /api/v1/admin/user/info
2024-09-26 10:45:24 [logger.go:31] [INFO] [200] [3.590655ms] [GET] [58.250.17.2] /api/v1/admin/user/permission

Environment

  • OS: Linux 5.4.278-1.el7.elrepo.x86_64
  • Go Version: 1.22.4
  • Version: v0.14.0-alpha

To Reproduce

I don't know how to reproduce, I just know my process will panic sometime.

Expected behavior

process not panic

quanbisen avatar Sep 26 '24 02:09 quanbisen

Without a way to reproduce the issue this is likely going to be unresolvable.

What makes you think this is related to the auto-instrumentation project?

Does this still happen when using the latest versions of auto-instrumentation and grpc?

MrAlias avatar Feb 07 '25 18:02 MrAlias

reproduce

Now, I use otel/autoinstrumentation-go:v0.21.0 also has this problem with grpc v1.58.3, I deploy with docker compose, here is my docker compose file. I am sure without running go-auto-gateway, my gateway service work well, no panic. And my gateway service is an grpc client side application, panic log showing is related with http2 and grpc.

version: '3.7'

networks:
  lebo:
    driver: bridge

services:
  gateway:
    container_name: gateway
    image: gateway:latest
    restart: always
    command: "/app/gateway -c settings.yml"
    environment:
      - TZ=Asia/Shanghai
    ports:
      - "8001:8000"
    networks:
      - lebo
    volumes:
      - go-gateway-vol:/app

  go-auto-gateway:
    container_name: go-auto-gateway
    image: otel/autoinstrumentation-go:v0.21.0
    restart: always
    command: /otel-go-instrumentation
    pid: "host"
    privileged: true
    environment:
      - TZ=Asia/Shanghai
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel.lebo.lc:4318
      - OTEL_GO_AUTO_TARGET_EXE=/app/gateway
      - OTEL_SERVICE_NAME=lebo-ops-gateway
      - OTEL_GO_AUTO_INCLUDE_DB_STATEMENT=true
      - OTEL_PROPAGATORS=tracecontext,baggage
    networks:
      - lebo
    volumes:
      - go-gateway-vol:/app
      - /proc:/host/proc
volumes:
  go-gateway-vol:

panic log output:

2025/04/29 05:37:54 isTrue: true role: normal_&&^_cicd_&&^_quality_ro_&&^_order_sys_&&^_database method: GET path: /api/v1/cicd/build-record
panic: id (157) <= evictCount (158)

goroutine 115 [running]:
golang.org/x/net/http2/hpack.(*headerFieldTable).idToIndex(0xc000f03a80, 0xc0000a88d0?)
        /go/pkg/mod/golang.org/x/[email protected]/http2/hpack/tables.go:118 +0xbd
golang.org/x/net/http2/hpack.(*headerFieldTable).search(0xc000f03a80, {{0x7f950bbb7838, 0xb}, {0x7f950bbb7858, 0x37}, 0x0})
        /go/pkg/mod/golang.org/x/[email protected]/http2/hpack/tables.go:109 +0xc5
golang.org/x/net/http2/hpack.(*Encoder).searchTable(0xc000f03a80, {{0x7f950bbb7838, 0xb}, {0x7f950bbb7858, 0x37}, 0x0})
        /go/pkg/mod/golang.org/x/[email protected]/http2/hpack/encode.go:97 +0x85
golang.org/x/net/http2/hpack.(*Encoder).WriteField(0xc000f03a80, {{0x7f950bbb7838, 0xb}, {0x7f950bbb7858, 0x37}, 0x0})
        /go/pkg/mod/golang.org/x/[email protected]/http2/hpack/encode.go:62 +0x145
google.golang.org/grpc/internal/transport.(*loopyWriter).writeHeader(0xc00017a9a0, 0x149, 0x0, {0xc0015cefc8?, 0x10?, 0xfd3c20?}, 0xc001e1c701?)
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:697 +0x11b
google.golang.org/grpc/internal/transport.(*loopyWriter).originateStream(0xc00017a9a0, 0xc0019e75c0, 0xc001a42f00)
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:684 +0x6a
google.golang.org/grpc/internal/transport.(*loopyWriter).headerHandler(0xc00017a9a0, 0xc001a42f00)
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:670 +0x289
google.golang.org/grpc/internal/transport.(*loopyWriter).handle(0xc000f0bc20?, {0xf2ce20?, 0xc001a42f00?})
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:844 +0x211
google.golang.org/grpc/internal/transport.(*loopyWriter).run(0xc00017a9a0)
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/controlbuf.go:556 +0x9f
google.golang.org/grpc/internal/transport.newHTTP2Client.func6()
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:451 +0x85
created by google.golang.org/grpc/internal/transport.newHTTP2Client in goroutine 77
        /go/pkg/mod/google.golang.org/[email protected]/internal/transport/http2_client.go:449 +0x22d3

quanbisen avatar Apr 29 '25 05:04 quanbisen

It seems like go http2 bugs. https://github.com/golang/go/issues/43051, but without running otel/autoinstrumentation-go, my service run well.

quanbisen avatar Apr 29 '25 06:04 quanbisen

cc @RonFed @grcevski

MrAlias avatar Apr 29 '25 14:04 MrAlias

Related to https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/139

MrAlias avatar May 01 '25 16:05 MrAlias

Is there anyone can fix it? I'm not good at eBPF and c develop. I have try to use cursor AI coding, but not fix it. This bug occur GRPC client call to server sometime.

quanbisen avatar Oct 28 '25 01:10 quanbisen

Hey, I had created a PR for this bug. https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/2981

quanbisen avatar Oct 30 '25 05:10 quanbisen