temporal-operator
Linkerd RST_STREAM protocol issues
I've launched a Temporal cluster with the Linkerd option enabled and have been running into gRPC errors. Non-Linkerd deployments don't have this issue. Logs from the worker pod:
TEMPORAL_ADDRESS is not set, setting it to 10.8.21.62:7233
2023/10/17 22:29:19 Loading config; env=docker,zone=,configDir=config
2023/10/17 22:29:19 Loading config files=[config/docker.yaml]
"level":"info","ts":"2023-10-17T22:29:19.304Z","msg":"Build info.","git-time":"2023-08-14T17:17:11.000Z","git-revision":"c79c00ac8f96c94dffae6e59eba9a279f7ebc656","git-modified":true,"go-arch":"amd64","go-os":"linux","go-version":"go1.20.6","cgo-enabled":false,"server-version":"1.21.5","debug-mode":false,"logging-call-at":"main.go:148"}
{"level":"info","ts":"2023-10-17T22:29:19.304Z","msg":"Dynamic config client is not configured. Using noop client.","logging-call-at":"main.go:168"}
{"level":"warn","ts":"2023-10-17T22:29:19.304Z","msg":"Not using any authorizer and flag `--allow-no-auth` not detected. Future versions will require using the flag `--allow-no-auth` if you do not want to set an authorizer.","logging-call-at":"main.go:178"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Use rpc address 127.0.0.1:7233 for cluster prod.","component":"metadata-initializer","logging-call-at":"fx.go:840"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Service is not requested, skipping initialization.","service":"history","logging-call-at":"fx.go:371"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Service is not requested, skipping initialization.","service":"matching","logging-call-at":"fx.go:421"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Service is not requested, skipping initialization.","service":"frontend","logging-call-at":"fx.go:479"}
{"level":"info","ts":"2023-10-17T22:29:19.330Z","msg":"Service is not requested, skipping initialization.","service":"internal-frontend","logging-call-at":"fx.go:479"}
{"level":"info","ts":"2023-10-17T22:29:19.347Z","msg":"historyClient: ownership caching disabled","service":"worker","logging-call-at":"client.go:82"}
{"level":"info","ts":"2023-10-17T22:29:19.348Z","msg":"PProf not started due to port not set","logging-call-at":"pprof.go:67"}
{"level":"info","ts":"2023-10-17T22:29:19.348Z","msg":"Starting server for services","value":{"worker":{}},"logging-call-at":"server_impl.go:88"}
{"level":"info","ts":"2023-10-17T22:29:19.363Z","msg":"RuntimeMetricsReporter started","service":"worker","logging-call-at":"runtime.go:138"}
{"level":"info","ts":"2023-10-17T22:29:19.363Z","msg":"worker starting","service":"worker","component":"worker","logging-call-at":"service.go:391"}
{"level":"info","ts":"2023-10-17T22:29:19.369Z","msg":"Membership heartbeat upserted successfully","address":"10.8.21.62","port":6939,"hostId":"9cef8afb-6d3c-11ee-93a5-560035998413","logging-call-at":"monitor.go:256"}
{"level":"info","ts":"2023-10-17T22:29:19.371Z","msg":"bootstrap hosts fetched","bootstrap-hostports":"10.8.11.88:6934,10.8.6.148:6933,10.8.21.62:6939,10.8.20.37:6935","logging-call-at":"monitor.go:298"}
{"level":"info","ts":"2023-10-17T22:29:19.377Z","msg":"Current reachable members","component":"service-resolver","service":"matching","addresses":["10.8.20.37:7235"],"logging-call-at":"service_resolver.go:279"}
{"level":"info","ts":"2023-10-17T22:29:19.377Z","msg":"Current reachable members","component":"service-resolver","service":"worker","addresses":["10.8.21.62:7239"],"logging-call-at":"service_resolver.go:279"}
{"level":"info","ts":"2023-10-17T22:29:19.377Z","msg":"Current reachable members","component":"service-resolver","service":"frontend","addresses":["10.8.6.148:7233"],"logging-call-at":"service_resolver.go:279"}
{"level":"info","ts":"2023-10-17T22:29:19.377Z","msg":"Current reachable members","component":"service-resolver","service":"history","addresses":["10.8.11.88:7234"],"logging-call-at":"service_resolver.go:279"}
{"level":"warn","ts":"2023-10-17T22:29:19.380Z","msg":"error creating sdk client","service":"worker","error":"failed reaching server: stream terminated by RST_STREAM with error code: PROTOCOL_ERROR","logging-call-at":"factory.go:114"}
{"level":"fatal","ts":"2023-10-17T22:29:19.380Z","msg":"error creating sdk client","service":"worker","error":"failed reaching server: stream terminated by RST_STREAM with error code: PROTOCOL_ERROR","logging-call-at":"factory.go:121","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Fatal\n\t/home/builder/temporal/common/log/zap_logger.go:180\ngo.temporal.io/server/common/sdk.(*clientFactory).GetSystemClient.func1\n\t/home/builder/temporal/common/sdk/factory.go:121\nsync.(*Once).doSlow\n\t/usr/local/go/src/sync/once.go:74\nsync.(*Once).Do\n\t/usr/local/go/src/sync/once.go:65\ngo.temporal.io/server/common/sdk.(*clientFactory).GetSystemClient\n\t/home/builder/temporal/common/sdk/factory.go:108\ngo.temporal.io/server/service/worker/scanner.(*Scanner).Start\n\t/home/builder/temporal/service/worker/scanner/scanner.go:229\ngo.temporal.io/server/service/worker.(*Service).startScanner\n\t/home/builder/temporal/service/worker/service.go:523\ngo.temporal.io/server/service/worker.(*Service).Start\n\t/home/builder/temporal/service/worker/service.go:408\ngo.temporal.io/server/service/worker.ServiceLifetimeHooks.func1.1\n\t/home/builder/temporal/service/worker/fx.go:136"}
This seems like a gRPC issue. The stack trace shows the worker's internal SDK client failing while dialing the frontend.
My TemporalCluster manifest:
apiVersion: v1
items:
- apiVersion: temporal.io/v1beta1
  kind: TemporalCluster
  metadata:
    name: prod
    namespace: temporal-system
  spec:
    version: 1.21.5
    admintools:
      enabled: true
    jobTtlSecondsAfterFinished: 300
    log:
      development: false
      format: json
      level: info
      outputFile: ""
      stdout: true
    mTLS:
      provider: linkerd
      refreshInterval: 5m0s
    numHistoryShards: 1
    persistence:
      defaultStore:
        name: default
        passwordSecretRef:
          key: PASSWORD
          name: postgres-password
        skipCreate: false
        sql:
          connectAddr: <rds_instance>.<rds_region>.rds.amazonaws.com:5432
          connectProtocol: tcp
          databaseName: temporal
          maxConnLifetime: 0s
          maxConns: 0
          maxIdleConns: 0
          pluginName: postgres
          taskScanPartitions: 0
          user: <rds_user>
      visibilityStore:
        elasticsearch:
          closeIdleConnectionsInterval: 0s
          enableHealthcheck: false
          enableSniff: false
          indices:
            secondaryVisibility: ""
            visibility: temporal_visibility
          logLevel: ""
          url: https://<opensearch url>
          username: admin
          version: v7
        name: visibility
        passwordSecretRef:
          key: PASSWORD
          name: opensearch-password
        skipCreate: false
    ui:
      enabled: true
  status:
    conditions:
    - lastTransitionTime: "2023-10-17T22:18:24Z"
      message: ""
      observedGeneration: 1
      reason: ServicesNotReady
      status: "False"
      type: Ready
    - lastTransitionTime: "2023-10-17T22:18:24Z"
      message: ""
      observedGeneration: 1
      reason: LastReconcileCycleSucceded
      status: "True"
      type: ReconcileSuccess
    persistence:
      defaultStore:
        created: true
        schemaVersion: 1.21.5
        setup: true
        type: postgres
      visibilityStore:
        created: true
        schemaVersion: 1.21.5
        setup: true
        type: elasticsearch
    services:
    - name: frontend
      ready: true
      version: 1.21.5
    - name: history
      ready: true
      version: 1.21.5
    - name: matching
      ready: true
      version: 1.21.5
    - name: worker
      ready: false
      version: 1.21.5
    version: 1.21.5
kind: List
metadata:
  resourceVersion: ""
Hi!
I don't think this is related to the operator. The operator does nothing more than ask Linkerd to inject its sidecar.
Which Linkerd version are you using?
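Concretely, "asking Linkerd" just means the operator sets the standard injection annotation on the pod templates, roughly like this (a sketch of the standard Linkerd annotation; whether the operator adds anything beyond it is an assumption):

# Standard Linkerd injection annotation on a pod template.
metadata:
  annotations:
    linkerd.io/inject: enabled

You can check the version with linkerd version and validate the data plane with linkerd check --proxy; the linkerd-proxy container logs on the worker pod should also show why the stream was reset. If it turns out to be proxy protocol detection, one common mitigation is to take Temporal's membership and gRPC ports out of detection by marking them opaque, e.g. (a sketch; config.linkerd.io/opaque-ports is a standard Linkerd annotation, but the port list here is only an assumption inferred from the bootstrap hosts in your logs):

# Hypothetical annotation for the Temporal pod templates; adjust the
# port list, which is inferred from the addresses in the logs above.
metadata:
  annotations:
    config.linkerd.io/opaque-ports: "6933,6934,6935,6939,7233,7234,7235,7239"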