feat: improve troubleshooting capabilities

Open hedhyw opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.
It's difficult to troubleshoot configuration with the current capabilities.

Describe the solution you'd like
More log events.

Additional context
I've run into a problem with the configuration based on the grpc.kafka.proxy example.

It works perfectly fine with a local Kafka and partly works with a remote Kafka when SASL is enabled.

To summarize:

| Kafka | Behaviour | Config |
|---|---|---|
| Local | Kafka-gRPC + gRPC-Kafka are working fine | zilla.yaml |
| Remote | Only gRPC-Kafka produces messages, but Kafka-gRPC doesn't consume | zilla.yaml |

The configs are identical except for the SASL options and the Kafka host:

80a81,85
>     options:
>       sasl:
>         mechanism: scram-sha-512
>         username: "******"
>         password: "******"
88c93
<       host: kafka
---
>       host: remote.kafka.host.example.com

All topics exist and I can consume from them using kcat (so permissions are OK). With the remote configuration, I see messages in the topic api.exampleserviceproto.requests.v1, but api.exampleserviceproto.responses.v1 is empty.

Any ideas how to debug this problem or what could be wrong?

I've tried enabling the stdout exporter, but it didn't show anything except REQUEST_ACCEPTED logs.

In the metrics, I see some receive errors from the Kafka client:

# HELP stream_errors_received_total Number of errors on received streams
# TYPE stream_errors_received_total counter
stream_errors_received_total{namespace="zilla-quickstart",binding="south_kafka_tcp_client"} 2

but sends are OK:

# HELP stream_opens_sent_total Number of opened sent streams
# TYPE stream_opens_sent_total counter
stream_opens_sent_total{namespace="zilla-quickstart",binding="south_kafka_tcp_client"} 5

Also, the kafka-grpc binding (west_kafka_grpc_remote_server) is silent:

# HELP stream_data_received_bytes_total Bytes of data on received streams
# TYPE stream_data_received_bytes_total counter
stream_data_received_bytes_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0

# HELP stream_data_sent_bytes_total Bytes of data on sent streams
# TYPE stream_data_sent_bytes_total counter
stream_data_sent_bytes_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0

# HELP stream_opens_received_total Number of opened received streams
# TYPE stream_opens_received_total counter
stream_opens_received_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0

# HELP stream_opens_sent_total Number of opened sent streams
# TYPE stream_opens_sent_total counter
stream_opens_sent_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0

# HELP stream_closes_received_total Number of closed received streams
# TYPE stream_closes_received_total counter
stream_closes_received_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0

# HELP stream_closes_sent_total Number of closed sent streams
# TYPE stream_closes_sent_total counter
stream_closes_sent_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0

# HELP stream_errors_received_total Number of errors on received streams
# TYPE stream_errors_received_total counter
stream_errors_received_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0

# HELP stream_errors_sent_total Number of errors on sent streams
# TYPE stream_errors_sent_total counter
stream_errors_sent_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0

# HELP stream_active_received Number of currently active received streams
# TYPE stream_active_received gauge
stream_active_received{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0

# HELP stream_active_sent Number of currently active sent streams
# TYPE stream_active_sent gauge
stream_active_sent{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0
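
For reading metrics output like the above at a glance, here is a minimal Python sketch that groups the samples by binding and flags bindings that report errors or show no traffic at all. It is not part of Zilla. The function names `parse_metrics` and `triage` and the status labels are assumptions made for illustration, and it only handles the simple labeled sample lines shown in this issue, not the full Prometheus text format.

```python
import re
from collections import defaultdict

# Matches labeled sample lines such as:
#   stream_errors_received_total{namespace="ns",binding="b"} 2
SAMPLE_LINE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return {binding: {metric_name: value}} for labeled samples in the text."""
    by_binding = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore lines this simple pattern does not cover
        labels = dict(LABEL.findall(match["labels"]))
        binding = labels.get("binding")
        if binding is not None:
            by_binding[binding][match["name"]] = float(match["value"])
    return dict(by_binding)


def triage(by_binding):
    """Label each binding 'errors', 'silent' (all zeros), or 'ok'."""
    report = {}
    for binding, metrics in by_binding.items():
        errors = sum(
            value for name, value in metrics.items()
            if name.startswith("stream_errors_")
        )
        if errors > 0:
            report[binding] = "errors"
        elif all(value == 0 for value in metrics.values()):
            report[binding] = "silent"
        else:
            report[binding] = "ok"
    return report


# A few of the samples quoted in this issue.
SAMPLE_TEXT = """\
# TYPE stream_errors_received_total counter
stream_errors_received_total{namespace="zilla-quickstart",binding="south_kafka_tcp_client"} 2
stream_opens_sent_total{namespace="zilla-quickstart",binding="south_kafka_tcp_client"} 5
stream_opens_received_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0
stream_errors_received_total{namespace="zilla-quickstart",binding="west_kafka_grpc_remote_server"} 0
"""

if __name__ == "__main__":
    for binding, status in sorted(triage(parse_metrics(SAMPLE_TEXT)).items()):
        print(f"{binding}: {status}")
```

Run against the samples quoted here, this reports `south_kafka_tcp_client` as `errors` and `west_kafka_grpc_remote_server` as `silent`, which matches the observation that the TCP client sees receive errors while the kafka-grpc binding never opens a stream.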

hedhyw, Apr 09 '24 05:04

@hedhyw thank you for posting this. It may be related to a known bug, https://github.com/aklivity/zilla/issues/881, that is currently being worked on.

It would also be helpful to analyze the zilla dump pcap for your specific use case.

Please start zilla afresh and reproduce the reported issue, then exec into the docker container where zilla is running and run the following command:

ZILLA_INCUBATOR_ENABLED=true /opt/zilla/zilla dump -v -w /tmp/zilla.dump.pcap

Then copy the file out of the container and upload it here for review.

jfallows, Apr 18 '24 00:04

Thank you!

zilla.dump.tar.gz

hedhyw, Apr 18 '24 06:04