dshackle
Dshackle grafana dashboard doesn't have all the graphs filled up with data
Hey! We run Dshackle on k8s, installed via Helm, and use Prometheus for monitoring. It works, but only partially: in Grafana I can see the number of upstreams and whether they're OK, but graphs such as 'JSON RPC total request / failed requests', 'JSON RPC OK requests by method', error requests by method, response time upstream errors, and total counts are empty. What could be wrong, and how can I fix it so that all graphs show actual data? The Grafana dashboard was copied from this repo without any changes. Dshackle version is 0.12.
dshackle monitoring config part:
monitoring:
  enabled: true
  jvm: true
  extended: true
  prometheus:
    enabled: true
    bind: 0.0.0.0
    port: 8081
    path: /metrics
serviceMonitor helm values:
serviceMonitor:
  # -- If true, a ServiceMonitor CRD is created for a prometheus operator
  # https://github.com/coreos/prometheus-operator
  enabled: true
  # -- Path to scrape
  path: /metrics
  # -- Alternative namespace for ServiceMonitor
  namespace: null
  # -- Additional ServiceMonitor labels
  labels:
    release: kube-prometheus-stack
  # -- Additional ServiceMonitor annotations
  annotations: {}
  # -- ServiceMonitor scrape interval
  interval: 1m
  # -- ServiceMonitor scheme
  scheme: http
  # -- ServiceMonitor TLS configuration
  tlsConfig: {}
  # -- ServiceMonitor scrape timeout
  scrapeTimeout: 30s
  # -- ServiceMonitor relabelings
  relabelings: []
What kind of requests do you make to Dshackle?
RPC: eth_blockNumber, eth_getBalance. BTW, when Dshackle was deployed via docker-compose, most of the now-empty graphs were populated with data. The requests were the same.
Maybe you scrape a wrong target?
In that case, wouldn't the graphs be completely empty rather than partially empty?
Maybe some of the requests go through one instance and some through another.
I don't really know anything about your setup, I'm just trying to guess. If you think there is some bug I need something to reproduce it. Ideally a command or script that demonstrates the problem that I can run.
I don't have anything custom. I use the skylenet dshackle Helm chart (the relevant config parts are in my first message), plus Prometheus and Grafana installed without customization from the official kube-prometheus-stack Helm chart, and the dashboard from this repo.
I have actually found that if I keep the ServiceMonitor disabled in the dshackle Helm chart and scrape metrics directly from the dshackle monitoring service port, almost all graphs are populated with data.
@PDVJAM have you tried using the metrics explorer in Grafana to see if those metrics are there? E.g. I see that the Grafana dashboard relies on some labels, "instance" and "chain", that should be part of most of the metrics. It could be that these are missing, and therefore the graphs are not showing up. To make this clearer, it may also be worth manually making a GET request against the metrics endpoint to see what the data looks like.
@skylenet @splix Hey. Actually, even when I scrape metrics directly from dshackle monitoring port - sometimes the dashboard shows data, sometimes not. dshackle.yaml monitoring part:
monitoring:
  enabled: true
  jvm: true
  extended: true
  prometheus:
    enabled: true
    bind: 0.0.0.0
    port: 8081
    path: /metrics
prometheus scrape config:
additionalScrapeConfigs:
  - job_name: dshackle
    static_configs:
      - targets: ['dshackle.ds.svc.cluster.local:8081']
I see exported metrics in grafana exporter, for example. And I also see metrics on the port:
/ $ curl http://dshackle.ds.svc.cluster.local:8081/metrics
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes 11709.0
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of major GC",cause="Metadata GC Threshold",} 2.0
jvm_gc_pause_seconds_sum{action="end of major GC",cause="Metadata GC Threshold",} 0.234
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_max{action="end of major GC",cause="Metadata GC Threshold",} 0.0
# HELP dshackle_request_grpc_fail_total Number of requests failed to process
# TYPE dshackle_request_grpc_fail_total counter
dshackle_request_grpc_fail_total 0.0
But the dashboard is mostly empty, even though it sees two active upstreams, ETH and RINKEBY (screenshot).
Will appreciate any help.
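The label check suggested earlier in the thread can be sketched in Python. This is a minimal, unofficial sketch: the function names `labels_by_metric` and `metrics_missing_label` are hypothetical, the parser only handles the simple sample lines shown in the curl output above, and `SAMPLE` is copied from that output. Keep in mind that `instance` is normally attached by Prometheus at scrape time, so it is not expected in the raw endpoint output; a label such as `chain` would have to come from Dshackle itself.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces, e.g. action="end of major GC".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Lines copied from the curl output in this thread.
SAMPLE = '''\
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes 11709.0
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of major GC",cause="Metadata GC Threshold",} 2.0
jvm_gc_pause_seconds_sum{action="end of major GC",cause="Metadata GC Threshold",} 0.234
# TYPE dshackle_request_grpc_fail_total counter
dshackle_request_grpc_fail_total 0.0
'''


def labels_by_metric(metrics_text):
    """Map each metric name to the set of label names seen on its samples."""
    seen = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simple parser understands
        name, raw_labels = match.group(1), match.group(2) or ""
        names = {key for key, _ in LABEL_RE.findall(raw_labels)}
        seen.setdefault(name, set()).update(names)
    return seen


def metrics_missing_label(metrics_text, label):
    """Return the sorted metric names that have no sample carrying `label`."""
    found = labels_by_metric(metrics_text)
    return sorted(name for name, labels in found.items() if label not in labels)
```

Calling `metrics_missing_label(text, "chain")` on the full endpoint output would list the metrics that carry no `chain` label, which is one way to see whether the dashboard's filters can match anything.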
@PDVJAM what happens if you select the instance on the grafana dashboard? On your screenshot I see that it currently has "none"
It is none, there is no instance somehow. Sometimes I see in this selector dshackle.ds.svc.cluster.local:8081 and then everything works, sometimes there is just 'none'.
@skylenet Any ideas?
@PDVJAM I currently don't have a running dshackle/prometheus/grafana so it would take me some time to set something up. But can you try changing the variable query for the instance selector? You go to the Grafana Dshackle Dashboard > Settings > Variables > Select "instance" variable
And then change the query to label_values(dshackle_upstreams_availability{},instance). Save the dashboard and reload everything.
Yep, it does work with label_values(dshackle_upstreams_availability{},instance).
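To see outside Grafana which `instance` values Prometheus holds for a given metric, roughly the same question can be put to the Prometheus HTTP API, which exposes `/api/v1/label/<label_name>/values` with an optional `match[]` series selector. The sketch below is an assumption-laden illustration, not part of Dshackle or the dashboard: the helper names are hypothetical, and the base URL `http://prometheus.example:9090` is a placeholder for wherever your Prometheus is reachable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def label_values_url(prometheus_base, label, selector):
    """Build the API URL listing values of `label` for series matching `selector`."""
    query = urlencode({"match[]": selector})
    return f"{prometheus_base.rstrip('/')}/api/v1/label/{label}/values?{query}"


def fetch_label_values(prometheus_base, label, selector, timeout=10):
    """Return the label values Prometheus reports (performs a network call)."""
    url = label_values_url(prometheus_base, label, selector)
    with urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]


# Placeholder base URL (assumption); the metric name is the one from the thread.
EXAMPLE_URL = label_values_url(
    "http://prometheus.example:9090", "instance", "dshackle_upstreams_availability"
)
```

If `fetch_label_values(...)` returns an empty list for the metric the dashboard variable is built on but a non-empty list for `dshackle_upstreams_availability`, that would be consistent with the selector showing 'none' until the variable query is changed.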