Dshackle grafana dashboard doesn't have all the graphs filled up with data

Open PDVJAM opened this issue 2 years ago • 12 comments

Hey! We run Dshackle on k8s, installed via Helm, and use Prometheus for monitoring. It works, but only partially: in Grafana I see the number of upstreams and whether they're OK, but graphs like 'JSON RPC total request / failed requests', 'JSON RPC OK requests by method', error requests by method, response time upstream errors, and total counts are empty. What could be wrong, and how do I fix it so that all graphs show actual data? The Grafana dashboard was copied from this repo without any changes. The Dshackle version is 0.12.

dshackle monitoring config part:

  monitoring:
    enabled: true
    jvm: true
    extended: true 
    prometheus:
      enabled: true
      bind: 0.0.0.0
      port: 8081
      path: /metrics

serviceMonitor helm values:

serviceMonitor:
  # -- If true, a ServiceMonitor CRD is created for a prometheus operator
  # https://github.com/coreos/prometheus-operator
  enabled: true
  # -- Path to scrape
  path: /metrics
  # -- Alternative namespace for ServiceMonitor
  namespace: null
  # -- Additional ServiceMonitor labels
  labels:
    release: kube-prometheus-stack
  # -- Additional ServiceMonitor annotations
  annotations: {}
  # -- ServiceMonitor scrape interval
  interval: 1m
  # -- ServiceMonitor scheme
  scheme: http
  # -- ServiceMonitor TLS configuration
  tlsConfig: {}
  # -- ServiceMonitor scrape timeout
  scrapeTimeout: 30s
  # -- ServiceMonitor relabelings
  relabelings: []

PDVJAM, Jul 03 '22 07:07

What kind of requests do you make to Dshackle?

splix, Jul 03 '22 15:07

RPC, eth_blockNumber, eth_getBalance. BTW, when Dshackle was deployed via docker-compose, most of the now-empty graphs were populated with data. The requests were the same.

PDVJAM, Jul 04 '22 06:07

Maybe you're scraping the wrong target?

splix, Jul 04 '22 14:07

In that case, wouldn't the graphs be completely empty rather than partially empty?

PDVJAM, Jul 04 '22 14:07

Maybe some of the requests go through one instance and some through another.

I don't really know anything about your setup, I'm just trying to guess. If you think there is some bug I need something to reproduce it. Ideally a command or script that demonstrates the problem that I can run.

splix, Jul 04 '22 14:07

I don't have anything custom. I use the skylenet dshackle helm chart (parts of the configs are in my first message), plus Prometheus and Grafana installed without customization from the official kube-prometheus-stack helm chart, and the dashboard from this repo.

I have actually found that if I keep the ServiceMonitor disabled in the dshackle helm chart and scrape metrics directly from the dshackle monitoring svc port, almost all graphs are populated with data.

PDVJAM, Jul 06 '22 11:07

@PDVJAM have you tried using the metrics explorer in Grafana to see if those metrics are there? E.g. I see that the Grafana dashboard relies on some labels, "instance" and "chain", that should be part of most of the metrics. It could be that these are missing, and therefore the graphs are not showing up. To make this clearer, it may also be worth manually making a GET request against the metrics endpoint to see what the data looks like.

skylenet, Jul 25 '22 08:07
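
A minimal sketch of the manual check suggested above, assuming the /metrics payload has been saved as text (for example from curl). It lists which label names each metric carries, so a missing "chain" label is easy to spot. The helper names and the "dshackle_" prefix filter are illustrative assumptions, not part of Dshackle. Note that "instance" is a target label that Prometheus attaches at scrape time, so it is not expected in the raw endpoint output.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair such as action="end of major GC".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def label_keys_by_metric(text):
    """Return {metric_name: set of label names} for a /metrics payload."""
    keys = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels, _value = match.groups()
        keys[name].update(k for k, _ in LABEL_RE.findall(labels or ""))
    return dict(keys)


def metrics_missing_label(text, label, prefix="dshackle_"):
    """List metrics with the given prefix that never carry `label`."""
    return sorted(
        name
        for name, keys in label_keys_by_metric(text).items()
        if name.startswith(prefix) and label not in keys
    )


# A few lines taken from the curl output posted in this thread.
SAMPLE = '''\
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of major GC",cause="Metadata GC Threshold",} 2.0
# TYPE dshackle_request_grpc_fail_total counter
dshackle_request_grpc_fail_total 0.0
'''

if __name__ == "__main__":
    print(label_keys_by_metric(SAMPLE))
    print(metrics_missing_label(SAMPLE, "chain"))
```

Running the same functions over the full payload shows whether the request metrics the empty panels depend on are exported at all, and with which labels.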

@skylenet @splix Hey. Actually, even when I scrape metrics directly from dshackle monitoring port - sometimes the dashboard shows data, sometimes not. dshackle.yaml monitoring part:

  monitoring:
    enabled: true
    jvm: true
    extended: true
    prometheus:
      enabled: true
      bind: 0.0.0.0
      port: 8081
      path: /metrics

prometheus scrape config:

    additionalScrapeConfigs:
      - job_name: dshackle
        static_configs:
          - targets: ['dshackle.ds.svc.cluster.local:8081']

I see exported metrics in grafana exporter, for example. And I also see metrics on the port:

/ $ curl http://dshackle.ds.svc.cluster.local:8081/metrics
# HELP jvm_classes_loaded_classes The number of classes that are currently loaded in the Java virtual machine
# TYPE jvm_classes_loaded_classes gauge
jvm_classes_loaded_classes 11709.0
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of major GC",cause="Metadata GC Threshold",} 2.0
jvm_gc_pause_seconds_sum{action="end of major GC",cause="Metadata GC Threshold",} 0.234
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_max{action="end of major GC",cause="Metadata GC Threshold",} 0.0
# HELP dshackle_request_grpc_fail_total Number of requests failed to process
# TYPE dshackle_request_grpc_fail_total counter
dshackle_request_grpc_fail_total 0.0

But the dashboard is mostly empty, even though it sees two active upstreams, ETH and RINKEBY: [screenshot]

Will appreciate any help.

PDVJAM, Jul 26 '22 15:07

@PDVJAM what happens if you select the instance on the grafana dashboard? On your screenshot I see that it currently has "none"

skylenet, Jul 26 '22 15:07

It is 'none'; somehow there is no instance. Sometimes the selector shows dshackle.ds.svc.cluster.local:8081 and then everything works; other times there is just 'none'.

PDVJAM, Jul 26 '22 16:07
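
One way to see what the selector has to work with is to ask Prometheus directly which "instance" values exist for a metric the dashboard uses. The sketch below relies on the Prometheus HTTP API /api/v1/series endpoint; the base URL http://prometheus.example:9090 is a placeholder and the helper names are hypothetical. Depending on the Prometheus version, start and end parameters may also be worth adding to the query.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def series_url(base_url, selector):
    """Build a /api/v1/series URL for a single series selector."""
    query = urlencode({"match[]": selector})
    return f"{base_url.rstrip('/')}/api/v1/series?{query}"


def instances_in(response):
    """Return (sorted distinct instance values, count of series lacking the label)."""
    if response.get("status") != "success":
        raise ValueError(f"unexpected response: {response!r}")
    values = set()
    missing = 0
    for labels in response["data"]:
        if "instance" in labels:
            values.add(labels["instance"])
        else:
            missing += 1
    return sorted(values), missing


def fetch_instances(base_url, selector, timeout=10):
    """Query Prometheus and summarise the instance label for `selector`."""
    with urlopen(series_url(base_url, selector), timeout=timeout) as resp:
        return instances_in(json.load(resp))


if __name__ == "__main__":
    # Placeholder address: replace with the real Prometheus service URL.
    print(series_url("http://prometheus.example:9090",
                     "dshackle_upstreams_availability"))
```

Calling fetch_instances for a metric that always has samples and for one that only appears once requests are served may help explain why the dropdown is sometimes populated and sometimes 'none'.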

@skylenet Any ideas?

PDVJAM, Jul 30 '22 04:07

@PDVJAM I currently don't have a running dshackle/prometheus/grafana so it would take me some time to set something up. But can you try changing the variable query for the instance selector? You go to the Grafana Dshackle Dashboard > Settings > Variables > Select "instance" variable

And then change the query to label_values(dshackle_upstreams_availability{},instance). Save the dashboard and reload everything.

skylenet, Aug 01 '22 15:08

Yep, it does work with label_values(dshackle_upstreams_availability{},instance).

PDVJAM, Dec 04 '22 09:12
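
For readers wondering why the variable query matters: Grafana's label_values(metric, label) returns the distinct values of a label on the series of the given metric. The sketch below is a simplified model of that behaviour, not Grafana's implementation. The series snapshot, the "chain" values, and the name hypothetical_request_metric are made up for illustration; the thread does not show the dashboard's original variable query.

```python
def label_values(series, metric, label):
    """Distinct values of `label` on series named `metric` (simplified model)."""
    return sorted({
        labels[label]
        for labels in series
        if labels.get("__name__") == metric and label in labels
    })


# Hypothetical snapshot of the series Prometheus holds at one moment.
SERIES = [
    {"__name__": "dshackle_upstreams_availability",
     "instance": "dshackle.ds.svc.cluster.local:8081", "chain": "ETH"},
    {"__name__": "dshackle_upstreams_availability",
     "instance": "dshackle.ds.svc.cluster.local:8081", "chain": "RINKEBY"},
]

if __name__ == "__main__":
    # A metric that is present yields a selectable instance.
    print(label_values(SERIES, "dshackle_upstreams_availability", "instance"))
    # A metric with no series at that moment yields an empty dropdown.
    print(label_values(SERIES, "hypothetical_request_metric", "instance"))
```

Under this model, basing the variable on a metric that is always exported keeps the instance dropdown populated, which is consistent with the fix reported above.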