prometheus_exporter icon indicating copy to clipboard operation
prometheus_exporter copied to clipboard

Adds handling for malformed metrics

Open baweaver opened this issue 8 months ago • 1 comments

In the Ruby GraphQL gem we have observed some interesting behavior where collectors appear not to be registered, which causes the code from that gem to error:

@trace.prometheus_client.send_json(
  # [!!] No name here
  type: @trace.prometheus_collector_type,
  duration: duration,
  platform_key: event_name,
  key: keyword
)

Source: https://github.com/rmosolgo/graphql-ruby/blob/ddf2550a204be69ba681739b529324a074d72c91/lib/graphql/tracing/prometheus_trace.rb#L68

On one hand this appears to be a malformed request body as-is, and the tests do not appear to test the integration in validating this behavior.

Given that, that brings up a potential edge case which may warrant some discussion: What should PrometheusExporter do in a case where it gets a malformed metric that does not have a name? Should it give an error? Right now what it returns is this:

#nomethoderror: undefined method `observe' for nil:NilClass - /opt/ruby3.0/lib/ruby/gems/3.0.0/gems/prometheus_exporter-0.8.1/lib/prometheus_exporter/server/collector.rb:55:in `block in process_hash'

...which does not give clear traceability into the cause and potential resolutions.

The opinion of this PR, at least, is to allow it to gracefully fail when a metric cannot be registered but I am not convinced that this is the correct behavior in this scenario and would defer to the folks working on the project as to what makes the most sense here.

My suspicion with the upstream issue is that it's some form of race condition in loading collectors.

Documented some things on the GraphQL gem:

https://github.com/rmosolgo/graphql-ruby/issues/5323

Still suspecting some type of loading or race condition in here.

baweaver avatar Apr 08 '25 22:04 baweaver

One other potential strategy that would additionally remove the ObjectSpace invocations would be to use self.inherited for a registry instead, but that'd have some consequences:

  • server.rb would be more order-dependent
  • Given the usage of mutex it may not be great to CollectorChild.new and cache that in an effective singleton

...but it'd also make it easier for any downstream consumers making custom collectors.

If that is of interest I can get a PR up on that later.

baweaver avatar Apr 09 '25 04:04 baweaver

Awesome, yes please do

On Wed, 6 Aug 2025 at 3:11 pm, Brandon Weaver @.***> wrote:

@.**** commented on this pull request.

In lib/prometheus_exporter/server/collector.rb https://github.com/discourse/prometheus_exporter/pull/337#discussion_r2255878732 :

@@ -74,6 +76,11 @@ def register_metric_unsafe(obj) help = obj["help"] opts = symbolize_keys(obj["opts"] || {})

  •  if !name
    
  •    STDERR.puts "failed to register metric due to empty name #{obj}"
    

Ah. L99:

STDERR.puts "failed to register metric #{obj}"

I can switch both? I think I was probably just miming whatever conventions I saw at the time.

— Reply to this email directly, view it on GitHub https://github.com/discourse/prometheus_exporter/pull/337#discussion_r2255878732, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABIXNKEBG6VFM77COMG7D3MGFARAVCNFSM6AAAAAB2XPK6U2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTAOJQGQZTIMJVHE . You are receiving this because you commented.Message ID: @.***>

SamSaffron avatar Aug 06 '25 05:08 SamSaffron

Changes made, as per request, happy to modify to suite. TL;DR: Piped the logger down from the PrometheusExporter::Server into the collector as an argument

baweaver avatar Aug 06 '25 05:08 baweaver

I think an overkill test is probably fine. CI should be back in action soon, we have a PR for that.

SamSaffron avatar Aug 06 '25 05:08 SamSaffron