client_python icon indicating copy to clipboard operation
client_python copied to clipboard

Eagerly convert value to float in *MetricFamily.add_metric()

Open mhansen opened this issue 5 years ago • 7 comments

Hi there, an experience report and improvement suggestion.

I have a pretty simple exporter which queries a backend URL for JSON, grabs the JSON, stuffs it into prometheus metric families, which it then yields.

Sometimes it errors with stacktraces only when the request is finishing, with a stack traces that points at finish_request rather than the code that added the bad float:

TypeError: ("float() argument must be a string or a number, not 'NoneType'", Metric(bom_wind_speed, Wind speed (km/h) from the Bureau of Meterology, gauge, , [Sample(name='bom_wind_speed', labels={'location': 'Sydney Airport'}, value=None, timestamp=None, exemplar=None), Sample(name='bom_wind_speed', labels={'location': 'Sydney - Observatory Hill'}, value=20, timestamp=None, exemplar=None)]))
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/socketserver.py", line 625, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/local/lib/python3.5/http/server.py", line 422, in handle
    self.handle_one_request()
  File "/usr/local/lib/python3.5/http/server.py", line 410, in handle_one_request
    method()
  File "/usr/local/lib/python3.5/site-packages/prometheus_client/exposition.py", line 152, in do_GET
    output = encoder(registry)
  File "/usr/local/lib/python3.5/site-packages/prometheus_client/openmetrics/exposition.py", line 56, in generate_latest
    floatToGoString(s.value),
  File "/usr/local/lib/python3.5/site-packages/prometheus_client/utils.py", line 8, in floatToGoString
    d = float(d)
TypeError: ("float() argument must be a string or a number, not 'NoneType'", Metric(bom_wind_speed, Wind speed (km/h) from the Bureau of Meterology, gauge, , [Sample(name='bom_wind_speed', labels={'location': 'Sydney Airport'}, value=None, timestamp=None, exemplar=None), Sample(name='bom_wind_speed', labels={'location': 'Sydney - Observatory Hill'}, value=20, timestamp=None, exemplar=None)]))

I've hit this kind of bug a few times in different exporters (I guess it's to be expected to get type errors in Python sometimes).

How about eagerly converting the value passed to add_metric to a float? Then the stack trace would point at the exact cause.

This might be a breaking change - that'd be a reasonable reason to reject this. But any code doing this will likely fail soon after, as soon as an attempt is made to serialize the metric, so maybe it'd be worth the change for better debuggability? What do you think?

mhansen avatar Mar 12 '20 11:03 mhansen

That sounds reasonable to me. I don't see how it'd be breaking, you're only changing when we're erroring on bad data.

brian-brazil avatar Mar 12 '20 11:03 brian-brazil

It’d only break if someone was creating a metric but never turning it into a string. Probably unlikely, but maybe someone is holding them in memory and not outputting them? Seems a bit unlikely.

On Thu, 12 Mar 2020 at 22:46, Brian Brazil [email protected] wrote:

That sounds reasonable to me. I don't see how it'd be breaking, you're only changing when we're erroring on bad data.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/prometheus/client_python/issues/527#issuecomment-598143269, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAZYOLVOHAXZYMRVE6BX2DRHDDPRANCNFSM4LGJ4OGA .

mhansen avatar Mar 12 '20 11:03 mhansen

They're useless without being output at some point, and even if it did break someone I'd be okay with that.

brian-brazil avatar Mar 12 '20 11:03 brian-brazil

Having a quick look into this, looks like a common pinchpoint we could add this check would be the Sample constructor.

But it's a namedtuple. In python3 there's typing.NamedTuple, but that doesn't verify types at runtime.

Looks like it's not easy to override the constructor of a namedtuple, except by subclassing or having a factory function wrapper. Subclassing looks like a better win, preserving the Sample function as a type (which some folks might depend on), as written here: https://stackoverflow.com/a/16721002/171898

On Thu, 12 Mar 2020 at 22:52, Brian Brazil [email protected] wrote:

They're useless without being output at some point, and even if it did break someone I'd be okay with that.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/prometheus/client_python/issues/527#issuecomment-598145595, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAZYOO7EABQZQ4NWYNKKDLRHDEH7ANCNFSM4LGJ4OGA .

mhansen avatar Mar 19 '20 09:03 mhansen

Could always write out the entire generated code body of the namedtuple and then change the constructor too! Not ideal for sure.

On Thu, 19 Mar 2020 at 20:42, Mark Hansen [email protected] wrote:

Having a quick look into this, looks like a common pinchpoint we could add this check would be the Sample constructor.

But it's a namedtuple. In python3 there's typing.NamedTuple, but that doesn't verify types at runtime.

Looks like it's not easy to override the constructor of a namedtuple, except by subclassing or having a factory function wrapper. Subclassing looks like a better win, preserving the Sample function as a type (which some folks might depend on), as written here: https://stackoverflow.com/a/16721002/171898

On Thu, 12 Mar 2020 at 22:52, Brian Brazil [email protected] wrote:

They're useless without being output at some point, and even if it did break someone I'd be okay with that.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/prometheus/client_python/issues/527#issuecomment-598145595, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAZYOO7EABQZQ4NWYNKKDLRHDEH7ANCNFSM4LGJ4OGA .

mhansen avatar Mar 19 '20 09:03 mhansen

The quickest win is probably the add_metric function itself, but wrapping the constructor should also work.

brian-brazil avatar Mar 19 '20 09:03 brian-brazil

Yeah add_metric would be quick... but looks like there are 8 implementations of add_metric which makes me feel a bit awkward. Good fallback option, trying to wrap the constructor now.

On Thu, 19 Mar 2020 at 20:58, Brian Brazil [email protected] wrote:

The quickest win is probably the add_metric function itself, but wrapping the constructor should also work.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/prometheus/client_python/issues/527#issuecomment-601090072, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAZYOJAXY25KKYQYSAHCGLRIHUD3ANCNFSM4LGJ4OGA .

mhansen avatar Mar 19 '20 10:03 mhansen