
Not possible to send infs and nans

Open wjaskowski opened this issue 5 years ago • 18 comments

What is the reason for not allowing infs and NaNs to be sent as metric values? I imagine it is impossible to plot them, but they still carry some information.

wjaskowski avatar Oct 02 '19 14:10 wjaskowski

Hi @wjaskowski thanks for reaching out.

Indeed right now we do not accept NaN/None/(+/-)Inf values.

However, we had some internal discussion about it.

One idea is to make it similar to what TensorBoard does: each NaN/None is displayed as a graphic icon, like a triangle or star. Its location on the "y" axis is determined by the preceding numeric value. Its location on the "x" axis is the preceding value + 1.
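A minimal sketch of that placement rule, assuming the series is a plain Python list indexed by step. The function name `marker_positions` and the `default_y` fallback (used when no numeric value precedes the marker) are hypothetical, and treating +/-Inf like NaN is an extra assumption not stated above:

```python
import math

def marker_positions(values, default_y=0.0):
    """Return (x, y) positions for markers that stand in for None/NaN/Inf entries."""
    markers = []
    last_y = default_y  # assumed fallback when nothing numeric came before
    for x, value in enumerate(values):
        if value is None or not math.isfinite(value):
            # x is the preceding step + 1; y is carried over from the last numeric value
            markers.append((x, last_y))
        else:
            last_y = value
    return markers

print(marker_positions([1.0, float("nan"), 3.0, None]))  # [(1, 1.0), (3, 3.0)]
```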

What do you think?

kamil-kaczmarek avatar Oct 04 '19 13:10 kamil-kaczmarek

Sounds good. You might also want to consider placing those triangles on the bottom/top of the visible plot.

But visualization is one thing - the most important one is to be able to send and download the data.

wjaskowski avatar Oct 04 '19 13:10 wjaskowski

Thanks for the suggestion :slightly_smiling_face:

We will consider it as well.

kamil-kaczmarek avatar Oct 04 '19 13:10 kamil-kaczmarek

This is a somewhat stale issue but the first thing that comes up when you google the behaviour.

Any news/updates on this?

fwindolf avatar May 22 '23 09:05 fwindolf

Hello @fwindolf, this feature request is quite deep in our backlog, so unfortunately there is currently no ETA for it. Is this behavior a blocker for your workflows?

SiddhantSadangi avatar May 22 '23 11:05 SiddhantSadangi

Not really a blocker, but having NaNs occur during training for whatever reason seems to be a common enough problem to justify experiment tracking not completely breaking imo.

So a

run["my_metric"].append(1.0)
run["my_metric"].append(float("nan"))
run["my_metric"].append(3.0)

will only show the 1.0. I see why adding NaN support would open up quite a few edge cases for visualizations etc, but maybe a short term fix could be simply ignoring NaN, +-inf etc during the list iteration when syncing the metric.
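A rough sketch of that "skip non-finite values" idea, done on the caller's side rather than inside the client. The helper `append_finite` is hypothetical, and a plain list stands in for `run["my_metric"]`; the only thing assumed about the real object is the `append` method used above:

```python
import math

def append_finite(series, value):
    """Append value only if it is a finite number; return True if it was kept."""
    if isinstance(value, (int, float)) and math.isfinite(value):
        series.append(value)
        return True
    return False  # NaN, +/-Inf, None and non-numeric values are dropped

logged = []  # stand-in for run["my_metric"]
for v in [1.0, float("nan"), 3.0, float("inf")]:
    append_finite(logged, v)

print(logged)  # [1.0, 3.0]
```

Note that dropping values this way loses the information about where the NaNs occurred, so it only addresses the "tracking breaks" part of the problem.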

fwindolf avatar May 22 '23 21:05 fwindolf

Would replacing the nan/inf values with 0/some high-end value while logging be a viable workaround in your case? Something like:

import math
metric = float("nan")
# Log 0 in place of NaN so that the value is accepted
if math.isnan(metric):
    run["my_metric"].append(0)

I've also submitted your feedback around ignoring NaN/inf to the product team. Thank you :)

SiddhantSadangi avatar May 23 '23 09:05 SiddhantSadangi

Hello @fwindolf, just checking whether the above workaround works for you.

SiddhantSadangi avatar Jun 01 '23 14:06 SiddhantSadangi

Sorry I missed the notification of the last comment.

We solved it by not logging nans as 0, inf as a big number which is okay for now. It skews the readability of graphs but it's better than not seeing anything.

Thanks for forwarding the issue!

fwindolf avatar Jun 01 '23 18:06 fwindolf

Did you mean ~not~ logging nans as 0? :)

SiddhantSadangi avatar Jun 01 '23 19:06 SiddhantSadangi

Hi there! Is this something that is actively being worked on? I ran into real trouble recently because my training process diverged and the logging did not show where the NaNs first started to show up. This can be very valuable information for debugging. What is bad about the way e.g. TensorBoard handles NaN/inf?

While the workaround is fine in most cases, my model showed values of around zero all the time and then started to diverge so replacing NANs with zeros in principle works but is not ideal in my situation.

rschiewer avatar Jun 12 '23 09:06 rschiewer

Hello @rschiewer ,

The product team is currently scoping this. This seems to involve relatively high engineering effort, so there is no ETA as of now, unfortunately :(

In your case, since the values hover around zero, can you replace NaNs with a high value so that they show up in charts, and you can then know when your model starts diverging?
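A minimal sketch of this sentinel approach. The helper name `to_plottable` and the default sentinel values are hypothetical and should be chosen well outside the metric's normal range:

```python
import math

def to_plottable(value, nan_value=1e6, inf_value=1e9):
    """Map NaN and +/-Inf to finite sentinel values; pass finite values through."""
    if math.isnan(value):
        return nan_value
    if math.isinf(value):
        # keep the sign so +Inf and -Inf stay distinguishable on the chart
        return math.copysign(inf_value, value)
    return value

# Usage with a metric value computed during training (hypothetical variable):
# run["my_metric"].append(to_plottable(loss))
print(to_plottable(float("nan")), to_plottable(float("-inf")), to_plottable(0.02))
```

The first step at which the chart jumps to the sentinel then marks where the model started to diverge, at the cost of stretching the y axis.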

SiddhantSadangi avatar Jun 15 '23 09:06 SiddhantSadangi

Hey everyone! Just a quick update here.

Neptune v1.8.3 now skips trying to log NaN and Inf values and throws a warning instead. This means you no longer have to check for NaN/Inf values in your code 🥳

SiddhantSadangi avatar Dec 06 '23 08:12 SiddhantSadangi