clearml logs 0 instead of nan

Open paantya opened this issue 2 years ago • 11 comments

clearml logs 0 instead of NaN in the boards.

Expected: NaN values should be kept as NaN.

paantya avatar Mar 11 '22 16:03 paantya

https://clearml.slack.com/archives/CTK20V944/p1646928807931049

Hello! Could you tell me, please, whether it is intended that NaN values are converted to 0 when logging? Update: I see NaN in TensorBoard and 0 in ClearML. Update 2: I use v1.1.*. (edited)

natanM [1 day ago]

Hi @pa antya, What are you logging? Can you provide a small snippet or a screenshot? (edited)

pa antya [24 hours ago]

You can run it (file at the bottom): test_nan_clearml_vs_tb.ipynb

natanM [24 hours ago]

@pa antya, I will take a look soon

pa antya [24 hours ago]

test_nan -  Iterations.json 
[{"task":"9f6d79d810cb481fad6e34f2e1e03563","name":"epoch_0","x":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],"y":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"type":"scatter"},{"task":"9f6d79d810cb481fad6e34f2e1e03563","name":"epoch_1","x":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],"y":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"type":"scatter"},{"task":"9f6d79d810cb481fad6e34f2e1e03563","name":"epoch_2","x":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],"y":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"type":"scatter"}]

pa antya [24 hours ago]

@natanM I'll wait :) It's not nice that this logging is misleading.

natanM [23 hours ago]

@pa antya, can you point me to where in the script the reported scalars are? I think this might be happening because you can't report None for Logger.report_scalar() so the auto logging assigns it some sort of value - 0. What is your use case? If the value of the scalar is None then why log it?

pa antya [23 hours ago]

class LitMNIST(LightningModule):
...
        self.log('test/test_nan', np.nan, prog_bar=False, logger=True, on_step=True, on_epoch=False)
...

pa antya [23 hours ago]

all code

class LitMNIST(LightningModule):
    def __init__(self, data_dir=PATH_DATASETS, hidden_size=64, learning_rate=2e-4):
        super().__init__()
        ...

pa antya [23 hours ago]

@natanM Using the pytorch_lightning logger, we log the average reward of each action for the RL agent. If the agent did not perform this action in the current episode, then its average reward will be NaN, not 0, for obvious reasons. We would like it to be visualized the same way as in TensorBoard, so that the plot stays informative.

pa antya [23 hours ago]

@natanM *If the agent did not perform a certain action, then its average reward per episode for this action will be NaN, not 0.

pa antya [23 hours ago]

import numpy as np
np.nan
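
The use case described above can be sketched roughly as follows. This is only an illustration: the function name `mean_reward_per_action` and the sample data are hypothetical and do not come from the attached notebook. It shows why an action that was never taken in an episode naturally yields NaN rather than 0.

```python
import math


def mean_reward_per_action(rewards_by_action, n_actions):
    """Return the mean reward of each action; NaN if the action was never taken."""
    means = []
    for action in range(n_actions):
        rewards = rewards_by_action.get(action, [])
        # No samples means the mean is undefined, so report NaN instead of 0.
        means.append(sum(rewards) / len(rewards) if rewards else math.nan)
    return means


# Hypothetical episode: action 1 was never performed.
episode = {0: [1.0, 3.0], 2: [0.5]}
means = mean_reward_per_action(episode, n_actions=3)
print(means)  # [2.0, nan, 0.5]
```

Reporting 0 for action 1 here would be indistinguishable from an action that was taken and earned no reward.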

Martin.B [21 hours ago]

@pa antya "upd: I see NAN in the tensorboard, and 0 in Clearml."

I have to admit, since NaNs are actually skipped in the graph, should we actually log them?

pa antya [20 hours ago]

@Martin.B If I had to choose between logging and not logging, I would choose logging. If choosing between logging as 0 and logging as NaN, I would choose NaN. If choosing between skipping and logging as NaN, I find it difficult; it seems better to log than to skip, but this needs some thought. For the most part, we are used to TensorBoard, where NaN is logged in a special way, and this behavior seems natural. (edited)

Martin.B [19 hours ago]

"If you choose between skipping or logging like nan, then here I find it difficult, it seems that it is better to log than skip, but you need to think."

So I "think" the issue is that plotly (the UI) doesn't like NaN, and elastic (which stores the scalars) is not a NaN fan either. We need to check whether they both agree on a representation; then it should be easy to fix... Maybe you could open a GitHub issue, so we do not forget?

paantya avatar Mar 11 '22 16:03 paantya

[Attached images: image (2), newplot (9), image (1)]

paantya avatar Mar 11 '22 16:03 paantya

To run it, rename the file to test_nan_clearml_vs_tb (1).ipynb

test_nan_clearml_vs_tb (1).ipynb.txt

paantya avatar Mar 11 '22 16:03 paantya

To view it, rename the file to test_nan - Iterations (1).json

It was downloaded from the ClearML board.

test_nan - Iterations (1).json.txt

paantya avatar Mar 11 '22 16:03 paantya

Rename to Untitled.py

Python class example for logging.

Untitled.py.txt

paantya avatar Mar 11 '22 16:03 paantya

Hi @paantya. Thanks for reporting this! As of now, clearml doesn't support logging NaN values or any other strings (neither ELK nor plotly (JSON) actually supports them), so they default to 0, as you mentioned. For now, we have decided to warn the user the first time such a value is encountered and to allow the user to change the default value. This should be available in a future release.
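
As a rough illustration of the two points above (standard JSON has no NaN token, and the planned behavior is to warn once and substitute a configurable value), here is a minimal sketch using only the Python standard library. `ScalarSanitizer` is a hypothetical helper written for this example; it is not part of clearml and does not reflect how clearml implements the fix.

```python
import json
import math
import warnings

# Python's json module emits the non-standard token NaN by default...
nan_json = json.dumps({"y": [math.nan]})
print(nan_json)  # {"y": [NaN]}

# ...and refuses to serialize NaN when asked for strictly valid JSON.
try:
    json.dumps({"y": [math.nan]}, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False


class ScalarSanitizer:
    """Hypothetical helper: replace NaN with a configurable value, warning once."""

    def __init__(self, nan_value=0.0):
        self.nan_value = nan_value
        self._warned = False

    def __call__(self, value):
        if isinstance(value, float) and math.isnan(value):
            if not self._warned:
                # Warn only on the first NaN so the log is not flooded.
                warnings.warn(f"NaN encountered, reporting {self.nan_value} instead")
                self._warned = True
            return self.nan_value
        return value


sanitize = ScalarSanitizer()  # default replacement is 0.0
cleaned = [sanitize(v) for v in [0.5, math.nan, 1.5]]
print(cleaned)  # [0.5, 0.0, 1.5]
```

Choosing a replacement that cannot occur in the real data (for example a negative value for a non-negative metric) makes substituted points easier to spot in a plot.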

@eugen-ajechiloae-clearml

What does it mean to change the default value? Changing 0 to something else?

Please tell me.

paantya avatar Mar 18 '22 14:03 paantya

@paantya I think the meaning is to allow the user to specify a value other than the default 0 to be used instead of NaN when it is reported

jkhenning avatar Mar 22 '22 11:03 jkhenning

"@paantya I think the meaning is to allow the user to specify a value other than the default 0 to be used instead of NaN when it is reported"

Yes, that's what it means

Hi @paantya, a new RC with a fix for this is ready: pip install clearml==1.3.2rc1. You can use the new methods as follows:

from clearml import Logger

Logger.set_reporting_nan_value()
Logger.set_reporting_inf_value()

erezalg avatar Mar 24 '22 09:03 erezalg

Hi @paantya ,

Please note there has been a mistake, and the RC version is 1.3.2rc2

jkhenning avatar Mar 24 '22 16:03 jkhenning

Hi @paantya, closing this. Please re-open if required.

jkhenning avatar Sep 12 '22 06:09 jkhenning