clearml
clearml logs 0 instead of nan
ClearML logs 0 instead of NaN in the boards.
Expected behavior: NaN values should be kept as NaN.
https://clearml.slack.com/archives/CTK20V944/p1646928807931049
Hello! Please tell me, is it intended that NaN values are converted to 0 when logging? Update: I see NaN in TensorBoard and 0 in ClearML. Update 2: I am using v1.1.* (edited)
natanM [1 day ago]
Hi @pa antya, What are you logging? Can you provide a small snippet or a screenshot? (edited)
pa antya [24 hours ago]
You can run it (file attached below): test_nan_clearml_vs_tb.ipynb
natanM [24 hours ago]
@pa antya, I will take a look soon
pa antya [24 hours ago]
test_nan - Iterations.json [{"task":"9f6d79d810cb481fad6e34f2e1e03563","name":"epoch_0","x":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],"y":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"type":"scatter"},{"task":"9f6d79d810cb481fad6e34f2e1e03563","name":"epoch_1","x":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],"y":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"type":"scatter"},{"task":"9f6d79d810cb481fad6e34f2e1e03563","name":"epoch_2","x":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19],"y":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"type":"scatter"}]
pa antya [24 hours ago]
@natanM I'll wait :) It's not nice that this logging is misleading.
natanM [23 hours ago]
@pa antya, can you point me to where in the script the reported scalars are? I think this might be happening because you can't report None for Logger.report_scalar() so the auto logging assigns it some sort of value - 0. What is your use case? If the value of the scalar is None then why log it?
pa antya [23 hours ago]
class LitMNIST(LightningModule):
    ...
        self.log('test/test_nan', np.nan, prog_bar=False, logger=True, on_step=True, on_epoch=False)
    ...
pa antya [23 hours ago]
Full code:
class LitMNIST(LightningModule):
    def __init__(self, data_dir=PATH_DATASETS, hidden_size=64, learning_rate=2e-4):
        super().__init__()
        ... (snippet truncated, 94 lines in total)
pa antya [23 hours ago]
@natanM It is about usability. Through the pytorch_lightning logger we log the average reward of each action for the RL agent. If the agent did this action in the current episode, then its average reward will be NaN, not 0, for obvious reasons. We would like it to be visualized the same way as in TensorBoard, so that the plot stays informative.
pa antya [23 hours ago]
@natanM *If the agent did not perform a certain action, then its average reward per episode for this action will be NaN, not 0.
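To make the use case above concrete, here is a minimal sketch of how such a per-action average could end up as NaN. The function name `mean_reward_per_action`, the action names, and the episode data are hypothetical and are not taken from the attached notebook.

```python
import math
from collections import defaultdict


def mean_reward_per_action(episode, actions):
    """Return {action: mean reward}; actions never taken in the episode get NaN."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, reward in episode:
        totals[action] += reward
        counts[action] += 1
    # An action with zero occurrences has no defined mean, so NaN is used, not 0.
    return {
        a: (totals[a] / counts[a]) if counts[a] else math.nan
        for a in actions
    }


# Hypothetical episode: the agent never performs "jump".
episode = [("left", 1.0), ("left", 3.0), ("right", -1.0)]
means = mean_reward_per_action(episode, actions=["left", "right", "jump"])
```

Here `means["jump"]` is NaN, which says "no data", while a logged 0 would read as a real average reward of zero.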
pa antya [23 hours ago]
import numpy as np
np.nan
Martin.B [21 hours ago]
@pa antya "upd: I see NAN in the tensorboard, and 0 in Clearml."
I have to admit, since NaNs are actually skipped in the graph, should we actually log them?
pa antya [20 hours ago]
@Martin.B If I had to choose between logging and not logging, I would choose logging. If the choice is between logging as 0 and logging as NaN, I would choose NaN. If the choice is between skipping and logging as NaN, I find it difficult; it seems better to log than to skip, but it needs some thought. Mostly, we are used to TensorBoard, where NaN is logged in a special way, and this behavior seems natural. (edited)
Martin.B [19 hours ago]
"If you choose between skipping or logging like nan, then here I find it difficult, it seems that it is better to log than skip, but you need to think."
So I "think" the issue is that plotly (UI) doesn't like NaN, and elastic (storing the scalar) is also not a NaN fan. We need to check if they both agree on the representation; then it should be easy to fix... Maybe you could open a GitHub issue, so we do not forget?
I renamed the file so it can be viewed: test_nan - Iterations (1).json
It was downloaded from the ClearML board.
Hi @paantya. Thanks for reporting this! As of now, clearml doesn't support logging 'NaN' values or any other strings (neither ELK nor plotly (json) actually supports them), so they default to a 0 value (as you mentioned). For now, we have decided to warn the user the first time such a value is encountered and to allow the user to change the default value. This should be available in a future release.
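As a small illustration of the JSON side of this limitation (it uses only the Python standard library and says nothing about how ClearML, ELK, or plotly handle it internally): strict JSON has no NaN literal, so a strict encoder rejects the value. The helper `strict_dumps` is a hypothetical name used only for this sketch.

```python
import json
import math

value = math.nan

# Python's default encoder emits the bare token NaN, which is not valid JSON
# and may be rejected by strict parsers.
lenient = json.dumps({"y": value})


def strict_dumps(obj):
    """Serialize obj as standards-compliant JSON, or return None if it contains NaN/inf."""
    try:
        return json.dumps(obj, allow_nan=False)
    except ValueError:
        return None


strict = strict_dumps({"y": value})  # None: NaN cannot be represented in strict JSON
```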
@eugen-ajechiloae-clearml
What does it mean to change the default value? Changing 0 to something else?
Please tell me.
@paantya I think the meaning is to allow the user to specify a value other than the default 0 to be used instead of NaN when it is reported.
Yes, that's what it means
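Conceptually, the replacement described above could look like the following sketch. This is not ClearML's implementation; `replace_non_finite` and its parameters are hypothetical names chosen for illustration.

```python
import math


def replace_non_finite(value, nan_value=0.0, inf_value=0.0):
    """Return value unchanged if finite, else the user-chosen substitute.

    Hypothetical helper: the defaults of 0.0 mirror the behavior discussed
    in this thread, and the caller can pick another stand-in value.
    """
    value = float(value)
    if math.isnan(value):
        return nan_value
    if math.isinf(value):
        # The sign of the infinity is ignored in this simplified sketch.
        return inf_value
    return value


default_result = replace_non_finite(math.nan)                 # 0.0
custom_result = replace_non_finite(math.nan, nan_value=-1.0)  # -1.0
finite_result = replace_non_finite(0.25)                      # 0.25
```

Choosing a substitute that cannot occur as a real metric value (for example, a negative number for a non-negative metric) keeps the replaced points distinguishable in the plot.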
Hi @paantya, a new RC with a fix for this is ready:
pip install clearml==1.3.2rc1
You can see the new methods as such:
from clearml import Logger
Logger.set_reporting_nan_value()
Logger.set_reporting_inf_value()
Hi @paantya,
Please note there has been a mistake: the correct RC version is 1.3.2rc2.
Hi @paantya, closing this. Please re-open if required.