[AIR/Tune] session.report: metric keys not included in the first report do not show up in metrics_dataframe
What happened + What you expected to happen
When using session.report, all metric keys currently have to be included in the first report; otherwise they do not show up in metrics_dataframe.
from ray import tune
from ray.air import session
from ray.tune import Tuner

def f(config):
    for i in range(10):
        if i % 2 == 0:
            session.report({"a": 1})
        else:
            session.report({"b": 2})

result = Tuner(f).fit()
result = result.get_best_result().metrics_dataframe
print(result["a"])
print(result["b"])
metrics_dataframe contains the correct data for key a, but key b is not found:
0 1.0
1 NaN
2 1.0
3 NaN
4 1.0
5 NaN
6 1.0
7 NaN
8 1.0
9 NaN
Name: a, dtype: float64
Traceback (most recent call last):
File "/Users/ilee300/miniforge3/envs/mosaic_ray/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3629, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'b'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "test_report.py", line 18, in <module>
print(result["b"])
File "/Users/ilee300/miniforge3/envs/mosaic_ray/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/ilee300/miniforge3/envs/mosaic_ray/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3631, in get_loc
raise KeyError(key) from err
KeyError: 'b'
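Until this is addressed, a workaround consistent with the behavior described above is to make every report carry the full set of keys, filling the ones that are not available yet with NaN, so that the first report already defines every column. This is a minimal sketch that assumes the key set is known up front; `ALL_KEYS` and `pad_metrics` are hypothetical names, not part of Ray.

```python
import math

# Hypothetical: the full set of metric keys the trainable will ever report.
ALL_KEYS = ("a", "b")


def pad_metrics(metrics, all_keys=ALL_KEYS):
    """Return a copy of `metrics` that contains every key in `all_keys`.

    Missing keys are filled with NaN so that the very first report already
    carries every column. Hypothetical helper, not a Ray API.
    """
    padded = {key: math.nan for key in all_keys}
    padded.update(metrics)
    return padded


# Usage inside the trainable from the script above (requires Ray):
#
#     def f(config):
#         for i in range(10):
#             if i % 2 == 0:
#                 session.report(pad_metrics({"a": 1}))
#             else:
#                 session.report(pad_metrics({"b": 2}))

if __name__ == "__main__":
    print(pad_metrics({"a": 1}))  # {'a': 1, 'b': nan}
```

With every key present from the first call, both a and b should appear as columns of metrics_dataframe, with NaN in the steps where a key was not actually reported.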
Versions / Dependencies
Ray 3.0.0, macOS
Reproduction script
Same as the script under "What happened + What you expected to happen".
Issue Severity
High: It blocks me from completing my task.
The problem is that the result dataframes are loaded from the progress.csv file by default. The CSV logger fixes its columns when it writes the header for the first result and does not support adding new keys afterwards (they are silently ignored), and it seems we are not planning to support new keys for CSV logging (see #13766).
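The effect can be reproduced with the standard library alone. This is a simplified model of the behavior described above, not Ray's actual logger code: the header is taken from the keys of the first result, and here `csv.DictWriter(..., extrasaction="ignore")` is used to drop any key that is not in that header.

```python
import csv
import io


def write_results_csv(results):
    """Write a list of result dicts to CSV text.

    Simplified model: the header comes from the first result's keys, and
    keys that show up only in later results are silently dropped.
    """
    buffer = io.StringIO()
    writer = None
    for result in results:
        if writer is None:
            # The column set is fixed by the first result.
            writer = csv.DictWriter(
                buffer, fieldnames=list(result.keys()), extrasaction="ignore"
            )
            writer.writeheader()
        # Unknown keys (here "b") are ignored instead of raising ValueError;
        # missing header keys are written as empty fields.
        writer.writerow(result)
    return buffer.getvalue()


# Same pattern as the reproduction script: "a" on even steps, "b" on odd steps.
results = [{"a": 1} if i % 2 == 0 else {"b": 2} for i in range(4)]
print(write_results_csv(results))
```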
Loading the results from result.json instead works out of the box, but there is currently no way to specify that metrics_dataframe should use the JSON file.
Would it be possible to load the metrics dataframe from the JSON file instead of the CSV? That would cover new keys automatically.
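A minimal sketch of JSON-based loading, assuming each trial directory contains a result.json with one JSON-encoded result per line. `load_json_results` and `columns` are hypothetical helpers, not existing Ray functions. Because every line is parsed independently, a key that first appears in a later result is kept.

```python
import json
import os
import tempfile


def load_json_results(path):
    """Read a newline-delimited result.json into a list of dicts.

    Hypothetical helper: every line is parsed on its own, so keys that only
    appear in later results are preserved.
    """
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def columns(rows):
    """Union of keys across all results, in first-seen order."""
    seen = {}
    for row in rows:
        for key in row:
            seen.setdefault(key, None)
    return list(seen)


# Demo with a file shaped like the reports from the reproduction script.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.json")
    with open(path, "w") as f:
        for i in range(4):
            f.write(json.dumps({"a": 1} if i % 2 == 0 else {"b": 2}) + "\n")
    rows = load_json_results(path)

print(columns(rows))  # ['a', 'b']
```

With pandas installed, `pandas.DataFrame(rows)` (or `pandas.read_json(path, lines=True)`) builds a frame in which both a and b are columns, with NaN wherever a result did not report that key.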