ray [AIR/Tune] session.report drops metric keys that are not included in the first metrics report

What happened + What you expected to happen

When using session.report, every metric key has to be included in the first report; any key that is first reported later does not show up in metrics_dataframe.

from ray import tune
from ray.air import session
from ray.tune import Tuner


def f(config):
	for i in range(10):
		if i % 2 == 0:
			session.report({"a": 1})
		else:
			session.report({"b": 2})


result = Tuner(f).fit()
result = result.get_best_result().metrics_dataframe

print(result["a"])
print(result["b"])

The metrics_dataframe contains the expected data for key a, but key b is missing, so result["b"] raises a KeyError.

0    1.0
1    NaN
2    1.0
3    NaN
4    1.0
5    NaN
6    1.0
7    NaN
8    1.0
9    NaN
Name: a, dtype: float64
Traceback (most recent call last):
  File "/Users/ilee300/miniforge3/envs/mosaic_ray/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3629, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'b'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "test_report.py", line 18, in <module>
    print(result["b"])
  File "/Users/ilee300/miniforge3/envs/mosaic_ray/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/ilee300/miniforge3/envs/mosaic_ray/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3631, in get_loc
    raise KeyError(key) from err
KeyError: 'b'

Versions / Dependencies

Ray 3.0.0, macOS

Reproduction script

from ray import tune
from ray.air import session
from ray.tune import Tuner


def f(config):
	for i in range(10):
		if i % 2 == 0:
			session.report({"a": 1})
		else:
			session.report({"b": 2})


result = Tuner(f).fit()
result = result.get_best_result().metrics_dataframe

print(result["a"])
print(result["b"])

Issue Severity

High: It blocks me from completing my task.

Sep 15 '22 21:09 ilee300a

The problem here is that the result dataframes get loaded from the progress.csv file by default. The CSV logger doesn't support adding new keys (and will just ignore them), and it seems like we are not planning to support new keys for CSV logging (see #13766).
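
To illustrate the behavior described above, here is a minimal standalone sketch, not Ray's actual logger code. It assumes the header is fixed from the keys of the first result and that later, unknown keys are silently skipped. The helper names write_like_csv_logger and pad_metrics are hypothetical.

```python
import csv
import io
import math


def write_like_csv_logger(results):
    """Write results as CSV with a header fixed by the first result.

    Simplified stand-in for a CSV logger (an assumption, not Ray's code).
    """
    buf = io.StringIO()
    writer = None
    for row in results:
        if writer is None:
            # The column set is decided once, from the first result only.
            writer = csv.DictWriter(
                buf, fieldnames=list(row), extrasaction="ignore"
            )
            writer.writeheader()
        # Keys outside the header are dropped because of extrasaction="ignore".
        writer.writerow(row)
    return buf.getvalue()


def pad_metrics(metrics, all_keys):
    """Hypothetical helper: fill keys missing from a report with NaN."""
    return {key: metrics.get(key, math.nan) for key in all_keys}


# Same alternating pattern as the reproduction script.
results = [{"a": 1} if i % 2 == 0 else {"b": 2} for i in range(4)]

header = write_like_csv_logger(results).splitlines()[0]
print(header)  # only the key from the first report survives

padded = [pad_metrics(r, ["a", "b"]) for r in results]
padded_header = write_like_csv_logger(padded).splitlines()[0]
print(padded_header)  # both keys are present from the first row on
```

Under these assumptions, reporting every key in every call (for example session.report(pad_metrics({"a": 1}, ["a", "b"]))) would be one way to keep all columns in progress.csv.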

Loading the results from result.json instead works out of the box, but there is currently no way to specify that the JSON file should be used.
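
As a rough sketch of why the JSON file does not have this limitation: assuming result.json holds one JSON object per line, each row carries its own keys, so the column set can be built as the union over all rows. The function load_json_lines and the in-memory sample below are illustrative only and not part of Ray.

```python
import io
import json


def load_json_lines(fp):
    """Parse one JSON object per line and align rows on the union of keys."""
    rows = [json.loads(line) for line in fp if line.strip()]

    # Collect columns in first-seen order across all rows, not just the first.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    # Missing values become None (pandas would show them as NaN).
    table = [{col: row.get(col) for col in columns} for row in rows]
    return columns, table


# In-memory stand-in for a result.json file (assumed format).
sample = io.StringIO('{"a": 1}\n{"b": 2}\n{"a": 1}\n{"b": 2}\n')
columns, table = load_json_lines(sample)
print(columns)
print(table[1])
```

With pandas, pd.read_json(path, lines=True) reads the same line-delimited format; the location of result.json inside the trial directory would still need to be looked up manually.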

Sep 21 '22 17:09 justinvyu

Is it possible to load the metrics dataframe from JSON instead of CSV? That would provide better automatic coverage.

Oct 11 '22 22:10 richardliaw