issue-tracking icon indicating copy to clipboard operation
issue-tracking copied to clipboard

Hydra config objects can't be logged as hyperparameters

Open Vozf opened this issue 5 years ago • 14 comments

Currently if i have an object(instantiated by hydra later) for example

from dataclasses import dataclass, field
from typing import List, Tuple

import cv2
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING

cs = ConfigStore.instance()
group = "transform"

@dataclass
class HorizontalFlipConfig:
    _target_: str = "albumentations.HorizontalFlip"


cs.store(group=group, name="horizontal_flip", node=HorizontalFlipConfig)

It can't be viewed as a hyperparameter. It says ["DictConfig not JSON serializable"]. But omegaconf DictConfig is basically a dict. So this should be viewed as a hyperparameter

Vozf avatar Oct 05 '20 09:10 Vozf

@Vozf Thanks for the report!

What version of comet_ml are you using? We recently added some enhancements to handle exactly this.

If you are using the latest comet_ml, can you force it into an actual dict as a workaround?

dsblank avatar Oct 05 '20 15:10 dsblank

Updated to 3.2.3 and it still reproduces. There are workarounds of course but I would like to avoid that.

Vozf avatar Oct 06 '20 12:10 Vozf

Reproduced. Thanks for the feedback. Working on a solution for next release.

dsblank avatar Oct 06 '20 17:10 dsblank

Great, thanks. Also as a suggestion. It would be great if hyperparameters supported hierarchical nested dict-like structure without flattening. I've got quite big hydra nested dict config and it results in a very long list of flattened key-values(3 pages of key-values). So instead it would be a nested structure where I can open any dict key and see the nested dict.

Vozf avatar Oct 06 '20 17:10 Vozf

Thanks for the suggestion! Yes, we have thought about that. Currently, hyperparameters are key/values, except when a dict, and we represent that as a str(dict). However, that has limited space. For now, you can also log a dictionary as a JSON asset using experiment.log_data_asset() or even as a pickled dict.

On Tue, Oct 6, 2020 at 10:35 AM Alexander [email protected] wrote:

Great, thanks. Also as a suggestion. It would be great if hyperparameters supported hierarchical nested dict-like structure without flattening. I've got quite big hydra nested dict config and it results in a very long list of flattened key-values. So instead it would be a nested structure where I can open any dict key and see the nested dict.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/comet-ml/issue-tracking/issues/383#issuecomment-704435247, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABJE6FTSGPKLX7EPAX5CKTSJNIO3ANCNFSM4SEPB55Q .

dsblank avatar Oct 06 '20 17:10 dsblank

@Vozf - if you store the entire dict as a single hyper parameter it will be hard to

  • compare between different experiments
  • filter experiment by a specific nested objet inside that parameter

as a hyper parameter I would recommend to flatten them to single parameters (we might add a "group" of "prefix" so you can view / filter them together in the UI

if you just want to store the entire dict I would use the log_asset or log_data_asset as @dsblank mentioned above

Nimrod007 avatar Oct 08 '20 16:10 Nimrod007

@Vozf I new version of comet_ml will be released next week with enhancements for logging ConfigStore as parameters. Thanks again!

dsblank avatar Oct 09 '20 16:10 dsblank

Great news. Although are you sure we are on the same page that the problem is with logging DictConfig objects, not ConfigStore? ConfigStore is a hydra object that just stores all the configs, it not shouldn't be logged in comet in any case. The objects that are logged are dicts with config stuff wrapped in DictConfig.

Vozf avatar Oct 09 '20 16:10 Vozf

What were you trying to log in your example?

We made it work so that you can:

experiment.log(cs)

dsblank avatar Oct 09 '20 16:10 dsblank

Not really. As I've said cs just stores the configs and is not something to be logged. Take a look what hydra does in general. Here is a minimalistic tutorial https://hydra.cc/docs/next/tutorials/structured_config/schema/ Going from this tutorial what I'm trying to log is

    experiment.log_parameters({"params": [cfg]})

And a full example(Sorry did not test, but the idea is like this)

@dataclass
class DBConfig:
    driver: str = MISSING
    host: str = "localhost"
    port: int = MISSING


@dataclass
class MySQLConfig(DBConfig):
    driver: str = "mysql"
    port: int = 3306
    user: str = MISSING
    password: str = MISSING

@dataclass
class PostGreSQLConfig(DBConfig):
    driver: str = "postgresql"
    user: str = MISSING
    port: int = 5432
    password: str = MISSING
    timeout: int = 10

@dataclass
class Config:
    # Note the lack of defaults list here.
    # In this example it comes from config.yaml
    db: DBConfig = MISSING

cs = ConfigStore.instance()
cs.store(name="config", node=Config)
cs.store(group="db", name="mysql", node=MySQLConfig)
cs.store(group="db", name="postgresql", node=PostGreSQLConfig)

# The config name matches both 'config.yaml' under the conf directory
# and 'config' stored in the ConfigStore.
# config.yaml will compose in db: mysql by default (per the defaults list),
# and it will be validated against the schema from the Config class
@hydra.main(config_path="conf", config_name="config")
def my_app(cfg: Config) -> None:
    experiment.log_parameters({"params": [cfg]})
    print(OmegaConf.to_yaml(cfg))

This results in

["DictConfig not JSON serializable"]

It seems like the first example is not sufficient, sorry for misunderstanding. cs should not be logged, but DictConfig should DictConfig now is working but list of DictConfig - doesn't

Vozf avatar Oct 09 '20 17:10 Vozf

Thanks, that clarifies, and does expose a couple of other issues.

Here is an updated example logging your example, and the full ConfigStore:

from comet_ml import Experiment

from dataclasses import dataclass
from hydra.core.config_store import ConfigStore
from omegaconf import MISSING, OmegaConf
import hydra

experiment = Experiment(project_name="hydra")

cs = ConfigStore.instance()
group = "transform"

@dataclass
class HorizontalFlipConfig:
    _target_ = "albumentations.HorizontalFlip"

cs.store(
    group=group,
    name="horizontal_flip",
    node=HorizontalFlipConfig,
)

@hydra.main(config_path="configs", config_name="config")
def my_app(cfg):
    experiment.log_parameters({"params": [cfg]})
    experiment.log_parameters(cs)
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    my_app()

The output:

$ python3 hydra-example.py 
COMET INFO: old comet version (3.2.1) detected. current: 3.2.3 please update your comet lib with command: `pip install --no-cache-dir --upgrade comet_ml`
COMET INFO: Experiment is live on comet.ml https://www.comet.ml/dsblank/hydra/cc26de40d48e4a56a35db042105e460c

db:
  driver: mysql
  user: omry
  pass: secret

[2020-10-09 10:41:59,680][apscheduler.scheduler][INFO] - Scheduler has been shut down
COMET INFO: ---------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------
COMET INFO:   Data:
COMET INFO:     display_summary_level : 1
COMET INFO:     url                   : https://www.comet.ml/dsblank/hydra/cc26de40d48e4a56a35db042105e460c
COMET INFO:   Parameters:
COMET INFO:     hydra             : {'launcher': '{\'basic.yaml\': "{\'_target_\': \'hydra._internal.core_plugins.basic_launcher.BasicLauncher\'}"}', 'sweeper': '{\'basic.yaml\': "{\'_target_\': \'hydra._internal.core_plugins.basic_sweeper.BasicSweeper\', \'max_batch_size\': \'None\'}"}'}
COMET INFO:     hydra_config.yaml : {'defaults': "[{'hydra/hydra_logging': 'default'}, {'hydra/job_logging': 'default'}, {'hydra/launcher': 'basic'}, {'hydra/sweeper': 'basic'}, {'hydra/output': 'default'}, {'hydra/help': 'default'}, {'hydra/hydra_help': 'default'}]", 'hydra': "{'run': {'dir': '???'}, 'sweep': {'dir': '???', 'subdir': '???'}, 'hydra_logging': '???', 'job_logging': '???', 'sweeper': '???', 'launcher': '???', 'help': {'app_name': '???', 'header': '???', 'footer': '???', 'template': '???'}, 'hydra_help': {'hydra_help': '???', 'template': '???'}, 'output_subdir': '.hydra', 'overrides': {'hydra': [], 'task': []}, 'job': {'name': '???', 'override_dirname': '???', 'id': '???', 'num': '???', 'config_name': '???', 'env_set': {}, 'env_copy': [], 'config': {'override_dirname': {'kv_sep': '=', 'item_sep': ',', 'exclude_keys': []}}}, 'runtime': {'version': '???', 'cwd': '???'}, 'verbose': False}"}
COMET INFO:     params            : [{'db': {'driver': 'mysql', 'user': 'omry', 'pass': 'secret'}}]
COMET INFO:     transform         : {'horizontal_flip.yaml': '{}'}
COMET INFO:   Uploads:
COMET INFO:     asset               : 1
COMET INFO:     environment details : 1
COMET INFO:     filename            : 1
COMET INFO:     installed packages  : 1
COMET INFO:     os packages         : 1
COMET INFO: ---------------------------
COMET INFO: Uploading stats to Comet before program termination (may take several seconds)
COMET INFO: Still uploading

I see the params parameter is correct here. But see below.

And the related logged output logged in the Comet.ml UI:

https://www.comet.ml/dsblank/hydra/cc26de40d48e4a56a35db042105e460c?experiment-tab=params

So two issues that I see:

  1. hydra parameter has some spurious escapes
  2. list of non-JSON encodable items does not follow new code path

We'll address these two additional issues.

dsblank avatar Oct 09 '20 17:10 dsblank

Don't have access to the link, so didn't really get your points. but if it's for internal use - ok. Although, you can post a screenshot You've also forgotten to create root node like Config in tutorial and register it in cs without group with cs.store(name="config", node=Config)

Vozf avatar Oct 09 '20 18:10 Vozf

I fixed the link by making the project public.

I didn't forget... I don't have a hydra setup configured with a working database, so I did a minimal example.

dsblank avatar Oct 09 '20 18:10 dsblank

The tutorial is not connecting to any database. It just represents how hydra loads the config. You can run it without the database. Hydra is only reading configs and creates a parameter to main. It does not connect to anything. The root Config node is not linked to database. Root node should be present in any hydra config. I believe the strange logging artifacts you've encountered are connected to the absence of root node. At least I don't have such artifacts.

Vozf avatar Oct 09 '20 18:10 Vozf

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Nov 06 '23 21:11 github-actions[bot]