clearml
Cannot use offline mode
Describe the bug
I cannot train with offline mode as it errors out with ValueError: Unsupported keyword arguments: force. When not using offline mode, the training starts just fine.
Stacktrace
C:\Users\user\anaconda3\envs\conda_wrapper\python.exe C:\Users\user\Documents\GitHub\projects\model_update\train.py --multirun
[I 2024-02-27 15:46:29,606] Using an existing study with name 'debug' instead of creating a new one.
[2024-02-27 15:46:29,610][HYDRA] Study name: debug
[2024-02-27 15:46:29,610][HYDRA] Storage: sqlite:///C:/Users/user/Documents/Internship_AgroCares/experiments/debug.db
[2024-02-27 15:46:29,612][HYDRA] Sampler: TPESampler
[2024-02-27 15:46:29,612][HYDRA] Directions: ['minimize']
[2024-02-27 15:46:29,724][HYDRA] Launching 1 jobs locally
[2024-02-27 15:46:29,724][HYDRA] #0 : learning_rate=0.0009646166816485542 conv_dilation=4 conv_kernel_size=4 conv_filters_0=80 conv_filters_1=72 fc_neurons_0=512 fc_neurons_1=64 fc_neurons_2=512 fc_l2=3.3171161072480045e-05 batch_size=256 activation=relu pooling=avg pooling_size=2
ClearML Task: created new task id=offline-07e2611c2a684673926cf42cb3a03b51
Error executing job with overrides: ['learning_rate=0.0009646166816485542', 'conv_dilation=4', 'conv_kernel_size=4', 'conv_filters_0=80', 'conv_filters_1=72', 'fc_neurons_0=512', 'fc_neurons_1=64', 'fc_neurons_2=512', 'fc_l2=3.3171161072480045e-05', 'batch_size=256', 'activation=relu', 'pooling=avg', 'pooling_size=2']
Error executing job with overrides: ['learning_rate=0.0009646166816485542', 'conv_dilation=4', 'conv_kernel_size=4', 'conv_filters_0=80', 'conv_filters_1=72', 'fc_neurons_0=512', 'fc_neurons_1=64', 'fc_neurons_2=512', 'fc_l2=3.3171161072480045e-05', 'batch_size=256', 'activation=relu', 'pooling=avg', 'pooling_size=2']
Traceback (most recent call last):
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\_internal\utils.py", line 213, in run_and_report
    return func()
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\_internal\utils.py", line 461, in <lambda>
    lambda: hydra.multirun(
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\_internal\hydra.py", line 162, in multirun
    ret = sweeper.sweep(arguments=task_overrides)
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\optuna_sweeper.py", line 52, in sweep
    return self.sweeper.sweep(arguments)
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\_impl.py", line 391, in sweep
    raise e
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\_impl.py", line 360, in sweep
    f"Return value must be float-castable. Got '{ret.return_value}'."
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\core\utils.py", line 260, in return_value
    raise self._return_value
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra_plugins\hydra_optuna_sweeper\_impl.py", line 357, in sweep
    values = [float(ret.return_value)]
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\core\utils.py", line 260, in return_value
    raise self._return_value
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\hydra\core\utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\binding\hydra_bind.py", line 230, in _patched_task_function
    return task_function(a_config, *a_args, **a_kwargs)
  File "C:\Users\user\Documents\GitHub\projects\model_update\train.py", line 66, in main
    task = Task.init(
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\task.py", line 765, in init
    PatchHydra.delete_overrides()
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\binding\hydra_bind.py", line 53, in delete_overrides
    cls._current_task.delete_parameter(cls._overrides_section, force=True)
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\backend_interface\task\task.py", line 1365, in delete_parameter
    res = self.send(tasks.DeleteHyperParamsRequest(
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\backend_api\services\v2_9\tasks.py", line 3814, in __init__
    super(DeleteHyperParamsRequest, self).__init__(**kwargs)
  File "C:\Users\user\anaconda3\envs\conda_wrapper\lib\site-packages\clearml\backend_api\session\request.py", line 31, in __init__
    raise ValueError('Unsupported keyword arguments: %s' % ', '.join(kwargs.keys()))
ValueError: Unsupported keyword arguments: force
ClearML Task: Offline session stored in C:/Users/user/.clearml/cache/offline/offline-07e2611c2a684673926cf42cb3a03b51.zip
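The last frames of the traceback hint at the mechanism: the request class under v2_9 appears to forward any keyword argument it does not recognize to a base constructor, which rejects it. The sketch below reproduces that pattern with simplified stand-in classes. Request and DeleteHyperParamsRequest here are illustrative only and are not ClearML's actual implementation; the task and hyperparams parameters and the try_delete helper are assumptions made for the example.

```python
class Request:
    """Stand-in base class: rejects any keyword argument it receives."""

    def __init__(self, **kwargs):
        if kwargs:
            raise ValueError(
                "Unsupported keyword arguments: %s" % ", ".join(kwargs.keys())
            )


class DeleteHyperParamsRequest(Request):
    """Stand-in for an older API version that has no `force` field."""

    def __init__(self, task, hyperparams, **kwargs):
        # Anything not consumed by the named parameters is passed up
        # and rejected by the base class.
        super().__init__(**kwargs)
        self.task = task
        self.hyperparams = hyperparams


def try_delete(**extra):
    """Build the request; return the error message, or None on success."""
    try:
        DeleteHyperParamsRequest(task="offline-task", hyperparams=[], **extra)
    except ValueError as err:
        return str(err)
    return None
```

In this sketch, try_delete() succeeds, while try_delete(force=True) returns the same message as the last line of the traceback.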
To reproduce
"""Demonstrate how training can be done in a simple fashion."""
from pathlib import Path
from clearml import Task
import hydra
from hydra.core.config_store import ConfigStore
import os

# TrainConfiguration, ScriptConfiguration and Trainer are defined elsewhere
# in the reporter's project and are not shown in this snippet.
ConfigStore.instance().store(name="base_config", node=TrainConfiguration)


@hydra.main(version_base=None, config_path="conf", config_name="sweep")
def main(cfg: ScriptConfiguration):
    """Run."""
    Task.set_offline(offline_mode=True)
    task = Task.init(
        project_name="Test",
        task_name="debugtask",
        tags=['debug']
    )
    trainer = Trainer(config=cfg)
    train_loss = trainer.train()
    task.close()
    # Set offline to false and upload task to server
    Task.set_offline(False)


if __name__ == "__main__":
    main()
Expected behaviour
It should have trained normally, like when offline mode is not on.
Environment
- Server type: both self hosted and app.clear.ml
- ClearML SDK Version 1.14.3
- ClearML Server Version (self hosted only): 1.14.1-451
- Python Version 3.10.8
- OS: Windows
Hi @michelkok ! Thank you for reporting. We have identified the problem and we will release a fix for this problem soon.
@eugen-ajechiloae-clearml while waiting for the release, would dropping the force argument in the cls._current_task.delete_parameter function in the PatchHydra class from hydra_bind.py fix the issue?
Hey @michelkok! Just letting you know that this issue has been resolved in v1.15.0. Let us know if there are any issues :)
Thanks, I will test the coming week! Will close if it is indeed resolved.
Solved indeed, thanks!
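Since the thread reports the bug with SDK 1.14.3 and the fix in v1.15.0, a training script could guard offline mode behind a version check. This is a minimal sketch, not something proposed in the thread: FIXED_IN, parse_release, has_offline_fix and installed_clearml_has_fix are hypothetical helper names, and the parsing only reads the leading digits of the first three version components, so pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Release named in this thread as containing the fix.
FIXED_IN = (1, 15, 0)


def parse_release(version_string):
    """Turn a string such as '1.14.3' into a tuple such as (1, 14, 3)."""
    numbers = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    # Pad short versions such as '1.15' so tuple comparison is well defined.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def has_offline_fix(version_string):
    """Return True if the given SDK version is at least FIXED_IN."""
    return parse_release(version_string) >= FIXED_IN


def installed_clearml_has_fix():
    """Check the locally installed clearml package, if there is one."""
    try:
        return has_offline_fix(version("clearml"))
    except PackageNotFoundError:
        return False
```

A script could call installed_clearml_has_fix() before Task.set_offline(offline_mode=True) and fall back to online mode, or stop with a clear message, when it returns False.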