clearml
clearml copied to clipboard
Feature add jsonargparse support
When using the LightningCLI
feature that uses jsonargparse
the parameters are not captured automatically. I have attached a file with the code to reproduce (discussed with @bmartinn).
lightning_cli.zip
Thank you @ramonemiliani93 !
Basically what's seems to be missing is support for jsonargparse
(which is only used if already installed on the system)
Let's change the subject to "Feature add jsonargparse
support" , wdyt?
Hi, I would like to try and take this up :)
Have taken a look at the code already and had a question:
Since jsonargparse
is so similar to argparse
, should I try and add support for it to the existing argparse
bind or create a new, specific bind for it just like what was done for click? The click integration looks a lot cleaner and less complex though, so it might be easier that way. Also, if I figure this one out, the same idea might apply to implement issue #500.
Also the code in argparse
is barely commented, so someone to guide the way just a little would be much appreciated! :)
Hey @thepycoder!
Hi, I would like to try and take this up :)
🙏 👍
Since jsonargparse is so similar to argparse, should I try and add support for it to the existing argparse bind or create a new, specific bind for it just like what was done for click?Since jsonargparse is so similar to argparse, should I try and add support for it to the existing argparse bind or create a new, specific bind for it just like what was done for click?
Well, it depends - if you think you can get a clean implementation that will be much simpler, we won't object 🙂 - but I think the reason the argparse binding looks much more complex is because it is - there are much more things to handle and edge-cases (and bugs) that were addressed - personally I would start by trying to figure out the similarities and differences, to get a better understanding of what can be reused and/or extended.
Also the code in argparse is barely commented, so someone to guide the way just a little would be much appreciated! :)
Sure 🙂 - we'll be super happy to help, explain and guide. You're also welcome to ping us in the Slack clearml-dev channel (for quicker answers to simple questions - I still thing larger discussions are better off done here, for reference-sake)
Hello. I developed both jsonargparse and LightningCLI and randomly ended up reading this issue. I am not familiar with clearml so don't really understand what "add jsonargparse support" means here. What exactly is a binding supposed to do? Looking at commit https://github.com/allegroai/clearml/commit/4be4ba1a9a40670ee74e137d79adc5c171613a33 I see duplication of code and assumptions on the internals of jsonargparse. This is bound to break as jsonargparse evolves over time. I think it would be preferable if this integration makes use of only the official API. Certainly I would expect that whatever you need here is not possible with the current API. However, if you explain what is actually needed, a proper support for it could be added to jsonargparse.
Hi @mauvilsa, Basically ClearML logs everything a user uses (code \ params \ outputs) automatically, and then allow it to override configuration when rerunning it. What we do in the context of jsonargparse and LightningCLI is monkey-patch them and automatically (and transparently, at least for the user using jsonargparse \ LightningCLI) log them. Later on we can also can get parameters from users and then inject them back to the code. Binding is basically the method in-which connect to these frameworks. We do this for a few other CLI tools (argparse \ click \ python fire) as well as other frameworks (pytorch \ scikitlearn \ tensorflow).
Obviously we're working with what we've got and most of these tools were not designed to be "hijacked", we'd be more than happy to open a communication channel in which you can expose APIs that we'll use.
@erezalg thanks for the response. Though, I think I will need more details in order to know what would be needed from jsonargparse. In particular, if the only requirements are to log parameters and be able to override them, I don't understand why there is a need for such complexity like in https://github.com/allegroai/clearml/commit/4be4ba1a9a40670ee74e137d79adc5c171613a33. Both of these could be easily done by just patching the parse_args
method and not meddle around with internals. What I mean is something like:
def parse_args(self, *args, **kwargs) -> Namespace:
params = self._original_parse_args(*args, **kwargs)
log_parameters_if_needed(params)
override_parameters_if_needed(params)
return params
One problem this could have is that changed parameters might no longer be valid and it wouldn't be noticed because validation has already been done. A small improvement would fix that, like:
def parse_args(self, *args, **kwargs) -> Namespace:
params = self._original_parse_args(*args, **kwargs)
log_parameters_if_needed(params)
if need_to_override_parameters:
override_parameters(params)
self.check_config(params)
return params
Something that might help me understand is unit tests. I see that pytest is a dev requirement, though I am not sure where the unit tests are. Would be great to see a unit test for the jsonargparse binding.
Hi @mauvilsa. Thank you for your help! We have come up with a solution that no longer uses internal jsonargparse
functions, similar to what you described. The only part that adds a bit of complexity is flattening and unflattening Namespaces and converting them to dicts and vice-versa, as the parameters are logged by ClearML as flat dicts.
Would it be OK if we tag you here to take a look at the solution once we have a RC (should be pretty soon)?
@eugen-ajechiloae-clearml certainly. Feel free to tag me if you want me to review.
Hi @mauvilsa. We have updated our jsonargparse support in this commit. If you could take a look that would be great!
@eugen-ajechiloae-clearml I have added some comments to the commit https://github.com/allegroai/clearml/commit/c5e428a03df46a263071c1236b596dd8c47f9bc3.
@mauvilsa Thank you very much for the review. I believe all your points are valid. We will cleanup the code further ;)
Hey everyone, I was wondering if this has been fixed. If I try to use lightning CLI on a task that is executed remotely the args are not captured correctly:
I took the minimal example from the clearml repo and extended it like this:
task = Task.init(project_name="example", task_name="pytorch_lightning_jsonargparse")
+ task.execute_remotely(queue_name="single-gpu") # that's our internal queue
LightningCLI(ImageClassifier, seed_everything_default=42, save_config_overwrite=True, run=True)
When running this experiment (python pytorch_lightning_cli.py fit
) I get the following error:
Traceback (most recent call last):
File "/home/clearml/.clearml/venvs-builds.1/3.9/task_repository/clearml.git/examples/frameworks/jsonargparse/pytorch_lightning_cli.py", line 120, in <module>
LightningCLI(ImageClassifier,
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/utilities/cli.py", line 157, in __init__
super().__init__(*args, **kwargs)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/cli.py", line 346, in __init__
self.subcommand = self.config["subcommand"] if run else None
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/namespace.py", line 200, in __getitem__
leaf_key, parent_ns, _ = self._parse_required_key(key)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/namespace.py", line 161, in _parse_required_key
raise KeyError(f'Key "{key}" not found in namespace.')
KeyError: 'Key "subcommand" not found in namespace.'
Running locally is no problem at all.
To me it seems like the command line args are not passed when running the script remotely, which makes sense because the args are only parsed after Task.execute_remotely()
is called. Calling it after instantiating LightningCLI
makes no sense though because it would block until after the execution has run locally. Using the CLI with run=False
seems like a bad choice since it would mean one needs to implement all stages by hand.
The best solution I could come up with is defining my own minimal CLI like this:
# code as before in pytorch_lightning.py
class MyLightningCLI(LightningCLI):
def __init__(self, *args, **kwargs):
Task.add_requirements('requirements.txt')
self.task = Task.init(project_name="example",
task_name="pytorch_lightning_jsonargparse")
super().__init__(*args, **kwargs)
def before_instantiate_classes(self):
self.task.execute_remotely(queue_name='single-gpu')
if __name__ == "__main__":
MyLightningCLI(ImageClassifier,
seed_everything_default=42,
save_config_overwrite=True,
run=True)
This works if I run it with python pytorch_lightning_cli.py fit
but fails once I try to load the config with python pytorch_lightning_cli.py fit -c pytorch_lightning_cli.yaml
, i.e. once I add more complex arguments like callbacks. The error message looks like this:
Traceback (most recent call last):
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/typehints.py", line 572, in adapt_typehints
vals.append(adapt_typehints(val, subtypehint, **adapt_kwargs))
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/typehints.py", line 614, in adapt_typehints
raise ValueError(f'Expected a List but got "{val}"')
ValueError: Expected a List but got "[Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor', init_args=Namespace(logging_interval='epoch', log_momentum=False)), Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint', init_args=Namespace(dirpath=None, filename='best', monitor='loss', verbose=False, save_last=False, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None))]"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/clearml/.clearml/venvs-builds.1/3.9/task_repository/clearml.git/examples/frameworks/jsonargparse/pytorch_lightning_cli.py", line 128, in <module>
MyLightningCLI(ImageClassifier,
File "/home/clearml/.clearml/venvs-builds.1/3.9/task_repository/clearml.git/examples/frameworks/jsonargparse/pytorch_lightning_cli.py", line 120, in __init__
super().__init__(*args, **kwargs)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/utilities/cli.py", line 157, in __init__
super().__init__(*args, **kwargs)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/cli.py", line 351, in __init__
self.instantiate_classes()
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/cli.py", line 525, in instantiate_classes
self.config_init = self.parser.instantiate_classes(self.config)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/deprecated.py", line 127, in patched_instantiate_classes
cfg = self._unpatched_instantiate_classes(cfg, **kwargs)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/core.py", line 1129, in instantiate_classes
cfg[subcommand] = subparser.instantiate_classes(cfg[subcommand], instantiate_groups=instantiate_groups)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/deprecated.py", line 127, in patched_instantiate_classes
cfg = self._unpatched_instantiate_classes(cfg, **kwargs)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/core.py", line 1120, in instantiate_classes
parent[key] = component.instantiate_classes(value)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/typehints.py", line 433, in instantiate_classes
value[num] = adapt_typehints(val, self._typehint, instantiate_classes=True, sub_add_kwargs=sub_add_kwargs)
File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/typehints.py", line 578, in adapt_typehints
raise ValueError(f'Value "{val}" does not validate against any of the types in {typehint}:{e}') from vals[0]
ValueError: Value "[Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor', init_args=Namespace(logging_interval='epoch', log_momentum=False)), Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint', init_args=Namespace(dirpath=None, filename='best', monitor='loss', verbose=False, save_last=False, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None))]" does not validate against any of the types in typing.Union[typing.List[pytorch_lightning.callbacks.callback.Callback], pytorch_lightning.callbacks.callback.Callback, NoneType]:
- Expected a List but got "[Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor', init_args=Namespace(logging_interval='epoch', log_momentum=False)), Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint', init_args=Namespace(dirpath=None, filename='best', monitor='loss', verbose=False, save_last=False, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None))]"
- Unexpected import path format: [Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor', init_args=Namespace(logging_interval='epoch', log_momentum=False)), Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint', init_args=Namespace(dirpath=None, filename='best', monitor='loss', verbose=False, save_last=False, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None))]
- Expected a <class 'NoneType'> but got "["Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor'", "init_args=Namespace(logging_interval='epoch'", 'log_momentum=False))', "Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint'", 'init_args=Namespace(dirpath=None', "filename='best'", "monitor='loss'", 'verbose=False', 'save_last=False', 'save_top_k=1', 'save_weights_only=False', "mode='min'", 'auto_insert_metric_name=True', 'every_n_train_steps=None', 'train_time_interval=None', 'every_n_epochs=None', 'save_on_train_epoch_end=None))']"
The output of pip list
run locally is pip_list.txt.
It looks like I found a hacky solution, maybe it helps fix this problem.
Shown below is how I modify the minimal pytorch lightning example. Now everything can be run locally and remotely without problems.
diff --git a/examples/frameworks/jsonargparse/pytorch_lightning_cli.py b/examples/frameworks/jsonargparse/pytorch_lightning_cli.py
index d73428e..5ce44d5 100644
--- a/examples/frameworks/jsonargparse/pytorch_lightning_cli.py
+++ b/examples/frameworks/jsonargparse/pytorch_lightning_cli.py
@@ -26,6 +26,8 @@ from pytorch_lightning import LightningModule
from pytorch_lightning.utilities.cli import LightningCLI
from clearml import Task
+from jsonargparse import Namespace
+
class Net(nn.Module):
def __init__(self):
@@ -97,7 +99,35 @@ class ImageClassifier(LightningModule):
return torch.utils.data.DataLoader(test_dataset, batch_size=self.hparams.batch_size)
+class MyLightningCLI(LightningCLI):
+ def __init__(self, *args, **kwargs):
+ Task.add_requirements('requirements.txt')
+ self.task = Task.init(project_name="example",
+ task_name="pytorch_lightning_jsonargparse",
+ )
+ super().__init__(*args, **kwargs)
+
+ def before_instantiate_classes(self):
+ self.task.execute_remotely(queue_name='single-gpu')
+
+ self.config = _recursive_remove_namespaces(self.config)
+ print(self.config)
+
+
+def _recursive_remove_namespaces(value):
+ """Turns a config object into something LightningCLI can work with."""
+ if isinstance(value, list):
+ for idx, element in enumerate(value):
+ value[idx] = _recursive_remove_namespaces(element)
+ elif isinstance(value, dict):
+ for key, element in value.items():
+ value[key] = _recursive_remove_namespaces(element)
+ elif isinstance(value, str) and 'Namespace' in value:
+ value = _recursive_remove_namespaces(eval(value))
+ elif isinstance(value, Namespace):
+ value = _recursive_remove_namespaces(value.as_dict())
+
+ return value
+
if __name__ == "__main__":
- Task.add_requirements('requirements.txt')
- task = Task.init(project_name="example", task_name="pytorch_lightning_jsonargparse")
- LightningCLI(ImageClassifier, seed_everything_default=42, save_config_overwrite=True, run=True)
+ MyLightningCLI(ImageClassifier, seed_everything_default=42, save_config_overwrite=True, run=True)
This obviously has issues since I am using eval
on a user input.
Hi @jhaux-askui @mauvilsa,
With the latest clearml v1.9.0, we believed we've solved the issue.
Could you please give it a try and let us know whether it works or not?
Closing this as it was already released. Please reopen if required.