clearml icon indicating copy to clipboard operation
clearml copied to clipboard

Feature add jsonargparse support

Open ramonemiliani93 opened this issue 3 years ago • 11 comments

When using the LightningCLI feature that uses jsonargparse the parameters are not captured automatically. I have attached a file with the code to reproduce (discussed with @bmartinn). lightning_cli.zip

ramonemiliani93 avatar Jul 15 '21 16:07 ramonemiliani93

Thank you @ramonemiliani93 ! Basically what's seems to be missing is support for jsonargparse (which is only used if already installed on the system) Let's change the subject to "Feature add jsonargparse support" , wdyt?

bmartinn avatar Jul 15 '21 22:07 bmartinn

Hi, I would like to try and take this up :) Have taken a look at the code already and had a question: Since jsonargparse is so similar to argparse, should I try and add support for it to the existing argparse bind or create a new, specific bind for it just like what was done for click? The click integration looks a lot cleaner and less complex though, so it might be easier that way. Also, if I figure this one out, the same idea might apply to implement issue #500.

Also the code in argparse is barely commented, so someone to guide the way just a little would be much appreciated! :)

thepycoder avatar Dec 08 '21 08:12 thepycoder

Hey @thepycoder!

Hi, I would like to try and take this up :)

🙏 👍

Since jsonargparse is so similar to argparse, should I try and add support for it to the existing argparse bind or create a new, specific bind for it just like what was done for click?Since jsonargparse is so similar to argparse, should I try and add support for it to the existing argparse bind or create a new, specific bind for it just like what was done for click?

Well, it depends - if you think you can get a clean implementation that will be much simpler, we won't object 🙂 - but I think the reason the argparse binding looks much more complex is because it is - there are much more things to handle and edge-cases (and bugs) that were addressed - personally I would start by trying to figure out the similarities and differences, to get a better understanding of what can be reused and/or extended.

Also the code in argparse is barely commented, so someone to guide the way just a little would be much appreciated! :)

Sure 🙂 - we'll be super happy to help, explain and guide. You're also welcome to ping us in the Slack clearml-dev channel (for quicker answers to simple questions - I still thing larger discussions are better off done here, for reference-sake)

jkhenning avatar Dec 08 '21 08:12 jkhenning

Hello. I developed both jsonargparse and LightningCLI and randomly ended up reading this issue. I am not familiar with clearml so don't really understand what "add jsonargparse support" means here. What exactly is a binding supposed to do? Looking at commit https://github.com/allegroai/clearml/commit/4be4ba1a9a40670ee74e137d79adc5c171613a33 I see duplication of code and assumptions on the internals of jsonargparse. This is bound to break as jsonargparse evolves over time. I think it would be preferable if this integration makes use of only the official API. Certainly I would expect that whatever you need here is not possible with the current API. However, if you explain what is actually needed, a proper support for it could be added to jsonargparse.

mauvilsa avatar Mar 18 '22 13:03 mauvilsa

Hi @mauvilsa, Basically ClearML logs everything a user uses (code \ params \ outputs) automatically, and then allow it to override configuration when rerunning it. What we do in the context of jsonargparse and LightningCLI is monkey-patch them and automatically (and transparently, at least for the user using jsonargparse \ LightningCLI) log them. Later on we can also can get parameters from users and then inject them back to the code. Binding is basically the method in-which connect to these frameworks. We do this for a few other CLI tools (argparse \ click \ python fire) as well as other frameworks (pytorch \ scikitlearn \ tensorflow).

Obviously we're working with what we've got and most of these tools were not designed to be "hijacked", we'd be more than happy to open a communication channel in which you can expose APIs that we'll use.

erezalg avatar Mar 20 '22 13:03 erezalg

@erezalg thanks for the response. Though, I think I will need more details in order to know what would be needed from jsonargparse. In particular, if the only requirements are to log parameters and be able to override them, I don't understand why there is a need for such complexity like in https://github.com/allegroai/clearml/commit/4be4ba1a9a40670ee74e137d79adc5c171613a33. Both of these could be easily done by just patching the parse_args method and not meddle around with internals. What I mean is something like:

def parse_args(self, *args, **kwargs) -> Namespace:
    params = self._original_parse_args(*args, **kwargs)
    log_parameters_if_needed(params)
    override_parameters_if_needed(params)
    return params

One problem this could have is that changed parameters might no longer be valid and it wouldn't be noticed because validation has already been done. A small improvement would fix that, like:

def parse_args(self, *args, **kwargs) -> Namespace:
    params = self._original_parse_args(*args, **kwargs)
    log_parameters_if_needed(params)
    if need_to_override_parameters:
        override_parameters(params)
        self.check_config(params)
    return params

Something that might help me understand is unit tests. I see that pytest is a dev requirement, though I am not sure where the unit tests are. Would be great to see a unit test for the jsonargparse binding.

mauvilsa avatar Mar 21 '22 08:03 mauvilsa

Hi @mauvilsa. Thank you for your help! We have come up with a solution that no longer uses internal jsonargparse functions, similar to what you described. The only part that adds a bit of complexity is flattening and unflattening Namespaces and converting them to dicts and vice-versa, as the parameters are logged by ClearML as flat dicts. Would it be OK if we tag you here to take a look at the solution once we have a RC (should be pretty soon)?

@eugen-ajechiloae-clearml certainly. Feel free to tag me if you want me to review.

mauvilsa avatar Mar 22 '22 17:03 mauvilsa

Hi @mauvilsa. We have updated our jsonargparse support in this commit. If you could take a look that would be great!

@eugen-ajechiloae-clearml I have added some comments to the commit https://github.com/allegroai/clearml/commit/c5e428a03df46a263071c1236b596dd8c47f9bc3.

mauvilsa avatar Mar 26 '22 08:03 mauvilsa

@mauvilsa Thank you very much for the review. I believe all your points are valid. We will cleanup the code further ;)

Hey everyone, I was wondering if this has been fixed. If I try to use lightning CLI on a task that is executed remotely the args are not captured correctly:

I took the minimal example from the clearml repo and extended it like this:

     task = Task.init(project_name="example", task_name="pytorch_lightning_jsonargparse")
+    task.execute_remotely(queue_name="single-gpu")  # that's our internal queue
     LightningCLI(ImageClassifier, seed_everything_default=42, save_config_overwrite=True, run=True)

When running this experiment (python pytorch_lightning_cli.py fit) I get the following error: Traceback (most recent call last):

  File "/home/clearml/.clearml/venvs-builds.1/3.9/task_repository/clearml.git/examples/frameworks/jsonargparse/pytorch_lightning_cli.py", line 120, in <module>
    LightningCLI(ImageClassifier,
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/utilities/cli.py", line 157, in __init__
    super().__init__(*args, **kwargs)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/cli.py", line 346, in __init__
    self.subcommand = self.config["subcommand"] if run else None
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/namespace.py", line 200, in __getitem__
    leaf_key, parent_ns, _ = self._parse_required_key(key)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/namespace.py", line 161, in _parse_required_key
    raise KeyError(f'Key "{key}" not found in namespace.')
KeyError: 'Key "subcommand" not found in namespace.'

Running locally is no problem at all.

To me it seems like the command line args are not passed when running the script remotely, which makes sense because the args are only parsed after Task.execute_remotely() is called. Calling it after instantiating LightningCLI makes no sense though because it would block until after the execution has run locally. Using the CLI with run=False seems like a bad choice since it would mean one needs to implement all stages by hand.

The best solution I could come up with is defining my own minimal CLI like this:

# code as before in pytorch_lightning.py

class MyLightningCLI(LightningCLI):
    def __init__(self, *args, **kwargs):
        Task.add_requirements('requirements.txt')
        self.task = Task.init(project_name="example",
                         task_name="pytorch_lightning_jsonargparse")
        super().__init__(*args, **kwargs)

    def before_instantiate_classes(self):
        self.task.execute_remotely(queue_name='single-gpu')


if __name__ == "__main__":
    
    MyLightningCLI(ImageClassifier,
                   seed_everything_default=42,
                   save_config_overwrite=True,
                   run=True)

This works if I run it with python pytorch_lightning_cli.py fit but fails once I try to load the config with python pytorch_lightning_cli.py fit -c pytorch_lightning_cli.yaml, i.e. once I add more complex arguments like callbacks. The error message looks like this:

Traceback (most recent call last):
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/typehints.py", line 572, in adapt_typehints
    vals.append(adapt_typehints(val, subtypehint, **adapt_kwargs))
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/typehints.py", line 614, in adapt_typehints
    raise ValueError(f'Expected a List but got "{val}"')
ValueError: Expected a List but got "[Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor', init_args=Namespace(logging_interval='epoch', log_momentum=False)), Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint', init_args=Namespace(dirpath=None, filename='best', monitor='loss', verbose=False, save_last=False, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None))]"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/clearml/.clearml/venvs-builds.1/3.9/task_repository/clearml.git/examples/frameworks/jsonargparse/pytorch_lightning_cli.py", line 128, in <module>
    MyLightningCLI(ImageClassifier,
  File "/home/clearml/.clearml/venvs-builds.1/3.9/task_repository/clearml.git/examples/frameworks/jsonargparse/pytorch_lightning_cli.py", line 120, in __init__
    super().__init__(*args, **kwargs)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/utilities/cli.py", line 157, in __init__
    super().__init__(*args, **kwargs)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/cli.py", line 351, in __init__
    self.instantiate_classes()
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/pytorch_lightning/cli.py", line 525, in instantiate_classes
    self.config_init = self.parser.instantiate_classes(self.config)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/deprecated.py", line 127, in patched_instantiate_classes
    cfg = self._unpatched_instantiate_classes(cfg, **kwargs)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/core.py", line 1129, in instantiate_classes
    cfg[subcommand] = subparser.instantiate_classes(cfg[subcommand], instantiate_groups=instantiate_groups)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/deprecated.py", line 127, in patched_instantiate_classes
    cfg = self._unpatched_instantiate_classes(cfg, **kwargs)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/core.py", line 1120, in instantiate_classes
    parent[key] = component.instantiate_classes(value)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/typehints.py", line 433, in instantiate_classes
    value[num] = adapt_typehints(val, self._typehint, instantiate_classes=True, sub_add_kwargs=sub_add_kwargs)
  File "/home/clearml/.clearml/venvs-builds.1/3.9/lib/python3.9/site-packages/jsonargparse/typehints.py", line 578, in adapt_typehints
    raise ValueError(f'Value "{val}" does not validate against any of the types in {typehint}:{e}') from vals[0]
ValueError: Value "[Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor', init_args=Namespace(logging_interval='epoch', log_momentum=False)), Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint', init_args=Namespace(dirpath=None, filename='best', monitor='loss', verbose=False, save_last=False, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None))]" does not validate against any of the types in typing.Union[typing.List[pytorch_lightning.callbacks.callback.Callback], pytorch_lightning.callbacks.callback.Callback, NoneType]:
  - Expected a List but got "[Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor', init_args=Namespace(logging_interval='epoch', log_momentum=False)), Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint', init_args=Namespace(dirpath=None, filename='best', monitor='loss', verbose=False, save_last=False, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None))]"
  - Unexpected import path format: [Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor', init_args=Namespace(logging_interval='epoch', log_momentum=False)), Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint', init_args=Namespace(dirpath=None, filename='best', monitor='loss', verbose=False, save_last=False, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None))]
  - Expected a <class 'NoneType'> but got "["Namespace(class_path='pytorch_lightning.callbacks.LearningRateMonitor'", "init_args=Namespace(logging_interval='epoch'", 'log_momentum=False))', "Namespace(class_path='pytorch_lightning.callbacks.ModelCheckpoint'", 'init_args=Namespace(dirpath=None', "filename='best'", "monitor='loss'", 'verbose=False', 'save_last=False', 'save_top_k=1', 'save_weights_only=False', "mode='min'", 'auto_insert_metric_name=True', 'every_n_train_steps=None', 'train_time_interval=None', 'every_n_epochs=None', 'save_on_train_epoch_end=None))']"

The output of pip list run locally is pip_list.txt.

jhaux-askui avatar Nov 16 '22 09:11 jhaux-askui

It looks like I found a hacky solution, maybe it helps fix this problem.

Shown below is how I modify the minimal pytorch lightning example. Now everything can be run locally and remotely without problems.

diff --git a/examples/frameworks/jsonargparse/pytorch_lightning_cli.py b/examples/frameworks/jsonargparse/pytorch_lightning_cli.py
index d73428e..5ce44d5 100644
--- a/examples/frameworks/jsonargparse/pytorch_lightning_cli.py
+++ b/examples/frameworks/jsonargparse/pytorch_lightning_cli.py
@@ -26,6 +26,8 @@ from pytorch_lightning import LightningModule
 from pytorch_lightning.utilities.cli import LightningCLI
 from clearml import Task
 
+from jsonargparse import Namespace
+
 
 class Net(nn.Module):
     def __init__(self):
@@ -97,7 +99,35 @@ class ImageClassifier(LightningModule):
         return torch.utils.data.DataLoader(test_dataset, batch_size=self.hparams.batch_size)
 
 
+class MyLightningCLI(LightningCLI):
+    def __init__(self, *args, **kwargs):
+        Task.add_requirements('requirements.txt')
+        self.task = Task.init(project_name="example",
+                         task_name="pytorch_lightning_jsonargparse",
+                         )
+        super().__init__(*args, **kwargs)
+
+    def before_instantiate_classes(self):
+        self.task.execute_remotely(queue_name='single-gpu')
+
+        self.config = _recursive_remove_namespaces(self.config)
+        print(self.config)
+
+
+def _recursive_remove_namespaces(value):
+    """Turns a config object into something LightningCLI can work with."""
+    if isinstance(value, list):
+        for idx, element in enumerate(value):
+            value[idx] = _recursive_remove_namespaces(element)
+    elif isinstance(value, dict):
+        for key, element in value.items():
+            value[key] = _recursive_remove_namespaces(element)
+    elif isinstance(value, str) and 'Namespace' in value:
+        value = _recursive_remove_namespaces(eval(value))
+    elif isinstance(value, Namespace):
+        value = _recursive_remove_namespaces(value.as_dict())
+
+    return value
+
 if __name__ == "__main__":
-    Task.add_requirements('requirements.txt')
-    task = Task.init(project_name="example", task_name="pytorch_lightning_jsonargparse")
-    LightningCLI(ImageClassifier, seed_everything_default=42, save_config_overwrite=True, run=True)
+    MyLightningCLI(ImageClassifier, seed_everything_default=42, save_config_overwrite=True, run=True)

This obviously has issues since I am using eval on a user input.

jhaux-askui avatar Nov 16 '22 17:11 jhaux-askui

Hi @jhaux-askui @mauvilsa,

With the latest clearml v1.9.0, we believed we've solved the issue.

Could you please give it a try and let us know whether it works or not?

erezalg avatar Dec 29 '22 12:12 erezalg

Closing this as it was already released. Please reopen if required.

jkhenning avatar Mar 15 '23 13:03 jkhenning