clearml-agent

No params passed to click

Open · montmejat opened this issue 3 years ago · 4 comments

I'm getting the following error in my clearml-agent:

...
[.]$ /home/aurelien/Projects/Delair/AI/autotrain-env/bin/python -u -m autotrain.__main__ model train -p Drone-Detection/Aurelien-Dev
Summary - installed python packages:
...

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/aurelien/.clearml/venvs-builds/task_repository/auto-train.git/autotrain/__main__.py", line 91, in <module>
    cli(obj={})
  File "/home/aurelien/Projects/Delair/AI/autotrain-env/lib/python3.8/site-packages/click-8.1.3-py3.8.egg/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/aurelien/Projects/Delair/AI/autotrain-env/lib/python3.8/site-packages/click-8.1.3-py3.8.egg/click/core.py", line 1054, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "/home/aurelien/Projects/Delair/AI/autotrain-env/lib/python3.8/site-packages/click-8.1.3-py3.8.egg/click/core.py", line 920, in make_context
    self.parse_args(ctx, args)
  File "/home/aurelien/Projects/Delair/AI/autotrain-env/lib/python3.8/site-packages/click-8.1.3-py3.8.egg/click/core.py", line 1613, in parse_args
    rest = super().parse_args(ctx, args)
  File "/home/aurelien/Projects/Delair/AI/autotrain-env/lib/python3.8/site-packages/clearml/binding/frameworks/__init__.py", line 36, in _inner_patch
    raise ex
  File "/home/aurelien/Projects/Delair/AI/autotrain-env/lib/python3.8/site-packages/clearml/binding/frameworks/__init__.py", line 34, in _inner_patch
    ret = patched_fn(original_fn, *args, **kwargs)
  File "/home/aurelien/Projects/Delair/AI/autotrain-env/lib/python3.8/site-packages/clearml/binding/click_bind.py", line 103, in _parse_args
    command = PatchClick._load_task_params()
  File "/home/aurelien/Projects/Delair/AI/autotrain-env/lib/python3.8/site-packages/clearml/binding/click_bind.py", line 156, in _load_task_params
    p.name for p in params['Args'].values()
KeyError: 'Args'
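The traceback shows clearml hooking into click. Click's `parse_args` has been replaced by a wrapper (`_inner_patch` in `clearml/binding/frameworks/__init__.py`), and that wrapper calls clearml's `_parse_args` with the original function as its first argument. As a result, an exception raised inside the binding surfaces in the middle of click's own call stack. Below is a simplified sketch of that wrapper pattern. It is not clearml's actual code: `patch_method`, the toy `Command` class, and `REMOTE_PARAMS` are hypothetical.

```python
import functools

# Hypothetical: stands in for the task parameters clearml fetched from the server,
# which turned out to be empty in this issue.
REMOTE_PARAMS = {}


def patch_method(cls, name, patched_fn):
    # Keep a reference to the original so the patch can delegate to it.
    original_fn = getattr(cls, name)

    @functools.wraps(original_fn)
    def _inner_patch(*args, **kwargs):
        # Exceptions raised inside patched_fn are not swallowed; they propagate
        # to whoever called the original method (here, click's make_context).
        return patched_fn(original_fn, *args, **kwargs)

    setattr(cls, name, _inner_patch)


class Command:
    # Toy stand-in for click.Command (hypothetical, for illustration only).
    def parse_args(self, ctx, args):
        return list(args)


def _parse_args(original_fn, self, ctx, args):
    # Hypothetical patch: read the stored 'Args' section before delegating.
    # With empty parameters this fails just like the traceback: KeyError: 'Args'.
    args_section = REMOTE_PARAMS["Args"]
    return original_fn(self, ctx, args)


patch_method(Command, "parse_args", _parse_args)
```

Calling `Command().parse_args(None, [])` now raises `KeyError: 'Args'` from the patch, even though the toy `Command` itself has no such lookup.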

I looked into it, and the task parameters clearml loads appear to be completely empty:

    @staticmethod
    def _load_task_params():
        if not PatchClick.__remote_task_params: # PatchClick.__remote_task_params = None
            from clearml import Task
            t = Task.get_task(task_id=get_remote_task_id())
            # noinspection PyProtectedMember
            PatchClick.__remote_task_params = t._get_task_property('hyperparams') or {}
            params_dict = t.get_parameters(backwards_compatibility=False)
            skip = len(PatchClick._section_name)+1
            PatchClick.__remote_task_params_dict = {
                k[skip:]: v for k, v in params_dict.items()
                if k.startswith(PatchClick._section_name+'/')
            }

            # PatchClick.__remote_task_params = {}
            # PatchClick.__remote_task_params_dict = {}

        params = PatchClick.__remote_task_params # params = {}

        command = [
            p.name for p in params['Args'].values()
            if p.type == PatchClick._command_type and cast_str_to_bool(p.value, strip=True)]
        return command[0] if command else None
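Because `PatchClick.__remote_task_params` ends up as `{}`, the expression `params['Args']` raises `KeyError: 'Args'` before any command lookup happens. The sketch below reproduces that lookup outside clearml and contrasts it with a defensive `.get('Args')` variant. It is an illustration under assumptions, not the fix clearml shipped. `Param`, `COMMAND_TYPE`, and `str_to_bool` are hypothetical stand-ins for clearml's parameter objects, `PatchClick._command_type`, and `cast_str_to_bool`.

```python
from collections import namedtuple

# Hypothetical stand-in for the hyperparameter objects stored per section;
# only the fields the lookup reads (name, type, value) are modelled.
Param = namedtuple("Param", ["name", "type", "value"])

# Assumed marker value; the real PatchClick._command_type may differ.
COMMAND_TYPE = "click.Command"


def str_to_bool(value):
    # Simplified stand-in for cast_str_to_bool(value, strip=True).
    return str(value).strip().lower() in ("true", "1", "yes", "y", "on")


def find_command_strict(params):
    # Mirrors the original lookup: raises KeyError when 'Args' is missing.
    command = [p.name for p in params["Args"].values()
               if p.type == COMMAND_TYPE and str_to_bool(p.value)]
    return command[0] if command else None


def find_command_safe(params):
    # Defensive variant: a task with no 'Args' section has no selected command.
    args = params.get("Args") or {}
    command = [p.name for p in args.values()
               if p.type == COMMAND_TYPE and str_to_bool(p.value)]
    return command[0] if command else None
```

With an empty parameter dict, the strict version fails exactly like the traceback, while the defensive version returns `None` (no sub-command selected).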

Any ideas as to what could go wrong?

montmejat · Jun 29 '22

Here's a minimal script that reproduces the error (at least, it does on my machine):

import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torch.utils.data import random_split
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl

from clearml import Task

import click


class LitAutoEncoder(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 64),
            nn.ReLU(),
            nn.Linear(64, 3))
        self.decoder = nn.Sequential(
            nn.Linear(3, 64),
            nn.ReLU(),
            nn.Linear(64, 28 * 28))

    def forward(self, x):
        embedding = self.encoder(x)
        return embedding

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        return optimizer

    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = F.mse_loss(x_hat, x)
        self.log('val_loss', loss)


@click.command()
def train():
    task = Task.init(project_name='examples', task_name='hello world')
    task.execute_remotely(queue_name='default')

    # data
    dataset = MNIST('', train=True, download=True, transform=transforms.ToTensor())
    mnist_train, mnist_val = random_split(dataset, [55000, 5000])

    train_loader = DataLoader(mnist_train, batch_size=32, num_workers=12)
    val_loader = DataLoader(mnist_val, batch_size=32, num_workers=12)

    # model
    model = LitAutoEncoder()

    # training
    trainer = pl.Trainer(gpus=1, precision=16)
    trainer.fit(model, train_loader, val_loader)


if __name__ == '__main__':
    train()

This is about the simplest possible click program. Here's the error I get:

Traceback (most recent call last):
  File "example.py", line 74, in <module>
    train()
  File "/home/aurelien/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/aurelien/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/click/core.py", line 1054, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "/home/aurelien/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/click/core.py", line 920, in make_context
    self.parse_args(ctx, args)
  File "/home/aurelien/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/frameworks/__init__.py", line 36, in _inner_patch
    raise ex
  File "/home/aurelien/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/frameworks/__init__.py", line 34, in _inner_patch
    ret = patched_fn(original_fn, *args, **kwargs)
  File "/home/aurelien/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/click_bind.py", line 114, in _parse_args
    PatchClick._load_task_params()
  File "/home/aurelien/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/binding/click_bind.py", line 152, in _load_task_params
    p.name for p in params['Args'].values()
KeyError: 'Args'

montmejat · Jul 04 '22

By the way, I believe I'm up to date, with:

  • clearml_agent: 1.3.0
  • clearml: 1.6.2
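The two tracebacks come from different interpreters (`autotrain-env` and `~/.clearml/venvs-builds/3.8`), so the clearml and click versions the task actually imports may differ from the ones installed locally. A small sketch for checking this from inside a given environment, using only the standard library's `importlib.metadata` (Python 3.8+); the package list is just an example.

```python
from importlib import metadata  # Python 3.8+


def installed_versions(packages):
    """Return {package: version or None} for the interpreter running this code."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Run with the same Python interpreter that executes the task.
    for pkg, ver in installed_versions(["clearml", "click"]).items():
        print(f"{pkg}: {ver}")
```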

montmejat · Jul 04 '22

Hi @aurelien-m, thanks a lot for all the information. I found the issue, and we should release a fix soon. In the meantime:

  • avoid using click for commands that take no arguments, or
  • give the click command a dummy argument with a default value.
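A minimal sketch of the dummy-argument workaround, applied to the repro script's `train` command. Assumptions: the placeholder is a click option named `--dummy` (hypothetical; the suggestion does not say whether an option or a positional argument is meant), and `hidden=True` keeps it out of `--help`. The clearml calls are left out so the sketch runs on its own. It follows the maintainer's suggestion but has not been verified against the affected clearml versions.

```python
import click


# Workaround sketch: give the command one placeholder parameter with a default,
# so it no longer has zero arguments. "--dummy" is a hypothetical name.
@click.command()
@click.option("--dummy", default="unused", hidden=True)
def train(dummy):
    # In the real script, Task.init(...) and task.execute_remotely(...) go here.
    click.echo("training started")


if __name__ == "__main__":
    train()
```

The command is still invoked exactly as before (`python example.py`), since the option has a default and never needs to be passed.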

We'll keep you updated on the fix :)

DavidNativ · Jul 05 '22

Hey, thanks for getting back to me. Unfortunately, our code base is pretty big and relies heavily on click. I'll work around it for now using your suggestions, and I'll patiently wait for the next update :smile:

montmejat · Jul 05 '22

Closing this as it was already fixed in released versions. Please reopen if required.

jkhenning · Mar 15 '23