
Pipeline component stuck on return

montmejat opened this issue 2 years ago · 4 comments

Thank you for helping us make ClearML better!

Describe the bug

I have a PipelineDecorator.component that freezes and never ends. It runs through all of its code, but never closes and moves on to the next step:

ClearML Task: created new task id=548c52661b3945f4b3cbe7bfa5618e21
ClearML results page: http://.../projects/481cff67eec4408b98a74ea3ff63bc21/experiments/548c52661b3945f4b3cbe7bfa5618e21/output/log
ClearML pipeline page: http://.../pipelines/481cff67eec4408b98a74ea3ff63bc21/experiments/548c52661b3945f4b3cbe7bfa5618e21
Launching step [download_files]
ClearML results page: http://.../projects/481cff67eec4408b98a74ea3ff63bc21/experiments/b7085c11626140dfb21f4ae5ee0d980c/output/log
INFO: Retrieving files locally...
INFO: Finished updating files.
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start

To reproduce

My PipelineDecorator.component looks like this:

@PipelineDecorator.component(
    cache=False,
    task_type='data_processing',
    execution_queue="default",
)
def download_files(remote_root: str, local_root: str, datasets: List[str]):
    import os
    import shutil
    from filecmp import cmp
    from os.path import exists, join
    from tqdm import tqdm

    print('INFO: Retrieving files locally...')

    # Workaround: the list arrives as its string repr, so parse it back manually
    datasets = datasets.strip('][').replace("'", '').split(', ')
    for dataset in datasets:
        for root, _, files in tqdm(os.walk(join(remote_root, dataset))):
            for file in tqdm(files, leave=False):
                directories = root.replace(remote_root, '').split('/')

                # Skip files that already exist locally with identical content
                if exists(join(local_root, *directories, file)) and cmp(join(root, file), join(local_root, *directories, file)):
                    continue

                # Create any missing intermediate directories
                for i in range(len(directories)):
                    if not exists(join(local_root, *directories[:i + 1])):
                        os.mkdir(join(local_root, *directories[:i + 1]))

                shutil.copyfile(join(root, file), join(local_root, *directories, file))

    print('INFO: Finished updating files.')

    return local_root
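
For context, the component is launched from a pipeline function along these lines (the pipeline name, paths, and dataset names here are placeholders; download_files is the component defined above):

from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.pipeline(name='data_pipeline', project='some_project', version='0.0.1')
def executing_pipeline():
    # The component's return value is what ClearML serializes and uploads
    # as the step's output artifact
    local_root = download_files('/mnt/remote', '/tmp/local', ['dataset_a', 'dataset_b'])


if __name__ == '__main__':
    PipelineDecorator.run_locally()
    executing_pipeline()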

What I've noticed is that when I remove return local_root, the step actually closes and the pipeline moves on. It's my return value that seems to be blocking the component from ending.

Bonus issue :smile:: the lists are passed as a str when entering the pipeline. I have to parse them back into real lists manually with datasets = datasets.strip('][').replace("'", '').split(', '); it would be nice if lists were supported natively (a sturdier parsing sketch is below).
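
As a less fragile version of that workaround, ast.literal_eval from the standard library can parse the stringified list back into a real one. A sketch, assuming the incoming string is the list's Python repr:

import ast

def parse_stringified_list(value):
    # If the list arrived as its string repr (e.g. "['a', 'b', 'c']"),
    # literal_eval parses it back into a real Python list; it only
    # accepts Python literals, so it is safer than eval
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value

datasets = parse_stringified_list("['a', 'b', 'c']")
print(datasets, type(datasets))  # ['a', 'b', 'c'] <class 'list'>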

Expected behaviour

It should finish and move on to the next step.

Environment

  • Server type: self hosted
  • ClearML SDK Version: 1.6.4
  • ClearML Server Version: WebApp: 1.6.0-213 • Server: 1.6.0-213 • API: 2.20
  • Python Version: 3.10.4
  • OS: Linux

montmejat commented Sep 15 '22 08:09

Update: some strange things are happening to my Component...

This works:

  • return True
  • return 0
  • return 'hello'

This doesn't work (it makes the Component unable to end):

  • return '/some/long/path/to/whatever'

I'm not sure what's happening here.

montmejat commented Sep 15 '22 11:09

I think it's just because it's logging all of the files that I'm processing. Is there a way to avoid saving everything that happens as artifacts? I have multiple GB of data and it's uploading it all, which takes a long time and makes the step look frozen.

montmejat commented Sep 15 '22 11:09

From my tests:

  • This will store everything in a zip file at /tmp/ and upload it as an artifact (a way to verify what actually got uploaded is sketched after this list):

@PipelineDecorator.component(
    cache=False,
    task_type='data_processing',
    execution_queue="default",
)
def download_files(remote_root: str, local_root: str, datasets: List[str]):

    ... doing a bunch of file stuff ...

    return '/some/path/to/somewhere'  # not sure which return values cause this exactly

  • This will save True as a 4-byte file artifact (which saves so much time and doesn't waste disk space, as I'm dealing with many gigabytes in each of my pipelines):

@PipelineDecorator.component(
    cache=False,
    task_type='data_processing',
    execution_queue="default",
)
def download_files(remote_root: str, local_root: str, datasets: List[str]):

    ... doing a bunch of file stuff ...

    return True  # or return 0
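
One way to check what a step actually uploaded is to inspect the step task's artifacts via the SDK; a sketch, using the task id printed in the console log above:

from clearml import Task

# Task id of the step, as printed in the console log
task = Task.get_task(task_id='b7085c11626140dfb21f4ae5ee0d980c')

# 'artifacts' maps artifact names to Artifact objects; the component's
# return value should show up here as one of the entries
for name, artifact in task.artifacts.items():
    print(name, artifact.url)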

montmejat commented Sep 15 '22 13:09

My question is: how do I avoid automatically logging artifacts?

montmejat commented Sep 16 '22 16:09

Hi @aurelien-m,

Again, sorry for letting you debug this by yourself, but the best insights come that way, so... ;)

Just making sure I get this 100%: when a pipeline step returns a string that is a path, the pipeline basically treats the content as an artifact and stores it all?

As for the bonus issue: with this signature, def download_files(remote_root: str, local_root: str, datasets: List[str]), even though you pass a list of strings (in the datasets argument), what arrives is not really a list but a string that you have to parse manually. Did I get that right?

erezalg commented Oct 12 '22 20:10

Yes sorry, it's a bit messy :sweat_smile: But you got it right.

My questions were:

  1. How to stop logging files/folders when returning paths?

For example:

from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(cache=False)
def step_one():
    return '/path/to/some/folder/'


@PipelineDecorator.pipeline(name='some_project', project='some_pipeline', version='0.0.5')
def executing_pipeline():
    step_one()


if __name__ == '__main__':
    PipelineDecorator.run_locally()
    executing_pipeline()

This will log the folder as an artifact in a .zip file:

(screenshot: the folder logged as a .zip artifact in the ClearML web UI)

My problem was that I was passing the path of a folder containing hundreds of gigabytes, which drastically slowed down the pipeline and used too much memory. What I've done is simply pass my path inside a dictionary: { 'my_path': '/path/to/some/folder/' } (see the sketch below). Not sure what we should do here, but I guess we can mark it as fixed.
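
For reference, here is that workaround as a minimal sketch, reusing the toy step_one pipeline from above (my assumption, from testing, is that only string returns that look like paths trigger the folder upload):

from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(cache=False)
def step_one():
    # Wrapping the path in a dict means the return value is stored as a
    # tiny serialized object, not as the folder's zipped contents
    return {'my_path': '/path/to/some/folder/'}


@PipelineDecorator.pipeline(name='some_project', project='some_pipeline', version='0.0.5')
def executing_pipeline():
    result = step_one()


if __name__ == '__main__':
    PipelineDecorator.run_locally()
    executing_pipeline()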

  2. The lists are passed as strings

This actually got fixed! I'm using ClearML 1.7.2rc1, so no worries there. Here's what it used to do:

from typing import List
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(cache=False)
def step_one(some_list: List):
    print(some_list, '->', type(some_list)) # ['a', 'b', 'c'] -> <class 'str'>


@PipelineDecorator.pipeline(name='some_project', project='some_pipeline', version='0.0.5')
def executing_pipeline(some_list: List):
    step_one(some_list)


if __name__ == '__main__':
    PipelineDecorator.run_locally()

    some_list = ['a', 'b', 'c']
    executing_pipeline(some_list)

From my tests, it now actually prints ['a', 'b', 'c'] -> <class 'list'>.

montmejat commented Oct 13 '22 09:10

@aurelien-m

Happy to hear this issue was solved :)

As for clearml zipping folders, AFAIK there's no workaround ATM. If I find something I'll update here, and otherwise I'll update once a fix is released.

erezalg commented Oct 13 '22 12:10

Hi @aurelien-m,

Please install ClearML 1.7.2rc2. It should solve this issue :)

erezalg commented Oct 16 '22 08:10

Closing this issue. Please reopen if it's still relevant.

jkhenning commented Mar 15 '23 13:03