Pipeline component stuck on return
Thank you for helping us make ClearML better!
Describe the bug
I have a PipelineDecorator.component which freezes and never ends. It runs through all of its code, but never closes and moves on to the next step:
ClearML Task: created new task id=548c52661b3945f4b3cbe7bfa5618e21
ClearML results page: http://.../projects/481cff67eec4408b98a74ea3ff63bc21/experiments/548c52661b3945f4b3cbe7bfa5618e21/output/log
ClearML pipeline page: http://.../pipelines/481cff67eec4408b98a74ea3ff63bc21/experiments/548c52661b3945f4b3cbe7bfa5618e21
Launching step [download_files]
ClearML results page: http://.../projects/481cff67eec4408b98a74ea3ff63bc21/experiments/b7085c11626140dfb21f4ae5ee0d980c/output/log
INFO: Retrieving files locally...
INFO: Finished updating files.
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
To reproduce
My PipelineDecorator.component looks like this:
@PipelineDecorator.component(
    cache=False,
    task_type='data_processing',
    execution_queue="default",
)
def download_files(remote_root: str, local_root: str, datasets: List[str]):
    import os
    import shutil
    from filecmp import cmp
    from os.path import exists, join
    from tqdm import tqdm

    print('INFO: Retrieving files locally...')
    datasets = datasets.strip('][').replace("'", '').split(', ')
    for dataset in datasets:
        for root, _, files in tqdm(os.walk(join(remote_root, dataset))):
            for file in tqdm(files, leave=False):
                directories = root.replace(remote_root, '').split('/')
                if exists(join(local_root, *directories, file)) and cmp(join(root, file), join(local_root, *directories, file)):
                    continue
                for i in range(len(directories)):
                    if not exists(join(local_root, *directories[:i + 1])):
                        os.mkdir(join(local_root, *directories[:i + 1]))
                shutil.copyfile(join(root, file), join(local_root, *directories, file))
    print('INFO: Finished updating files.')
    return local_root
What I've noticed is that when I remove return local_root, the step actually closes and moves on. It seems to be my return value that prevents the component from ending.
Bonus issue :smile:: the lists are passed as str when entering the pipeline. I have to parse them back into real lists manually with datasets = datasets.strip('][').replace("'", '').split(', '); it would be nice if lists were supported.
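For what it's worth, a slightly more robust version of that manual parse would be ast.literal_eval from the standard library; this is just a sketch of the workaround on my side, nothing ClearML-specific:

import ast

def parse_list_arg(value):
    # The component receives the list as its string representation,
    # e.g. "['a', 'b', 'c']", so safely evaluate it back into a real list.
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value

# inside download_files:
# datasets = parse_list_arg(datasets)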
Expected behaviour
It should finish and go to the next step.
Environment
- Server type: self hosted
- ClearML SDK Version: 1.6.4
- ClearML Server Version: WebApp: 1.6.0-213 • Server: 1.6.0-213 • API: 2.20
- Python Version: 3.10.4
- OS: Linux
Update: some strange things are happening with my component...
This works:
- return True
- return 0
- return 'hello'
This doesn't work (it makes the component unable to end):
- return '/some/long/path/to/whatever'
I'm not sure what's happening here
I think it's just because it's logging all of the files that I'm processing. Is there a way to avoid saving everything that happens as artifacts? I have multiple GB of data and it's saving it all, which takes a long time and makes the step look like it's frozen.
From my tests:
- This will store everything in a zip file at /tmp/ and upload it as an artifact:
@PipelineDecorator.component(
    cache=False,
    task_type='data_processing',
    execution_queue="default",
)
def download_files(remote_root: str, local_root: str, datasets: List[str]):
    ... doing a bunch of file stuff ...
    return '/some/path/to/somewhere'  # not sure what return causes this exactly
- This will save True as a 4-byte file artifact (which saves so much time and doesn't use unnecessary disk space, as I'm dealing with many gigabytes in each of my pipelines):
@PipelineDecorator.component(
    cache=False,
    task_type='data_processing',
    execution_queue="default",
)
def download_files(remote_root: str, local_root: str, datasets: List[str]):
    ... doing a bunch of file stuff ...
    return True  # or return 0
My question is: how do I avoid automatically logging artifacts?
Hi @aurelien-m,
Again, sorry for letting you debug this by yourself, but the best insights come like that so... ;)
Just making sure I get this 100%: when a pipeline step returns a string that is a path, the pipeline basically treats the content as an artifact and stores it all?
As for the bonus issue, when having this signature:
def download_files(remote_root: str, local_root: str, datasets: List[str]):
Even though you pass a list of strings (in the datasets argument), it's not really a list but a string that you have to parse manually. Did I get it right?
Yes sorry, it's a bit messy :sweat_smile: But you got it right.
My questions were:
- How to stop logging files/folders when returning paths?
For example:
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=False)
def step_one():
    return '/path/to/some/folder/'

@PipelineDecorator.pipeline(name='some_project', project='some_pipeline', version='0.0.5')
def executing_pipeline():
    step_one()

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    executing_pipeline()
This will log the folder as an artifact in a .zip file.
My problem was that I was passing a path to a folder with hundreds of gigabytes, which was drastically slowing down the pipeline and using too much memory. What I've done is just pass a dictionary with my path: {'my_path': '/path/to/some/folder/'}. Not sure what we should do here, but I guess we can mark it as fixed. (A minimal sketch of this dict workaround is shown after the next point.)
- The lists are passed as strings
This actually got fixed! I'm using ClearML 1.7.2rc1, so no worries there, but here is what it used to do:
from typing import List
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=False)
def step_one(some_list: List):
    print(some_list, '->', type(some_list))  # ['a', 'b', 'c'] -> <class 'str'>

@PipelineDecorator.pipeline(name='some_project', project='some_pipeline', version='0.0.5')
def executing_pipeline(some_list: List):
    step_one(some_list)

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    some_list = ['a', 'b', 'c']
    executing_pipeline(some_list)
From my tests it now actually shows # ['a', 'b', 'c'] -> <class 'list'>.
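Going back to the first point, here is a minimal sketch of the dict workaround, reusing the step_one example from above (the path is just a placeholder):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(cache=False)
def step_one():
    # Returning the bare path string is what triggered zipping and uploading
    # the whole folder; wrapping the path in a small dict keeps the stored
    # artifact tiny, and the next step can read the path out of the dict.
    return {'my_path': '/path/to/some/folder/'}

@PipelineDecorator.pipeline(name='some_project', project='some_pipeline', version='0.0.5')
def executing_pipeline():
    step_one()

if __name__ == '__main__':
    PipelineDecorator.run_locally()
    executing_pipeline()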
@aurelien-m
Happy to hear this issue was solved :)
As for clearml zipping folders, AFAIK there's no workaround at the moment. If I find something I'll update here, and otherwise I'll update once a fix is released.
Hi @aurelien-m,
Please install ClearML 1.7.2rc2. It should solve this issue :)
Closing this issue. Please reopen if it's still relevant.