
Cannot use add_function_step for pipeline without optional parameter function_return

lpogo opened this issue 2 years ago · 7 comments (status: Open)

Hi, based on the documentation for add_function_step:

function_return (Optional[List[str]]) – Provide a list of names for all the results. If not provided, no results will be stored as artifacts.

I wanted to skip this param and not store any artifacts in my task. Unfortunately, when I omitted it from the step definition, the pipeline failed.

pipe = PipelineController(
        name='UsingFunction',
        project='Test',
        version='0.0.1'
    )

pipe.add_function_step(
        name='preprocessing',
        function=preprocessing
    )

Error:

Traceback (most recent call last):
  File "pipeline_func.py", line 405, in <module>
    pipe.add_function_step(
  File "/home/vscode/.local/lib/python3.8/site-packages/clearml/automation/controller.py", line 616, in add_function_step
    if step in self._nodes and artifact in self._nodes[step].return_artifacts:
TypeError: argument of type 'NoneType' is not iterable
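
For reference, the failure can be reproduced in isolation: when function_return is omitted, the node's return_artifacts stays None, and Python's `in` operator cannot test membership against None. A minimal sketch outside clearml (the Node class here is a hypothetical stand-in, not the real clearml internals):

```python
class Node:
    """Hypothetical stand-in for clearml's internal pipeline node."""
    def __init__(self, return_artifacts=None):
        # When function_return is omitted, this attribute stays None.
        self.return_artifacts = return_artifacts


nodes = {"preprocessing": Node()}
step, artifact = "preprocessing", "data_frame"

try:
    # Same membership test as controller.py line 616.
    found = step in nodes and artifact in nodes[step].return_artifacts
except TypeError as e:
    # Prints: argument of type 'NoneType' is not iterable
    print(e)
```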

I tried a naive workaround in controller.py:

if step in self._nodes:
    if self._nodes[step].return_artifacts:
        if artifact in self._nodes[step].return_artifacts:
            function_input_artifacts[k] = "${{{}.id}}.{}".format(step, artifact)
            continue

The pipeline started working, but output models were stored automatically as artifacts anyway.
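
The triple nesting above can be collapsed by coalescing the missing list, which is logically equivalent to the workaround. A sketch of the same idea, not the actual clearml fix (the Node class is a hypothetical stand-in):

```python
class Node:
    """Hypothetical stand-in for clearml's internal pipeline node."""
    def __init__(self, return_artifacts=None):
        self.return_artifacts = return_artifacts


def step_has_artifact(nodes, step, artifact):
    # Treat a missing return_artifacts list (None) the same as an empty list,
    # so the membership test never raises TypeError.
    return step in nodes and artifact in (nodes[step].return_artifacts or [])


nodes = {"preprocessing": Node()}  # function_return omitted -> None
print(step_has_artifact(nodes, "preprocessing", "data_frame"))  # → False
```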

lpogo avatar Jun 02 '22 11:06 lpogo

Hey, we would need some more info about your setup in order to reproduce the issue. Can you please give me your clearml version? If you have more examples of your pipeline usage, that could also help.

DavidNativ avatar Jun 02 '22 14:06 DavidNativ

Hi, everything is running on GCP. To create the ClearML Server I used your base image for GCP. The agents run on my own machines:

clearml                   1.4.0
clearml-agent             1.2.3

Agent config printed at startup:

sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.aws.s3.key =
sdk.aws.s3.region =
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server = http://10.X.Y.Z:8008
api.web_server = http://10.X.Y.Z:8080
api.files_server = http://10.X.Y.Z:8081
api.credentials.access_key = VERYSECRET
api.host = http://10.X.Y.Z:8008
agent.worker_id = gcp:2
agent.worker_name = gcp:2
agent.force_git_ssh_protocol = true
agent.python_binary =
agent.package_manager.type = pip
agent.package_manager.pip_version = <20.2
agent.package_manager.system_site_packages = true
agent.package_manager.force_upgrade = false
agent.package_manager.conda_channels.0 = pytorch
agent.package_manager.conda_channels.1 = conda-forge
agent.package_manager.conda_channels.2 = defaults
agent.package_manager.torch_nightly = false
agent.venvs_dir = /home/lukasz/.clearml/venvs-builds.1
agent.venvs_cache.max_entries = 10
agent.venvs_cache.free_space_threshold_gb = 2.0
agent.vcs_cache.enabled = true
agent.vcs_cache.path = /home/lukasz/.clearml/vcs-cache
agent.venv_update.enabled = false
agent.pip_download_cache.enabled = true
agent.pip_download_cache.path = /home/lukasz/.clearml/pip-download-cache
agent.translate_ssh = true
agent.reload_config = false
agent.docker_pip_cache = /home/lukasz/.clearml/pip-cache
agent.docker_apt_cache = /home/lukasz/.clearml/apt-cache.1
agent.docker_force_pull = false
agent.enable_task_env = false
agent.hide_docker_command_env_vars.enabled = true
agent.hide_docker_command_env_vars.parse_embedded_urls = true
agent.docker_internal_mounts.sdk_cache = /clearml_agent_cache
agent.docker_internal_mounts.apt_cache = /var/cache/apt/archives
agent.docker_internal_mounts.ssh_folder = /root/.ssh
agent.docker_internal_mounts.pip_cache = /root/.cache/pip
agent.docker_internal_mounts.poetry_cache = /root/.cache/pypoetry
agent.docker_internal_mounts.vcs_cache = /root/.clearml/vcs-cache
agent.docker_internal_mounts.venv_build = /root/.clearml/venvs-builds
agent.docker_internal_mounts.pip_download = /root/.clearml/pip-download-cache
agent.apply_environment = true
agent.apply_files = true
agent.custom_build_script =
agent.git_user =
agent.default_python = 3.9
agent.cuda_version = 0
agent.cudnn_version = 0

I believe this one example will reproduce the issue. My pipeline has multiple steps and I use different parameter combinations; every time, omitting the function_return parameter produced this error.

lpogo avatar Jun 03 '22 07:06 lpogo

Hi @lpogo! Just wanted to let you know that we couldn't reproduce the issue as of now. If you find the reason why this doesn't work for you in the meantime, could you please let us know? Thank you!

Hi @eugen-ajechiloae-clearml

I just created my environment locally (Docker Desktop on WSL, Ubuntu 20.04). The environment was started from your docker compose file, so everything is on the newest available version. My development container has clearml==1.4.1 installed. My script looks like:

from clearml import PipelineController

def preprocessing(dataset_id: str, dataset_project: str, dataset_name: str,dataset_filename: str):
    print('step preprocessing')
    # import libs
    from clearml import Dataset
    import pandas as pd

    if dataset_id:
        dataset_path = Dataset.get(
            dataset_id=dataset_id
        ).get_local_copy()
    else:
        dataset_path = Dataset.get(
            dataset_project=dataset_project,
            dataset_name=dataset_name
        ).get_local_copy()

    filepath = f'{dataset_path}/{dataset_filename}'
    print(f'dataset_path: {filepath}')
    df = pd.read_csv(filepath, low_memory=False)
    df.drop('Time', axis=1, inplace=True)
    return df


def prepare_config(data_frame, parameters):
    print('step prepare_config')
    # secret stuff here
    

if __name__ == '__main__':
    pipe = PipelineController(
        name='UsingFunction',
        project='ClearMLTest',
        version='0.1.1',
        add_pipeline_tags=False,
    )
    pipe.set_default_execution_queue('MyQueue')

    # Dataset parameters
    pipe.add_parameter(name='dataset_id',description='Id of dataset task', default=None)
    pipe.add_parameter(name='dataset_project',description='Dataset project name', default='FraudDetection/Datasets/monthly')
    pipe.add_parameter(name='dataset_name',description='Name of dataset task', default='2022_02')
    pipe.add_parameter(name='dataset_filename',description='Name of the destination file', default='February_2022.csv')

    pipe_parameters = pipe.get_parameters()

    pipe.add_function_step(
        name='preprocessing',
        function=preprocessing,
        function_kwargs=dict(dataset_id='${pipeline.dataset_id}', dataset_project='${pipeline.dataset_project}', dataset_name='${pipeline.dataset_name}', dataset_filename='${pipeline.dataset_filename}'),
        function_return=['data_frame'],
        cache_executed_step=True,
    )

    pipe.add_function_step(
        name='prepare_config',
        parents=['preprocessing'],  
        function=prepare_config,
        function_kwargs=dict(data_frame='${preprocessing.data_frame}', parameters=pipe_parameters),
        # function_return=[],
        cache_executed_step=True,
    )

    pipe.start(queue='MyQueue')
    #pipe.start_locally()
    print('process completed')

Then I start the script on my development machine:

vscode@9598f9122548:/workspaces/scripts$ python3 pipeline_func.py 
ClearML Task: created new task id=55b0e31a2a4a4aadb42c8e6a67815741
ClearML results page: http://clearml-webserver:80/projects/dd42a80e05bf48ec8692680c6a44aeaa/experiments/55b0e31a2a4a4aadb42c8e6a67815741/output/log
2022-06-07 12:34:20,865 - clearml.Task - INFO - No repository found, storing script code instead
ClearML pipeline page: http://clearml-webserver:80/pipelines/dd42a80e05bf48ec8692680c6a44aeaa/experiments/55b0e31a2a4a4aadb42c8e6a67815741
Traceback (most recent call last):
  File "pipeline_func.py", line 421, in <module>
    pipe.add_function_step(
  File "/usr/local/envs/rapids-22.02/lib/python3.8/site-packages/clearml/automation/controller.py", line 616, in add_function_step
    if step in self._nodes and artifact in self._nodes[step].return_artifacts:
TypeError: argument of type 'NoneType' is not iterable

lpogo avatar Jun 07 '22 12:06 lpogo

Hi @lpogo, I cannot reproduce your issue, even though I used code similar to the one you provided, with the same clearml and agent versions. Can you also send your server version? (You can find it on the settings page of the webapp.)

DavidNativ avatar Jun 09 '22 08:06 DavidNativ

Hi @DavidNativ,

WebApp: 1.5.0-192 • Server: 1.5.0-192 • API: 2.18

lpogo avatar Jun 09 '22 09:06 lpogo

Hi @lpogo,

I modified your code a tiny bit: I removed the file processing so that preprocessing just returns an empty DataFrame, and I changed the dataset_id default value (which, by the way, failed for me when it was None). And this seems to work:

from clearml import PipelineController


def preprocessing(dataset_id: str, dataset_project: str, dataset_name: str, dataset_filename: str):
    print(dataset_id, dataset_project, dataset_name, dataset_filename)
    print('step preprocessing')
    # import libs
    from clearml import Dataset
    import pandas as pd

    df = pd.DataFrame()
    return df


def prepare_config(data_frame, parameters):
    print('step prepare_config')
    # secret stuff here


if __name__ == '__main__':
    pipe = PipelineController(
        name='UsingFunction',
        project='ClearMLTest',
        version='0.1.1',
        add_pipeline_tags=False,
    )
    pipe.set_default_execution_queue('1xGPU')

    # Dataset parameters
    pipe.add_parameter(name='dataset_id', description='Id of dataset task', default='1234')
    pipe.add_parameter(name='dataset_project', description='Dataset project name', default='FraudDetection/Datasets/monthly')
    pipe.add_parameter(name='dataset_name', description='Name of dataset task', default='2022_02')
    pipe.add_parameter(name='dataset_filename', description='Name of the destination file', default='February_2022.csv')

    pipe_parameters = pipe.get_parameters()

    pipe.add_function_step(
        name='preprocessing',
        function=preprocessing,
        function_kwargs=dict(dataset_id='${pipeline.dataset_id}', dataset_project='${pipeline.dataset_project}', dataset_name='${pipeline.dataset_name}',
                             dataset_filename='${pipeline.dataset_filename}'),
        function_return=['data_frame'],
        cache_executed_step=True,
    )

    pipe.add_function_step(
        name='prepare_config',
        parents=['preprocessing'],
        function=prepare_config,
        function_kwargs=dict(data_frame='${preprocessing.data_frame}', parameters=pipe_parameters),
        # function_return=[],
        cache_executed_step=True,
    )

    pipe.start(queue='1xGPU')
    #pipe.start_locally(run_pipeline_steps_locally=True)
    print('process completed')

Can you give it a go and tell me if it works for you?

And just to double-check, can you please try installing 1.4.2rc1 (pip install clearml==1.4.2rc1)? There was a fix in this area which might magically solve the issue :)
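
In the meantime, explicitly passing an empty function_return list on every step may sidestep the None membership test. An untested workaround sketch against clearml 1.4.x, reusing the step from the script above:

```python
pipe.add_function_step(
    name='prepare_config',
    parents=['preprocessing'],
    function=prepare_config,
    function_kwargs=dict(data_frame='${preprocessing.data_frame}', parameters=pipe_parameters),
    function_return=[],  # explicit empty list instead of omitting the parameter
    cache_executed_step=True,
)
```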

Thanks! :)

erezalg avatar Jun 09 '22 11:06 erezalg