clearml-agent
`trains_agent: ERROR: Can not run task without repository or literal script in script.diff` when executing locally
task = Task.init(project_name="...", task_name="...", reuse_last_task_id=False)
...
subtask = Task.clone(task)
Task.enqueue(subtask, queue_name="default")
This results in the subtask failing with `trains_agent: ERROR: Can not run task without repository or literal script in script.diff`.
What's wrong?
Manually setting `subtask.data.script = task.data.script` doesn't work.
A workaround is to add `task.execute_remotely(queue_name="default")`:
task = Task.init(project_name="", task_name="test2")
task.execute_remotely(queue_name="default")
print(task.name, task.id, task.get_parameters())
if task.get_parameter("General/is_subtask", "False") != "True":
    subtask: Task = Task.clone(task, name="test2-subtask", parent=task.id)
    subtask.set_parameters(is_subtask=True)
    Task.enqueue(subtask, queue_name="default")
@iirekm I think it might be a simple matter of delay - it takes some time for Trains to detect the repository, and it's possible you're cloning too fast :)
I suggest adding `sleep(20)` to figure out if that helps. If that's the issue, you can potentially configure Trains to detect the repository in synchronous mode using the `development.vcs_repo_detect_async` setting (defaults to `true`).
Are you sure it's repo cloning? Even `data.script` on the cloned task is empty, so basically it had no chance to even know about any repo.
Not the cloning, but the detection - detecting which VCS, repository and script is running (and generating a diff if there are uncommitted files) is done by default in another thread - it's possible the thread simply didn't finish its job before your script cloned the new task.
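To make the race concrete, here is a toy model of it. This is not Trains code: `ToyTask`, `clone`, `wait_for_detection` and the placeholder repository URL are all made up for illustration, and the only thing assumed from the thread is that script detection runs in a background thread while cloning copies whatever is recorded at that moment.

```python
import threading
import time


class ToyTask:
    """Hypothetical stand-in for a task whose script info is filled in by a thread."""

    def __init__(self, detect_seconds=0.2):
        self.script = None  # stays empty until the background detection finishes
        self._detector = threading.Thread(target=self._detect, args=(detect_seconds,))
        self._detector.start()

    def _detect(self, seconds):
        time.sleep(seconds)  # pretend to inspect the VCS and generate a diff
        self.script = {"repository": "https://example.invalid/repo.git", "diff": ""}

    def wait_for_detection(self):
        # What a synchronous detection mode (or a long enough sleep) achieves.
        self._detector.join()


def clone(task):
    """Copy whatever script info exists right now, possibly nothing."""
    return {"script": task.script}


def demo():
    task = ToyTask()
    early = clone(task)        # cloned immediately: detection has not finished yet
    task.wait_for_detection()  # block until the detector thread is done
    late = clone(task)         # cloned afterwards: script info is present
    return early["script"], late["script"]
```

In this sketch the early clone carries no script info at all, which matches the empty `data.script` seen on the cloned task, while the clone made after waiting has it.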
Aaa, now I understand, so your sleep idea would be a second workaround. But anyway, both are workarounds; ideally the clone implementation should somehow wait for this thread.
@iirekm just adding a few details.
The `execute_remotely` call will actually wait for the repository detection, and will do the enqueuing for you (as I think you already realized).
An additional minor detail: `execute_remotely` will leave the calling process when executed locally (i.e. the lines after the call will by definition be executed only remotely by the `trains-agent`). This behavior might sound a bit weird at first but it saves a lot of nested 'if's :) Anyhow, this is of course a choice, and you can specify `exit_process=False`.
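A minimal sketch of what opting out of the exit could look like. The helper name `enqueue_and_continue` and the `RecordingTask` stub are hypothetical, and the stub exists only so the snippet runs without a server. With a real task you would pass the object returned by `Task.init`. Pairing `exit_process=False` with `clone=True` is an assumption based on my understanding that a task cannot enqueue itself and keep running locally, so please check it against the version you have installed.

```python
class RecordingTask:
    """Hypothetical stub: records the call instead of contacting a server."""

    def __init__(self):
        self.calls = []

    def execute_remotely(self, queue_name=None, clone=False, exit_process=True):
        self.calls.append(
            {"queue_name": queue_name, "clone": clone, "exit_process": exit_process}
        )


def enqueue_and_continue(task, queue_name="default"):
    """Send a copy of the task to the queue but keep the local process alive."""
    # Assumption: clone=True is needed whenever exit_process=False, so that a
    # copy of the task is enqueued rather than the task that is still running.
    task.execute_remotely(queue_name=queue_name, clone=True, exit_process=False)
    return task  # lines after this call keep running locally as well


if __name__ == "__main__":
    stub = RecordingTask()
    enqueue_and_continue(stub)
    print(stub.calls)
```

With the default `exit_process=True`, nothing after the call would run in the local process, which is why the workaround above can clone the subtask without an extra "am I running remotely?" check.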
What I'm missing is the goal of the `subtask`. Is it the next stage in the pipeline?
If this is the case, I'm assuming you only want to clone/enqueue it once the first step is done, is that correct?
Yes, this is the next stage, and I want to create both stages in the same file for convenience. I don't want to use already-defined tasks like in the examples, because it's too easy to get lost.
@iirekm take a look at this one, it might help? (Basically, create a Task from a function at runtime. I "think" this is what you are after?!)
yes, that's great, it works