clearml-agent: "trains_agent: ERROR: Can not run task without repository or literal script in `script.diff`" when executing locally

Open iirekm opened this issue 4 years ago • 10 comments

from trains import Task  # in later releases: from clearml import Task

task = Task.init(project_name="...", task_name="...", reuse_last_task_id=False)
...
subtask = Task.clone(task)
Task.enqueue(subtask, queue_name="default")

The subtask then fails with: trains_agent: ERROR: Can not run task without repository or literal script in `script.diff`

What's wrong?

Nov 12 '20 14:11 iirekm

Manually setting subtask.data.script = task.data.script doesn't work.

Nov 12 '20 14:11 iirekm

A workaround is to add task.execute_remotely(queue_name="default"):

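from trains import Task  # in later releases: from clearml import Task

task = Task.init(project_name="", task_name="test2")
# Enqueues this task and exits the local process; the lines below run only on the agent.
task.execute_remotely(queue_name="default")
print(task.name, task.id, task.get_parameters())
# Only the parent clones itself; the clone carries is_subtask=True and skips this branch.
if task.get_parameter("General/is_subtask", "False") != "True":
    subtask: Task = Task.clone(task, name="test2-subtask", parent=task.id)
    subtask.set_parameters(is_subtask=True)
    Task.enqueue(subtask, queue_name="default")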
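
The guard on is_subtask is what keeps this from cloning forever, since the clone runs the very same script. A toy simulation of that control flow, with a plain deque standing in for the "default" queue and dicts standing in for task parameters (run_script, queue and log are made-up names for illustration; nothing here calls Trains):

```python
from collections import deque

queue = deque()  # stand-in for the "default" queue
log = []         # records the is_subtask value seen on each run


def run_script(params):
    """Toy version of the script body that the agent would execute."""
    is_subtask = params.get("General/is_subtask", "False")
    log.append(is_subtask)
    if is_subtask != "True":
        child = dict(params)                  # "clone": copy the parent's parameters
        child["General/is_subtask"] = "True"  # mark the clone so it does not clone again
        queue.append(child)                   # "enqueue" the clone


queue.append({})  # first enqueue of the parent task
while queue:
    run_script(queue.popleft())

print(log)  # ['False', 'True']
```

The script body runs exactly twice: once as the parent, which enqueues a clone, and once as the clone, which skips the branch.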

Nov 12 '20 15:11 iirekm

@iirekm I think it might simply be a matter of timing: it takes Trains some time to detect the repository, and it's possible you're cloning too fast :) I suggest adding sleep(20) to see whether that helps. If that's the issue, you can configure Trains to detect the repository synchronously using the development.vcs_repo_detect_async setting (defaults to true).

Nov 12 '20 15:11 jkhenning

Are you sure it's the repo cloning? Even data.script on the cloned task is empty, so basically it had no chance to even know about any repo.

Nov 12 '20 15:11 iirekm

Not the cloning, but the detection: detecting which VCS, repository and script are in use (and generating a diff if there are uncommitted files) is done by default in another thread. It's possible the thread simply didn't finish its job before your script cloned the new task.
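
To make the race concrete, here is a toy model, assuming a background thread fills in the script info shortly after the task is created. ToyTask, _detect_repo and wait_for_detection are hypothetical names, not the Trains API, and the 0.2 s delay is arbitrary:

```python
import threading
import time


class ToyTask:
    """Hypothetical stand-in for a task; NOT the real Trains Task class."""

    def __init__(self):
        self.script = {}  # filled in later by the background detector
        self._detector = threading.Thread(target=self._detect_repo, daemon=True)
        self._detector.start()

    def _detect_repo(self):
        time.sleep(0.2)  # pretend that inspecting the VCS takes a while
        self.script = {"repository": "example-repo", "diff": ""}

    def wait_for_detection(self):
        self._detector.join()  # block until the detector thread has finished

    def clone(self):
        copy = ToyTask.__new__(ToyTask)  # skip __init__: a clone does not re-detect
        copy.script = dict(self.script)  # snapshot whatever is known right now
        return copy


task = ToyTask()
early_clone = task.clone()  # cloned before detection finished
task.wait_for_detection()
late_clone = task.clone()   # cloned after detection finished

print(early_clone.script)  # {}
print(late_clone.script)   # {'repository': 'example-repo', 'diff': ''}
```

The early clone ends up with an empty script, which matches the empty data.script seen on the cloned task; waiting for the detector first gives a populated one.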

Nov 12 '20 15:11 jkhenning

Aaa, now I understand, so your sleep idea would be a second workaround. But anyway, both are workarounds; ideally the clone implementation should somehow wait for this thread.

Nov 12 '20 15:11 iirekm

@iirekm just adding a few details. The execute_remotely call will actually wait for the repository detection, and will do the enqueuing for you (as I think you already realized). An additional minor detail: execute_remotely will leave the calling process when executed locally (i.e. the lines after the call will by definition be executed only remotely, by the trains-agent). This behavior might sound a bit weird at first, but it saves a lot of nested 'if's :) Anyhow, this is of course a choice, and you can specify exit_process=False.

What I'm missing is the goal of the subtask. Is it the next stage in the pipeline? If so, I'm assuming you only want to clone/enqueue it once the first step is done, is that correct?

Nov 12 '20 20:11 bmartinn

Yes, this is the next stage, and I want to create both stages in the same file for convenience. I don't want to use already-defined tasks like in the examples, because it's too easy to get lost.

Nov 12 '20 20:11 iirekm

@iirekm take a look at this one, it might help. (Basically, create a Task from a function at runtime; I "think" this is what you are after?!)

Nov 12 '20 21:11 bmartinn

yes, that's great, it works

Nov 13 '20 08:11 iirekm
