clearml-agent

``trains_agent: ERROR: Can not run task without repository or literal script in `script.diff` `` when executing locally

Open iirekm opened this issue 4 years ago • 10 comments

task = Task.init(project_name="...", task_name="...", reuse_last_task_id=False)
...
subtask = Task.clone(task)
Task.enqueue(subtask, queue_name="default")

This results in the subtask failing with ``trains_agent: ERROR: Can not run task without repository or literal script in `script.diff` ``

What's wrong?

iirekm avatar Nov 12 '20 14:11 iirekm

Manually setting subtask.data.script = task.data.script doesn't work.

iirekm avatar Nov 12 '20 14:11 iirekm

A workaround is to add task.execute_remotely(queue_name="default"):

task = Task.init(project_name="", task_name="test2")
task.execute_remotely(queue_name="default")
print(task.name, task.id, task.get_parameters())
if task.get_parameter("General/is_subtask", "False") != "True":
    subtask: Task = Task.clone(task, name="test2-subtask", parent=task.id)
    subtask.set_parameters(is_subtask=True)
    Task.enqueue(subtask, queue_name="default")

iirekm avatar Nov 12 '20 15:11 iirekm

@iirekm I think it might be a simple matter of delay: it takes some time for Trains to detect the repository, and it's possible you're cloning too fast :) I suggest adding sleep(20) before the clone to figure out if that helps. If that's the issue, you can potentially configure Trains to detect the repository in synchronous mode using the development.vcs_repo_detect_async setting (defaults to true).
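As an alternative to a fixed sleep(20), here is a minimal sketch of a generic polling helper. The name wait_until and its parameters are hypothetical and not part of Trains; it only shows the idea of waiting until some condition holds, with a timeout.

```python
import time


def wait_until(predicate, timeout=20.0, interval=0.5):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper, not a Trains API. Returns True if the predicate
    became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a real task, the predicate would check that the script information has been filled in before cloning. Which field to check, and whether it is refreshed locally once detection completes, is an assumption that would need to be verified against the Trains version in use.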

jkhenning avatar Nov 12 '20 15:11 jkhenning

Are you sure it's repo cloning? Even data.script on the cloned task is empty, so basically it had no chance to even know about any repo.

iirekm avatar Nov 12 '20 15:11 iirekm

Not the cloning, but the detection: detecting which VCS, repository and script is running (and generating a diff if there are uncommitted files) is done by default in another thread. It's possible the thread simply didn't finish its job before your script cloned the new task.
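To make the timing issue concrete, here is a toy simulation of that race. ToyTask, its methods, and the example repository URL are all made up for illustration; this is not how Trains is implemented, only a sketch of "clone before the background thread finishes" versus "clone after waiting for it".

```python
import threading
import time


class ToyTask:
    """Stand-in for a task whose script info is filled in by a background thread."""

    def __init__(self, detect_seconds=0.2):
        self.script = None  # stays empty until detection finishes
        self._detector = threading.Thread(target=self._detect, args=(detect_seconds,))
        self._detector.start()

    def _detect(self, seconds):
        time.sleep(seconds)  # pretend to inspect the VCS and build a diff
        self.script = {"repository": "git@example.com:demo/repo.git", "diff": ""}

    def wait_for_detection(self):
        self._detector.join()

    def clone(self):
        # Copies whatever script info exists at this moment, possibly none.
        copy = ToyTask.__new__(ToyTask)
        copy.script = dict(self.script) if self.script else None
        return copy


task = ToyTask()
too_early = task.clone()      # detection thread is still sleeping
task.wait_for_detection()     # block until detection is done
in_time = task.clone()        # now the script info is copied too
```

In this sketch the first clone ends up with no script info, which mirrors the empty data.script reported above, while the second clone carries the repository details.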

jkhenning avatar Nov 12 '20 15:11 jkhenning

Aaa, now I understand, so your sleep idea would be a second workaround. But anyway, both are workarounds; ideally the clone implementation should somehow wait for this thread.

iirekm avatar Nov 12 '20 15:11 iirekm

@iirekm just adding a few details. The execute_remotely call will actually wait for the repository detection, and will do the enqueuing for you (as I think you already realized). An additional minor detail: execute_remotely will exit the calling process when executed locally (i.e. the lines after the call will by definition be executed only remotely, by the trains-agent). This behavior might sound a bit weird at first, but it saves a lot of nested 'if's :) Anyhow, this is of course a choice, and you can specify exit_process=False.

What I'm missing is the goal of the subtask. Is it the next stage in the pipeline? If this is the case, I'm assuming you only want to clone/enqueue it once the first step is done, is that correct?

bmartinn avatar Nov 12 '20 20:11 bmartinn

Yes, this is the next stage, and I want to create both stages in the same file for convenience. I don't want to use already-defined tasks as in the examples, because it's too easy to get lost.

iirekm avatar Nov 12 '20 20:11 iirekm

@iirekm take a look at this one, it might help. (Basically, create a Task from a function at runtime; I "think" this is what you are after?!)

bmartinn avatar Nov 12 '20 21:11 bmartinn

yes, that's great, it works

iirekm avatar Nov 13 '20 08:11 iirekm