remote execution with local python package

Open elinep opened this issue 3 years ago • 18 comments

Hi,

I have a project with a local package that involves source compilation and needs to be installed via pip. The requirements.txt specifies the local package as: ./local_package

I'd like to run clearml-agent on this project, but the package detection stage seems to report only the name of the local package and not its location, as it would for packages installed from a git repository.

So when I try to execute such a task with clearml-agent, the installation fails:

ERROR: Could not find a version that satisfies the requirement my_local_package==0.0.0 (from -r /tmp/cached-reqsj92bjyak.txt (line 4)) (from versions: none)
ERROR: No matching distribution found for my_local_package==0.0.0 (from -r /tmp/cached-reqsj92bjyak.txt (line 4))

clearml_agent: ERROR: Could not install task requirements!

I can still edit the installed packages in the web UI to add ./local_package, but this is cumbersome. Moreover, after execution, the installed packages field stores an absolute path for this local package, which can break replication on another worker.
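
A quick way to see which requirements.txt entries are local paths, and so would need this kind of manual handling, is to scan the file. This is a minimal sketch using only the standard library; the function name `local_path_requirements` and the set of prefixes it checks are assumptions for illustration, not part of clearml or pip.

```python
from typing import List


def local_path_requirements(requirements_text: str) -> List[str]:
    """Return the requirement lines that point at a local path."""
    local = []
    for raw in requirements_text.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        # Drop an editable flag so "-e ./pkg" is treated like "./pkg".
        target = line
        for flag in ("-e ", "--editable "):
            if target.startswith(flag):
                target = target[len(flag):].strip()
        # Relative paths, absolute paths, and file: URLs all count as local.
        if target.startswith(("./", "../", "/", "file:")) or " @ file:" in target:
            local.append(line)
    return local


if __name__ == "__main__":
    example = "numpy==1.20.1\n./local_package\n"
    print(local_path_requirements(example))
```

Plain `name==version` lines are skipped, so only the entries that depend on the local checkout are reported.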

A solution would be to move local_package to a git repository and install it through pip install git+... but my team is not willing to do that for their own reasons.

What do you think?

elinep avatar Mar 09 '21 09:03 elinep

Hi @elinep In your code, before calling Task.init add the following line:

Task.add_requirements("./local_package")

If they do agree to create a git repo :) you would just add the git directly with:

Task.add_requirements("git+https://github.com/...")

Note: please test with the latest RC; I remember there was a fix that improves support for local packages:

pip install clearml==0.17.5rc5

bmartinn avatar Mar 10 '21 00:03 bmartinn

Thanks @bmartinn.

I guess I'll have to manage the task requirements manually as you suggest.

What about the fact that, once executed, the task requirements are modified with a worker-internal path for local packages:

# draft task
./my_local_package
# once executed by clearml-agent
my_local_package @ file:///.../.clearml/venvs-builds/3.8/task_repository/trains_gitsm_local_install.git/my_local_package

This task is now likely to fail (or run with a wrong version) if we try to run it on another worker as the path for my_local_package is probably not valid.
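
To make the problem concrete, here is a minimal sketch of how such an absolute entry could be mapped back to a repository-relative one. It is not what clearml-agent does internally; the function name `relativize_requirement` and the example paths in the usage are assumptions for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def relativize_requirement(line: str, repo_root: str) -> str:
    """Turn 'pkg @ file:///<repo_root>/sub' back into './sub'.

    Lines that are not direct file references, or that point outside
    repo_root, are returned unchanged.
    """
    _name, sep, url = line.partition(" @ ")
    url = url.strip()
    if not sep or not url.startswith("file://"):
        return line
    path = PurePosixPath(unquote(urlparse(url).path))
    try:
        rel = path.relative_to(PurePosixPath(repo_root))
    except ValueError:
        # The package lives outside the repository checkout; leave it alone.
        return line
    return "./" + rel.as_posix()


if __name__ == "__main__":
    # Hypothetical worker paths, for illustration only.
    root = "/home/worker/task_repository/repo.git"
    entry = "my_local_package @ file://" + root + "/my_local_package"
    print(relativize_requirement(entry, root))
```

A relative entry like the one this produces is valid on any worker that clones the same repository, which is what the draft task had before execution.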

elinep avatar Mar 15 '21 13:03 elinep

This task is now likely to fail (or run with a wrong version) if we try to run it on another worker as the path for my_local_package is probably not valid.

You have a very good point, this is definitely a bug. I'll update here once clearml-agent fixes this issue (basically, when the agent updates the task back, it should restore the original local package link).

bmartinn avatar Mar 16 '21 21:03 bmartinn

Hi @elinep A fix was pushed, if you feel like testing before the RC is out :)

pip3 install git+https://github.com/allegroai/clearml-agent.git

bmartinn avatar Mar 26 '21 22:03 bmartinn

Just updating here that an RC is out :)

pip install clearml==0.17.6rc1

bmartinn avatar Apr 13 '21 01:04 bmartinn

git+https://github.com/

I wish there was a way to link a specific remote git branch. Tried

git+https://github.com/...#branch

and the /tree/branch link; neither worked. I had to link to a local clone with the branch checked out.

I wonder how clearml, in poetry mode, resolves poetry deps like:

river = { git = "https://github.com/ColdTeapot273K/river.git", branch = "feature/mini-batch-support" }

(Can't check since poetry mode is not working for me rn, see #545)

ColdTeapot273K avatar Jan 21 '22 07:01 ColdTeapot273K

Hi @ColdTeapot273K

I wish there was a way to link a specific remote git branch. Tried

There is, the following should work (note: please make sure pip >= 20). In your "Installed Packages" you can add this line:

git+https://github.com/allegroai/clearml.git@12fa7c92aaf8770d770c8ed05094e924b9099c16

This would install the clearml package directly from the git repository at commit ID 12fa7c92aaf8770d770c8ed05094e924b9099c16. You can also replace the specific commit ID with a branch name; both will work :)
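
The line format above is pip's VCS requirement syntax: the ref (commit ID, branch, or tag) goes after an `@` at the end of the repository URL. As a small sketch, a helper that builds such lines could look like this; the function name `git_requirement` is an assumption for illustration, and the optional `name @ ...` prefix is the PEP 508 direct-reference form.

```python
from typing import Optional


def git_requirement(repo_url: str, ref: Optional[str] = None,
                    name: Optional[str] = None) -> str:
    """Build a pip requirement line that installs from a git repository."""
    req = "git+" + repo_url
    if ref:
        # A commit ID, branch name, or tag is appended after "@".
        req += "@" + ref
    if name:
        # Optional PEP 508 direct reference: "<name> @ <url>".
        req = name + " @ " + req
    return req


if __name__ == "__main__":
    print(git_requirement(
        "https://github.com/allegroai/clearml.git",
        "12fa7c92aaf8770d770c8ed05094e924b9099c16",
    ))
    print(git_requirement(
        "https://github.com/ColdTeapot273K/river.git",
        "feature/mini-batch-support",
        name="river",
    ))
```

Note that the ref is separated with `@`, not `#`, which is why the `#branch` form tried earlier in the thread did not select a branch.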

bmartinn avatar Jan 22 '22 23:01 bmartinn

Hi @elinep In your code, before calling Task.init add the following line:

Task.add_requirements("./local_package")

This is not correct: you can add a requirements.txt file this way, not a local Python package.

JeremyMahieu avatar May 27 '23 19:05 JeremyMahieu

Is this still not resolved? I am also facing this issue

ERROR: Could not find a version that satisfies the requirement google_cloud_storage==2.9.0 (from -r /tmp/cached-reqs_xokg8xa.txt (line 2)) (from versions: 0.20.0, 0.21.0, 0.22.0, 0.23.0, 0.23.1, 1.0.0, 1.1.0, 1.1.1, 1.2.0, 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0, 1.11.1, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 1.13.2, 1.13.3, 1.14.0, 1.14.1, 1.15.0, 1.15.1, 1.15.2, 1.16.0, 1.16.1, 1.16.2, 1.17.0, 1.17.1, 1.18.0, 1.18.1, 1.19.0, 1.19.1, 1.20.0, 1.21.0, 1.22.0, 1.23.0, 1.24.0, 1.24.1, 1.25.0, 1.26.0, 1.27.0, 1.28.0, 1.28.1, 1.29.0, 1.30.0, 1.31.0, 1.31.1, 1.31.2, 1.32.0, 1.33.0, 1.34.0, 1.35.0, 1.35.1, 1.36.0, 1.36.1, 1.36.2, 1.37.0, 1.37.1, 1.38.0, 1.39.0, 1.40.0, 1.41.0, 1.41.1, 1.42.0, 1.42.1, 1.42.2, 1.42.3, 1.43.0, 1.44.0, 2.0.0)
ERROR: No matching distribution found for google_cloud_storage==2.9.0 (from -r /tmp/cached-reqs_xokg8xa.txt (line 2))

clearml_agent: ERROR: Could not install task requirements!
Command '['/root/.clearml/venvs-builds/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqs_xokg8xa.txt']' returned non-zero exit status 1.

hotshotdragon avatar Jun 14 '23 15:06 hotshotdragon

@hotshotdragon this issue is related to local packages, not to packages found on PyPI. The error you see is because the agent is likely using Python 3.7, for which google_cloud_storage is only supported up to version 2.0.0

jkhenning avatar Jun 14 '23 15:06 jkhenning

@jkhenning Thanks for the quick response, I found a temporary fix for the issue. But I am using 3.10 everywhere, so I am not sure how the agent ends up using 3.7. Also, I am not using google cloud storage anywhere, so I am not sure why it is getting picked up either.

Another thing: every time I do a local run and then clone that experiment to run it from the agent, it gives a module not found error. The issue is described here: https://github.com/allegroai/clearml/issues/503

hotshotdragon avatar Jun 15 '23 11:06 hotshotdragon

Is it possible that this is the only Python version the agent finds in the docker container it's using?

jkhenning avatar Jun 15 '23 15:06 jkhenning

In the console I found this: "Python executable with version '3.10' requested by the Task, not found in path, using '/usr/bin/python3' (v3.6.9) instead"

Maybe this is causing the issue. I am not sure how I can change the Python version.

hotshotdragon avatar Jun 16 '23 07:06 hotshotdragon

Hi @hotshotdragon, this simply means python version 3.10 was not installed on the docker image used to run the code - if you use an image with 3.10 installed, the agent should be able to find it and use it as required
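
One way to confirm this is to check, inside the image, which interpreters are actually on the PATH. The following is a minimal standard-library sketch; it only approximates the idea of looking up a versioned executable and is not the agent's actual lookup logic. The function name `find_python` is an assumption.

```python
import shutil
from typing import Optional


def find_python(version: str) -> Optional[str]:
    """Return the path of 'python<version>' on the PATH, or None if absent."""
    return shutil.which("python" + version)


if __name__ == "__main__":
    # Run this inside the docker image the agent uses.
    for version in ("3.10", "3.8", "3.6"):
        print(version, "->", find_python(version))
```

If the requested version prints `None`, the image does not provide that interpreter and a different image is needed.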

jkhenning avatar Jun 17 '23 20:06 jkhenning

I solved it. I was missing a call to the task.set_base_docker method.
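
For readers hitting the same message, a minimal sketch of that call is shown below. The image name `python:3.10`, the helper name `create_task_with_image`, and the project and task names are assumptions for illustration; the `docker_image` keyword assumes a recent clearml release, and the setting only takes effect when the agent runs in docker mode.

```python
def create_task_with_image(project: str, name: str, image: str = "python:3.10"):
    """Create a task and request a docker image that ships the needed Python."""
    # Imported lazily so this sketch can be loaded without clearml installed.
    from clearml import Task

    task = Task.init(project_name=project, task_name=name)
    # Ask the agent to execute this task inside the given image.
    task.set_base_docker(docker_image=image)
    return task


if __name__ == "__main__":
    # Hypothetical project and task names; requires a configured clearml setup.
    create_task_with_image("examples", "remote run with python 3.10")
```

Any image that has the requested Python version installed should let the agent find it instead of falling back to the system interpreter.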

hotshotdragon avatar Jun 21 '23 13:06 hotshotdragon

Wow, he hijacks the issue with something off topic and then you close it @jkhenning

JeremyMahieu avatar Jun 21 '23 19:06 JeremyMahieu

Apologies @JeremyMahieu ! 🙏 🙏 I honestly lost track 🙁 - reopening, of course

jkhenning avatar Jun 21 '23 19:06 jkhenning

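For anyone still wondering, I ran into the same issue with my local module and was able to force the agent to build it correctly on a different machine with the following:

from clearml import Task

# Keep the automatic detection from recording "my_module" on its own
Task.ignore_requirements("my_module")
# Hack: _force_requirements is a private attribute and may change between releases
Task._force_requirements[
    "my_module @ file:///${PROJECT_ROOT}/../../my_module"
] = None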

However, as I mentioned in https://github.com/allegroai/clearml-agent/issues/191, this feels quite hacky.

OldaKodym avatar Feb 26 '24 10:02 OldaKodym