clearml-agent with poetry does not reuse venv

Open mads-oestergaard opened this issue 2 years ago • 4 comments

We are using clearml-agent with poetry as the dependency manager, but the agent does not reuse the task environment between task executions, as it does when using pip.

Telling poetry to place the .venv folder inside the repo does not change this. Specifically, we set agent.package_manager.type = "poetry" in clearml.conf.

Possibly related to https://github.com/allegroai/clearml-agent/issues/74, but we are not running the agents in docker mode.

Output of a task execution:

...
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv latent-features in /data/clearml/venvs-builds/3.9/task_repository/our-repo/.venv
Installing dependencies from lock file
...

Versions: clearml-agent @ 1.5.1 poetry @ 1.2.0

mads-oestergaard avatar Mar 15 '23 11:03 mads-oestergaard

Hi @mads-oestergaard,

It is not clear how to cache poetry envs, because the content of the poetry file is not stored in ClearML; it is part of the git repo. I'm afraid that just hashing the content will not be enough, WDYT?
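For reference, a minimal sketch of what "just hashing the content" could look like. This is not clearml-agent code: the function name poetry_env_cache_key and the choice to mix the Python version into the key are assumptions made for illustration. It only captures the lock file and the interpreter version, so whether such a key is sufficient is exactly the open question above.

```python
import hashlib
import sys
from pathlib import Path
from typing import Optional


def poetry_env_cache_key(repo_dir: str, python_version: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: derive a cache key from the repo's poetry.lock.

    Returns a SHA-256 hex digest, or None if the repo has no lock file.
    """
    lock_file = Path(repo_dir) / "poetry.lock"
    if not lock_file.is_file():
        return None

    if python_version is None:
        # Assumption: environments are not shared across minor Python versions.
        python_version = "{}.{}".format(*sys.version_info[:2])

    digest = hashlib.sha256()
    digest.update(python_version.encode("utf-8"))
    digest.update(b"\0")  # separator between the version and the file content
    digest.update(lock_file.read_bytes())
    return digest.hexdigest()
```

Two checkouts with byte-identical lock files and the same Python version would map to the same key; any change to poetry.lock produces a different one.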

jkhenning avatar Mar 15 '23 12:03 jkhenning

How about setting POETRY_CACHE_DIR env var to the path specified in agent.venvs_dir in all poetry-related tasks? Something like this in line 105:

venvs_dir= str(self.session.config.get("agent.venvs_dir"))
argv = Argv(f"POETRY_CACHE_DIR={venvs_dir}", self._python, "-m", "pip", "install", "poetry{}".format(version), "--upgrade", "--disable-pip-version-check") 

and then not executing line 113. The POETRY_CACHE_DIR env var should of course be set on all poetry-commands.

Also, I've been running into some issues with the pip installed poetry package. Have you considered changing the install method to use the recommended one? Line 105 would then be (something like):

argv = Argv("curl", " -sSL", "https://install.python-poetry.org", "|",  self._python, "-", "--version", "install", version.split("==")[-1]) 

mads-oestergaard avatar Mar 20 '23 14:03 mads-oestergaard

After having played around with this a bit, it seems to be more complicated than I originally thought.

  1. I think that the install method is correct (using python -m pip ...), although it has its issues (e.g. version 1.2.0 will install an incompatible poetry-export-plugin, but that's not a fault in ClearML).
  2. Using system poetry on a system where poetry is also used for normal dev work pollutes the regular development flow: changes made with poetry config --local get reflected in my local poetry.toml.

However, by limiting clearml-agent to use the python -m poetry command and configuring the cache-dir I managed to make my agent reuse the environment: (this is in lines 112-113 in clearml_agent/helper/package/poetry_api.py)

try:
    venvs_dir = str(self.session.config.get("agent.venvs_dir"))
    self._config("cache-dir", f"{venvs_dir}", "--local")
    self._config("virtualenvs.in-project", "false", "--local")

Then, in task 1 I get the following log output:

Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv latent-features-odrGogDc-py3.9 in /data/clearml/venvs-builds.2/virtualenvs
Installing dependencies from lock file
Package operations: 246 installs, 1 update, 0 removals

and in task 2 the venv is reused:

Installing dependencies from lock file

Package operations: 0 installs, 0 updates, 0 removals

Running task id [d2f3645170084e82a1126c5e91a6a3f7]:

But ... running poetry config --list elsewhere will still show the edits that the agent made. If I instead start the agent with the same settings specified as environment variables, then it works (provided that we don't also try to set them in clearml):

POETRY_VIRTUALENVS_IN_PROJECT=false POETRY_CACHE_DIR=/data/clearml/venvs-builds.2 clearml-agent daemon --queue default

mads-oestergaard avatar Mar 20 '23 20:03 mads-oestergaard

Environment variables take precedence over poetry config commands, so setting POETRY_VIRTUALENVS_IN_PROJECT and POETRY_CACHE_DIR on the daemon works without modifying the code in clearml-agent.
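The same workaround can be sketched in Python for anyone launching the daemon from a script instead of a shell. The helper names poetry_env_var and agent_environment are made up for this example. The name mapping follows Poetry's documented rule for configuration variables: prefix POETRY_, upper-case the setting, and replace dots and dashes with underscores. The path is the one from my setup above.

```python
import os
import subprocess
import sys


def poetry_env_var(setting: str) -> str:
    """Map a poetry setting name (e.g. 'virtualenvs.in-project') to its env var."""
    return "POETRY_" + setting.upper().replace(".", "_").replace("-", "_")


def agent_environment(venvs_dir: str) -> dict:
    """Copy the current environment and add the two poetry overrides."""
    env = dict(os.environ)
    env[poetry_env_var("cache-dir")] = venvs_dir
    env[poetry_env_var("virtualenvs.in-project")] = "false"
    return env


if __name__ == "__main__":
    env = agent_environment("/data/clearml/venvs-builds.2")

    # Sanity check: a child process sees the override.
    subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['POETRY_CACHE_DIR'])"],
        env=env,
        check=True,
    )

    # To actually start the agent with these settings (left commented out here):
    # subprocess.run(["clearml-agent", "daemon", "--queue", "default"], env=env, check=True)
```

Because the variables live only in the daemon's environment, nothing is written to poetry.toml or to the user-level poetry config, so regular development with poetry on the same machine is unaffected.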

I am considering closing this issue, but I do think that this would be a better default behavior.

mads-oestergaard avatar Mar 21 '23 08:03 mads-oestergaard