
Installing only packages from certain sources to optimize docker builds

Open OceanManOne opened this issue 3 years ago • 1 comments

  • [X] I have searched the issues of this repo and believe that this is not a duplicate.
  • [X] I have searched the FAQ and general documentation and believe that my question is not already covered.

Feature Request

I'm using Poetry for a very large Python project that includes many internal and external packages. We build a lot of containers, and in the build stage we install the packages using poetry install. Every time we change the code of our internal packages, we have to rebuild the container from scratch, which takes a lot of time. We would like to optimize this by copying only the pyproject.toml and poetry.lock files and installing only the external Python packages in the first stage, so that this layer is cached for subsequent builds.

This is similar to the multi-stage builds recommended for npm projects; see this example: https://cloudnweb.dev/2019/10/crafting-multi-stage-builds-with-docker-in-node-js/

To do that, we need a way to install only the external packages (PyPI, git, etc.) from the lock file without requiring the internal packages' code.

OceanManOne avatar Feb 10 '23 13:02 OceanManOne

Could I ask you to please expand the scope of your issue? This is a huge performance improvement not limited to docker builds.

I suggest "Add runtime option to specify install source(s), allowing to greatly optimize builds based on context (Docker, CI, etc.)". But it is up to you.

There is already a lot of discussion that would be solved by exactly this feature; see #2339, in particular this comment: https://github.com/python-poetry/poetry/issues/2339#issuecomment-707633106

In my case, for CI, I want to hit only one specific server to reduce the load of CI builds, but hit a different server in other contexts (such as prod, local-dev, etc.).

A common example we are facing involves an internal repository, where CI is run by an external cloud service that cannot be granted access for security reasons. In this case, CI makes hundreds of denied requests per job to a server that does not even resolve in DNS or respond, and the job is eventually killed by the CI tool, such as GitHub Actions.

This means that the lockfile used for deployment cannot be used during CI, and Poetry must install from a source accessible to the CI tool, such as public PyPI or another private repo. If poetry install had a --source argument, this could be fixed easily in CI, or really at any time! Specifying install sources per package does not fix this, as source selection is not static: it depends on the execution context (Docker, CI, internal vs. external server, etc.).

The current workaround is to build a new temporary lockfile after removing any inaccessible sources, only during CI. But this is not ideal for obvious reasons (we do not want to deal with parallel lockfiles).

emirkmo avatar Feb 18 '23 08:02 emirkmo

@OceanManOne I'm not sure I understood exactly what you are asking. Are the internal packages you are referring to referenced through path dependencies? If so, then Poetry 1.5 just got a brand new --no-directory option that does exactly what you want ;)

If that's not the case, then I don't understand what you are trying to solve. If internal packages are hosted on a private repository and you want to reference them in your project, they will need to be part of your lockfile. Changing the code of a private dependency will likely result in a new version being released and a change in the lockfile, which will invalidate the Docker cache.
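The invalidation argument can be illustrated with a small sketch. This is a simplified model, not Docker's actual cache algorithm: it assumes the dependency layer depends only on the contents of pyproject.toml and poetry.lock, and it derives a key by hashing those two files. The function name and the sample file contents are hypothetical.

```python
import hashlib


def dependency_layer_key(pyproject: bytes, lockfile: bytes) -> str:
    """Hash only the two files copied before the install step."""
    digest = hashlib.sha256()
    for content in (pyproject, lockfile):
        # Length-prefix each file so the boundary between them is unambiguous.
        digest.update(len(content).to_bytes(8, "big"))
        digest.update(content)
    return digest.hexdigest()


# Hypothetical file contents for illustration.
pyproject = b'[tool.poetry]\nname = "app"\n'
lock_v1 = b'[[package]]\nname = "internal-utils"\nversion = "0.1.0"\n'
lock_v2 = b'[[package]]\nname = "internal-utils"\nversion = "0.2.0"\n'

# Application code is not an input, so editing it leaves the key unchanged.
key_before = dependency_layer_key(pyproject, lock_v1)
key_same = dependency_layer_key(pyproject, lock_v1)

# A new release of a private dependency changes the lockfile, and so the key.
key_after = dependency_layer_key(pyproject, lock_v2)

print(key_before == key_same)
print(key_before == key_after)
```

Under this model, a path dependency skipped at install time never enters the key, while a version bump of a privately hosted dependency always does.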

ralbertazzi avatar Jun 01 '23 15:06 ralbertazzi