hatch
Separate installation of dependencies?
I am building a Dockerfile, using Python, with hatch as a build manager. I want to install dependencies separately from compiling py => pyc files, so that when my code changes I can cache the dependencies (which can take a while to redownload and install).
I'm not sure how to install the dependencies with hatch as a separate step. (Maybe I'm misreading the hatch docs, but they don't seem to mention it anywhere.)
My Dockerfile looks something like this (simplified):
FROM python
RUN python -m pip install hatch
COPY . /
RUN hatch build
I'd like to change it to:
FROM python
RUN python -m pip install hatch
COPY pyproject.toml /
RUN hatch build-deps # this step should be cached if a different file is changed
COPY . /
RUN hatch build # this step shouldn't re-install deps again
Is this possible - if so, maybe this page should be updated to mention it - https://hatch.pypa.io/latest/config/dependency/ ?
Under the hood, Hatch currently uses pip for dependency management, so to persist the cache you would use pip's own mechanisms: https://pip.pypa.io/en/stable/topics/caching/#where-is-the-cache-stored
Thanks - for future people - I found these 3 issues on pip:
- https://github.com/pypa/pip/issues/11440
- https://github.com/pypa/pip/issues/8049
- https://github.com/pypa/pip/issues/11584
The workaround suggested there (until 11440 is resolved) is to read the requirements out of the toml file and pipe them into pip. It's a bit ugly, but it seems to work:
FROM python
RUN python -m pip install hatch
COPY pyproject.toml /
RUN pip install toml && python -c 'import toml; c = toml.load("pyproject.toml"); print("\n".join(c["build-system"]["requires"]))' | pip install -r /dev/stdin
RUN pip install toml && python -c 'import toml; c = toml.load("pyproject.toml"); print("\n".join(c["project"]["dependencies"]))' | pip install -r /dev/stdin
COPY . /
RUN hatch build # this step shouldn't re-install deps again
Hatch then uses these already downloaded files and doesn't grab them again, so if you modify only your source it should be faster.
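For anyone who finds the one-liners above hard to maintain, here is a minimal sketch of the same idea as a small script. It assumes Python 3.11+ (for the standard-library tomllib, with the tomli backport as a fallback), and the function name list_requirements is made up for illustration. It only sees statically declared requirements, so anything listed under dynamic in pyproject.toml would be missed.

```python
"""Print the requirements declared in pyproject.toml, one per line."""
import sys

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    # Assumption: the tomli backport is installed on older Pythons.
    import tomli as tomllib


def list_requirements(pyproject_text: str) -> list[str]:
    """Return build-system requirements followed by project dependencies."""
    config = tomllib.loads(pyproject_text)
    # Either table may be absent, so fall back to empty values.
    build_requires = config.get("build-system", {}).get("requires", [])
    dependencies = config.get("project", {}).get("dependencies", [])
    return [*build_requires, *dependencies]


if __name__ == "__main__":
    # Optional first argument: path to the pyproject.toml file.
    path = sys.argv[1] if len(sys.argv) > 1 else "pyproject.toml"
    with open(path, encoding="utf-8") as f:
        print("\n".join(list_requirements(f.read())))
```

Saved under a hypothetical name such as list_requirements.py and copied into the image alongside pyproject.toml, its output could be piped into pip install -r /dev/stdin in a single RUN step, in place of the two RUN lines above.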
I let Hatch build the environment fully, then use pip uninstall to remove the application/library. This leaves the environment fully populated with all dependencies not only cached but already installed, which can save a lot of time if the installation of any of them includes compiling native code.
There was some discussion of this idea here: https://github.com/pypa/hatch/discussions/376
To install the dependencies of an environment, you can add a script with an empty body to hatch.toml:
[envs.some-environment.scripts]
update-deps = ""
Then use it in the Dockerfile like so:
# the next step can be cached
RUN hatch run update-deps
Won't that still install the project code? The idea of this issue is to have a way to install the dependencies without installing the project.
Maybe, I'm not sure. Sorry if I was wrong
The following seems to work fine for caching dependencies inside a Dockerfile:
RUN hatch dep show requirements | xargs pip install