Feature request: On Dockerfiles, allow filepaths to be excluded from ADD and COPY commands
This feature will be useful for multi stage builds, where some files should be copied in some stages but not others.
I imagine it looking like it follows:
FROM myimage AS not_py
COPY --exclude=sources/*.py sources /src
FROM myimage AS not_cpp_or_txt
COPY --exclude=sources/*.cpp --exclude=/sources/*.txt sources /src
In such example, /sources has some files required by not_py but not by not_cpp_or_txt, and vice-versa.
As far as I am aware, such construct at the moment cannot be accomplished with .dockerignore, which seems to prevent files to be copied to the context, whereas the exclude proposed adds files to the context, but not to the commands that use it.
.dockerignore, which seems to prevent files to be copied to the context, whereas the exclude proposed adds files to the context, but not to the commands that use it.
What is the benefit of that? We should not spend time transferring files to the daemon where they would be excluded by the copy step later. This doesn't mean that copy side shouldn't filter at all (copies between stages, multiple copies with different filters) but it should happen both for local source transfer and for copy.
I'm not so sure about the proposed syntax. It looks like it can become quite verbose if used as a replacement for a dockerignore. Maybe some way to define a new stage based on another filter with an additional filter definition, where the filter has multiple rules. Not sure exactly but https://github.com/moby/buildkit/issues/4239 also goes for a way to define new stages. It is also possible that this case is different enough from the dockerignore case and there will not be too many rules.
Some other considerations:
- Do we expect many copies to use same excludes? Then they should be defined together somehow.
- Is it ok that there are excludes but not includes?
--parentsis usingIncludePatternsinternally I think.
@thaJeztah @AkihiroSuda @cpuguy83
Thx @tonistiigi for reviewing this issue.
I can see that there were other proposals over time, in addition to the one you linked, such as:
And this is a very similar proposal (meaning the current issue is actually a duplicate)
- https://github.com/moby/buildkit/issues/2853
I mentioned the context in the original text, but in retrospect it was probably a mistake, as I confess I don't understand in details how the context works. The main target for this feature is actually the build cache.
I think a generic --filter flag would be fine so we can handle includes as well:
FROM myimage AS not_py
COPY --filter=!*.py sources /src
FROM myimage AS not_cpp_or_txt
# OR
COPY --filter=!*.cpp --filter=!*.txt sources /src
FROM myimage AS combine
# AND
COPY --filter=!*.cpp,!*.py sources /src
FROM myimage AS only_cpp
COPY --filter=*.cpp sources /src
FROM myimage AS all
# default behavior as if we don't set --filter flag
COPY --filter=** sources /src
For more control we can use a comma separated list of match object instead of a simple path glob (combine stage). If multiple filters are set, they are combined using the OR operation, while individual match rules within an object are combined using the AND operation. By incorporating ! negation, you can create intricate matching rules.
What is the benefit of that? We should not spend time transferring files to the daemon where they would be excluded by the copy step later. This doesn't mean that copy side shouldn't filter at all (copies between stages, multiple copies with different filters) but it should happen both for local source transfer and for copy.
and
Is it ok that there are excludes but not includes?
:point_right: This is just a first "quick" blurb, but I know there's been various ticket around this (or similar) proposals, and we may have to go through use-cases mentioned in those.
- some use-cases evolved around multiple targets using different sets of files in the same context; having a single
.dockerignoremade that more complicated (or not possible) - use-cases where
--mountis used to mount files, butCOPYstatement to not include those files (or directories). - use-cases where the context contains a large number of files and directories from which the majority of files have to be included, except from some files (these may be generated files, or other artifacts). Describing files to include through
COPYand wildcards is much more complicated than to describe what files to exclude. The--parentsoption alleviates some use-cases here, but not all of those.
I'm not so sure about the proposed syntax. It looks like it can become quite verbose if used as a replacement for a dockerignore.
I definitely think this can be a concern. Using a --flag for this works for basic use-cases, but can get complicated fast. I know there's been some proposals to have --ignorefile=xxx to specify a .dockerignore which could alleviate some of that, but also may be opening a new can of worms.
- use-cases where --mount is used to mount files, but COPY statement to not include those files (or directories).
This is my use case 🙋♂️
Specifically, I'm generating some "clean" versions of my package.json and package-lock.json that doesn't contain the version property, in order to get better caching.
My Dockerfile looks something like this:
FROM public.ecr.aws/lambda/nodejs:20.2023.11.21.13
RUN dnf install -y gcc gcc-c++ git make openssl-devel tar zip
WORKDIR /var/task
# This layer can be cached even between version bumps (or any other metadata changes that doesn't affect deps)
RUN --mount=type=bind,source=clean-package.json,target=package.json --mount=type=bind,source=clean-package-lock.json,target=package-lock.json npm ci
# These two layers would be condensed to one:
COPY . .
RUN rm clean-package.json clean-package-lock.json
RUN npm run-script build
For me, it would be great with a simple --exclude flag to COPY:
COPY --exclude=clean-package.json --exclude=clean-package-lock.json . .
This is not meant to replace .dockerignore, which we already use to ignore a lot of files.
We discussed this in one of the moby maintainers meetings.
Most people prefered the exclude over filter/include to cover the most basic cases. And possibly extend this later if reusing same filters becomes a problem. But exclude should be enough to test this in dockerfile labs channel.
Regarding implementation:
- The exclude rules need to be integrated to context transfer as well, in addition to the filter during copy. If big files are excluded and never used during build then time should not be wasted to transfer them to the daemon. This filtering already works this way for the source paths and dockerignore paths.
- @aaronlehmann is the current contenthash cache implementation (#2082) already covering all cases for this, including
**etc. so that no excluded file impacts the cache checksum? If not then a workaround would be to do a internal copy to scratch stage first and then let next copy recompute the checksum. Hopefully this is not needed but needs to be verified with integration tests as well.
is the current contenthash cache implementation (https://github.com/moby/buildkit/pull/2082) already covering all cases for this, including ** etc. so that no excluded file impacts the cache checksum?
I believe so. That is the intent.
Regarding implementation:
- The exclude rules need to be integrated to context transfer as well, in addition to the filter during copy. If big files are excluded and never used during build then time should not be wasted to transfer them to the daemon. This filtering already works this way for the source paths and dockerignore paths.
If I understood properly, this mean that a file would need not be added to the context if it's ignored by .dockerignore AND from stages defined. Is it correct?
In this case, a file which is excluded (via --exclude) from all but one build stage must be added to the context.
I wonder if for this specific use case simply using .dockerignore would not be a better option and how much of an edge case it'd be.
I've created a change for the second suggestion (the cache) on https://github.com/moby/buildkit/pull/4561 and would be happy to get some feedback.
Thank you in advance.
AND from stages defined. Is it correct?
and from stages in use. Dockerfile frontend will already determine if a stage is reachable for a given build.
In this case, a file which is excluded (via --exclude) from all but one build stage must be added to the context.
Yes, there are cases where we can't be super precise about it. I think if there are multiple active copies, one has exclude and another one is using the excluded files then we should still transfer with one context and ignore the excludes (let the excludes happen via copy). But for simple cases where there is only one copy or all excludes are the same it should be possible to do the filter on transfer. I'm not sure how far we should go in determining if exclude from one COPY can collide with a path from another COPY. Just handling the most basic cases is probably fine initially. If common pattern appears later that is not handled by this case it can be improved later.
I am continuing the discussion in the proposed PR https://github.com/moby/buildkit/pull/4561