
Add command to list files in the build context

Open iinuwa opened this issue 6 years ago • 6 comments

It would be helpful to add a command that lists which files would appear in the build context during a build, to help fine-tune the .dockerignore file. This StackOverflow answer recommends using ncdu as an external tool, but that isn't necessarily available to Windows users. Being able to run something like docker builder context ls would be helpful, especially for beginners who are trying to learn how to trim down their image sizes.

Perhaps the output could resemble that of the tree command, with a flag to set the output depth.
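As a rough illustration of the proposed tree-style output, the Python sketch below folds a flat list of context paths into an indented tree and stops at a given depth. It assumes the paths are already available as slash-separated strings relative to the context root; `render_tree` and `max_depth` are hypothetical names, not options of any existing Docker command.

```python
from typing import Dict, Iterable, List


def build_tree(paths: Iterable[str]) -> Dict[str, dict]:
    """Fold slash-separated relative paths into a nested dict."""
    root: Dict[str, dict] = {}
    for path in paths:
        node = root
        for part in path.strip("/").split("/"):
            if part:  # skip empty segments such as those from "a//b"
                node = node.setdefault(part, {})
    return root


def render_tree(paths: Iterable[str], max_depth: int) -> List[str]:
    """Return tree-style lines, showing at most max_depth levels."""
    lines: List[str] = []

    def walk(node: Dict[str, dict], depth: int) -> None:
        if depth > max_depth:
            return
        for name in sorted(node):
            child = node[name]
            # Mark entries that have children (directories) with a trailing slash
            suffix = "/" if child else ""
            lines.append("  " * (depth - 1) + name + suffix)
            walk(child, depth + 1)

    walk(build_tree(paths), 1)
    return lines


if __name__ == "__main__":
    context = [
        "src/App/Program.cs",
        "src/App/project.csproj",
        "src/App/bin/Debug/App.dll",
        ".dockerignore",
    ]
    print("\n".join(render_tree(context, max_depth=2)))
```

With `max_depth=2`, the example prints `.dockerignore`, `src/` and an indented `App/`, hiding everything below that level.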

iinuwa, Sep 26 '19

Files in the context depend on what kind of build you are running. In BuildKit, if the stages you are building contain no COPY/ADD commands, there is no build context at all. And if you only have COPY foo ., the only file in the build context for that build is foo.

tonistiigi, Sep 27 '19

I wasn't aware that there are different types of builds. Could you elaborate on that?

I understand that if you only specify one file to copy, there is only one file in the build context. However, with COPY statements that recursively copy directories, a novice Docker developer may accidentally include more files than they intend. Take, for example, a C# project laid out this way:

src/<project namespace>/project.csproj
src/<project namespace>/Program.cs
src/<project namespace>/bin
src/<project namespace>/obj
.dockerignore
.gitignore

# .gitignore
bin/**
obj/**

# .dockerignore
bin/**
obj/**

# Dockerfile
FROM mcr.microsoft.com/dotnet/sdk:2.1
COPY . .
RUN dotnet build src/<project namespace>/project.csproj

A new Docker developer wouldn't realize the syntax difference between .gitignore and .dockerignore and mistakenly include the bin and obj directories in the build context. It's an easy mistake to make, and once made, it is also easy not to realize that you made the mistake.
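The mismatch can be illustrated with a simplified matcher. The Python sketch below approximates how .dockerignore patterns are applied: each pattern is matched against the whole path relative to the context root (so `bin/**` only covers a top-level `bin` directory), `**` spans any number of directories, a path is also excluded when one of its parent directories matches, and the last matching line wins, with a leading `!` re-including a path. This is an approximation for illustration, not Docker's actual implementation; `compile_pattern`, `parse_rules` and `is_excluded` are hypothetical names.

```python
from __future__ import annotations

import re


def compile_pattern(pattern: str) -> re.Pattern[str]:
    """Translate one simplified .dockerignore pattern into a regex.

    Approximation only: '**/' matches zero or more directories, any other
    '**' matches anything, '*' and '?' stay within one path segment.
    Character classes and backslash escapes are not handled.
    """
    # Patterns are relative to the context root, so a leading '/' is dropped
    pattern = pattern.strip().strip("/")
    regex, i = "", 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            regex += "(?:.*/)?"
            i += 3
        elif pattern.startswith("**", i):
            regex += ".*"
            i += 2
        elif pattern[i] == "*":
            regex += "[^/]*"
            i += 1
        elif pattern[i] == "?":
            regex += "[^/]"
            i += 1
        else:
            regex += re.escape(pattern[i])
            i += 1
    return re.compile(regex + r"\Z")


def parse_rules(lines: list[str]) -> list[tuple[bool, re.Pattern[str]]]:
    """Return (is_exception, regex) pairs, skipping blank and comment lines."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        is_exception = line.startswith("!")
        rules.append((is_exception, compile_pattern(line.lstrip("!"))))
    return rules


def is_excluded(path: str, rules: list[tuple[bool, re.Pattern[str]]]) -> bool:
    """Apply rules in order; the last rule matching the path (or a parent) wins."""
    parts = path.strip("/").split("/")
    # Candidates are every ancestor plus the path itself: a, a/b, a/b/c
    candidates = ["/".join(parts[: n + 1]) for n in range(len(parts))]
    excluded = False
    for is_exception, regex in rules:
        if any(regex.match(c) for c in candidates):
            excluded = not is_exception
    return excluded


if __name__ == "__main__":
    dll = "src/App/bin/Debug/App.dll"
    print(is_excluded(dll, parse_rules(["bin/**", "obj/**"])))        # False
    print(is_excluded(dll, parse_rules(["**/bin/**", "**/obj/**"])))  # True
```

Under these assumptions, the root-anchored `bin/**` leaves the nested `src/App/bin` output in the context, while `**/bin/**` excludes it.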

The docker build command already shows the size of the build context as the first log message:

> docker build .
Sending build context to Docker daemon    2.5MB
...

If Docker is already aware of the size, shouldn't it also be able to know which files are included in the context?

I understand that it's a simple mistake and easy to correct, but I think the easier we can make this to debug for new users, the easier it will be for organizations to adopt container deployment over traditional deployments.

iinuwa, Sep 30 '19

@iinuwa I might be mistaken, but in that scenario, those folders would be ignored.

What @tonistiigi was trying to say is that the context isn't copied to the agent as one complete tar; instead, only the files that are required are transferred, when they are asked for.

FernandoMiguel, Sep 30 '19

(I realize that I may be posting in the wrong repo; I believe the example output from docker build . was from the standard build mode, not DOCKER_BUILDKIT=1.)

@FernandoMiguel, the example I mentioned describes the same situation that prompted me to open this issue. (Cf. this blog post showing the differences between .dockerignore and .gitignore.) I can retest next week to confirm.

I think I now understand that what is included in the build context depends on which paths are specified in the COPY/ADD statements of the Dockerfile. Dockerfiles often COPY . or some other whole directory, which may include more files than expected. Perhaps what I'm asking for is a tool that can parse a .dockerignore file and the COPY/ADD statements of a Dockerfile, and output what would be included in the build context. If it's decided that this doesn't fit in the upstream CLI, I can write a tool for my team that does that.
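A rough starting point for the Dockerfile half of such a tool could be to collect the source arguments of COPY/ADD instructions. The Python sketch below is deliberately naive and assumes a simple Dockerfile: it joins backslash line continuations, skips flags such as `--chown=...`, ignores `COPY --from=...` (which copies from another stage or image rather than from the build context) and URL sources for ADD, and handles the JSON array form. It does not handle heredocs, build arguments, flags combined with the JSON form, or parser directives such as a custom escape character; `context_sources` and `logical_lines` are hypothetical names.

```python
from __future__ import annotations

import json
import shlex


def logical_lines(text: str) -> list[str]:
    """Join backslash-continued lines and drop comments and blank lines."""
    lines, current = [], ""
    for raw in text.splitlines():
        stripped = raw.strip()
        if not current and (not stripped or stripped.startswith("#")):
            continue
        if stripped.endswith("\\"):
            current += stripped[:-1] + " "
        else:
            lines.append(current + stripped)
            current = ""
    if current:
        lines.append(current)
    return lines


def context_sources(dockerfile: str) -> list[str]:
    """Return COPY/ADD source paths that would be read from the build context."""
    sources = []
    for line in logical_lines(dockerfile):
        instruction, _, rest = line.partition(" ")
        if instruction.upper() not in ("COPY", "ADD"):
            continue
        rest = rest.strip()
        # JSON (exec) form: COPY ["src", "dest"]; otherwise shell-like words
        args = json.loads(rest) if rest.startswith("[") else shlex.split(rest)
        flags = [a for a in args if a.startswith("--")]
        paths = [a for a in args if not a.startswith("--")]
        if any(f.startswith("--from=") for f in flags):
            continue  # copies from another stage or image, not the context
        for src in paths[:-1]:  # the last argument is the destination
            if "://" not in src:  # skip remote URL sources (ADD)
                sources.append(src)
    return sources


if __name__ == "__main__":
    dockerfile = """
    FROM mcr.microsoft.com/dotnet/sdk:2.1
    COPY . .
    COPY --from=build /out /app
    ADD --chown=1000:1000 config.json \\
        appsettings.json /etc/app/
    RUN dotnet build src/App/project.csproj
    """
    print(context_sources(dockerfile))  # ['.', 'config.json', 'appsettings.json']
```

For the Dockerfile in the earlier example, this would report only `.`, which is the case where everything not excluded by .dockerignore is eligible for the context. As noted above, BuildKit may still transfer fewer files depending on the cache.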

iinuwa, Nov 24 '19

@iinuwa I don't think that is trivial: BuildKit pulls files depending on what is in the cache, and since it needs to follow the build to know that, a dry run would be difficult, I'm guessing...

Care to explain your use case again, as of why you expect to benefit from knowing what files would be copied into an image?

FernandoMiguel, Nov 24 '19

> Care to explain your use case again, as of why you expect to benefit from knowing what files would be copied into an image?

For me, I was using a remote build and it was taking ages to upload the context. Discovering what wasn't being ignored took quite a while, but once I figured out what was wrong with the .dockerignore, the context took seconds to send to the remote build. Having some way to actually see what is being ignored (or not) can be very helpful.

withinboredom, Jun 22 '22

> Care to explain your use case again, as of why you expect to benefit from knowing what files would be copied into an image?

I'm interested in this so that I can better integrate Docker with Gradle, which has excellent task-avoidance capabilities. If I can tell Gradle which files are used by Docker to build a specific image, then I can avoid even having a Docker instance running on a machine, improving build times. Gradle can share the up-to-date checks across machines, so if a build is successful on a build server, Gradle can avoid building it on my machine.
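A build tool would need some stable key derived from those files to decide whether an image is up to date. The Python sketch below computes one possible key: a SHA-256 digest over each file's relative path and contents, in sorted order. It assumes the list of context files is already known (for instance from one of the listing approaches later in this thread); `context_fingerprint` is a hypothetical helper and not part of Gradle's or Docker's API. It also ignores file permissions and timestamps, which a real tool may need to take into account.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def context_fingerprint(root: Path, relative_paths: list[str]) -> str:
    """Return a SHA-256 hex digest over file paths and contents in sorted order."""
    digest = hashlib.sha256()
    for rel in sorted(relative_paths):
        encoded = rel.encode("utf-8")
        data = (root / rel).read_bytes()
        # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide
        for field in (encoded, data):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Program.cs").write_text("class P {}")
        before = context_fingerprint(root, ["Program.cs"])
        (root / "Program.cs").write_text("class P { }")
        after = context_fingerprint(root, ["Program.cs"])
        print(before != after)  # True: changing a file's content changes the fingerprint
```

Sorting the paths first makes the result independent of the order in which the listing reports files.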

Additionally, being able to list the included files can help with testing that a .dockerignore file is configured correctly.

aSemy, Oct 23 '22

Maybe a simpler approach would already be useful for users?

Instead of showing the exact list of paths actually used in a specific build (considering cache, COPY commands, etc.), how about showing the full list of paths allowed in the build context after the .dockerignore rules are applied? I guess that's what matters for most users tuning their ignore rules.

BTW, there's a new answer to the StackOverflow question mentioned by the OP that achieves this by actually copying everything and saving the result to a local context directory:

printf 'FROM scratch\nCOPY . /' | DOCKER_BUILDKIT=1 docker build -f- -o context .

A small variation can show the path listing without saving the results to disk, but the actual copy is still done:

printf 'FROM scratch\nCOPY . /' | DOCKER_BUILDKIT=1 docker build -q -f- -o- . | tar t
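For the remote-build case mentioned earlier, where the concern is how much data gets uploaded, a per-directory size summary may be more useful than a plain listing. The Python sketch below reads a tar archive (a file path, or `-` for standard input, such as the stream from the `docker build ... -o- .` command above) and totals regular-file sizes per top-level entry, largest first. The script name `context_sizes.py` in the usage line is hypothetical.

```python
from __future__ import annotations

import sys
import tarfile
from collections import Counter
from typing import IO


def size_by_top_level(stream: IO[bytes]) -> Counter[str]:
    """Sum regular-file sizes in a tar stream, grouped by first path segment."""
    totals: Counter[str] = Counter()
    # Mode "r|*" reads sequentially, so a non-seekable pipe works
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            # Drop empty and "." segments, e.g. from "./src/App/Program.cs"
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            totals[parts[0] if parts else "."] += member.size
    return totals


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # "-" means read the archive from standard input
        source = sys.stdin.buffer if sys.argv[1] == "-" else open(sys.argv[1], "rb")
        with source:
            for name, size in size_by_top_level(source).most_common():
                print(f"{size:>12,}  {name}")
    else:
        print(f"usage: {sys.argv[0]} ARCHIVE.tar|-", file=sys.stderr)
```

Usage, piping the tar stream straight in:

printf 'FROM scratch\nCOPY . /' | DOCKER_BUILDKIT=1 docker build -q -f- -o- . | python3 context_sizes.py -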

aureliojargas, Jul 13 '23