agent
agent copied to clipboard
Directory artifact paths
Closes #1570
Previously, only glob patterns were allowed in artifact_paths. The name was confusing and also made certain patterns impossible. In particular, in a common bash script, it is impossible to unambiguously "unglob" a string. For instance, in a shell script with $BUILDKITE_ARTIFACT_PATHS=foo/**/* and $PWD in an empty directory, the following would create a directory named literally foo/**/* (assuming no globbing shell options were set):
mkdir $BUILDKITE_ARTIFACT_PATHS
Even if you were to try one of the globbing related shell options, there are none that would accomplish the desired effect.
Of course, the workaround would be to "just" mkdir foo in the shell script. However this breaks the decoupling between buildkite and the script as the buildkite configuration must "know" about the shell script and vice versa.
This change also makes new things possible.
One example is the following in a shell script command step:
make prefix=$BUILDKITE_ARTIFACT_PATHS
Which is a common idiom in many build systems.
After some more looking at this, I think it is slightly more complicated than this (but not much).
Ok I think this is a start now. I see that maybe in the past filepath.Walk might have been restrictively slow but now filepath.WalkDir exists 😄 .
A brief test on our CI confirms this does what I want. Though the symlinks need some work and there could be cases of which I am unaware.
Ok tests pass and I think are actually correct (and symlinks are handled).
Hi @jsoo1! This is an interesting suggestion, and thank you for the contribution, but BUILDKITE_ARTIFACT_PATHS is intended to be a list of globs for artifact upload, not used as an input to shell scripts for commands like mkdir.
If you want a shared value for commands and for artifacts then perhaps you want to abstract that. I would suggest doing something like this instead, for example:
steps:
- env:
- PREFIX=the/place/to/build
artifact_paths:
- ${PREFIX}/bin/*
command: make PREFIX="${PREFIX}"
Hi @sj26 thank you for your feedback!
BUILDKITE_ARTIFACT_PATHSis intended to be a list of globs for artifact upload, not used as an input to shell scripts for commands likemkdir.
Hm. Why is that? I would assume that almost any env var in a shell would be subject to some use by a command.
If you want a shared value for commands and for artifacts then perhaps you want to abstract that. I would suggest doing something like this instead, for example:
steps: - env: - PREFIX=the/place/to/build artifact_paths: - ${PREFIX}/bin/* command: make PREFIX="${PREFIX}"
I can see this being useful, thank you!
I would like to push back on this decision a bit, though.
While I could probably define my own env var for the purpose of $prefix, my bigger papercuts are listed in #1570. To recap:
My problems are with glob semantics and violation of the principle of least surprise.
$BUILDKITE_ARTIFACT_PATHSis a misnomer. I get confused when I include a directory in the list (which is a path, as the name suggests) and it gets ignored.- Suppose it were renamed to
$BUILDKITE_ARTIFACT_PATTERNSand documented to accept only glob patterns. A directory is still a valid glob pattern!
Thus my expectations are doubly violated when a directory I list in $BUILDKITE_ARTIFACT_PATHS is explicitly ignored for upload.
If we modify the example slightly:
steps:
- env:
- PREFIX=the/place/to/build
artifact_paths:
- ${PREFIX}/bin
command: make PREFIX="${PREFIX}"
Then no contents of ${PREFIX}/bin will be uploaded despite existing and being a valid glob.
The primary benefit of supporting directories would be to increase the usability of pipeline definitions. The use of $prefix is a potential secondary upside of supporting directory listings.
Otherwise I'm not sure I see the downsides. Is there extra maintenance that would be required due to this change I am not aware of?
In any case, we love Buildkite and use it heavily. Thank you for your work!
The agent puts all the parameters required to run the job into the environment and then invokes a bootstrap process which consumes a lot of those variables to run hooks etc around your command to provide the surrounding behaviour. $BUILDKITE_ARTIFACT_PATHS is one such env, and is consumed by the buildkite-agent artifact upload command during arifact upload after the command hook.
You can upload the binaries produced by make with something like this:
steps:
- env:
- PREFIX=the/place/to/build
artifact_paths: ${PREFIX}/bin/*
command: make PREFIX="${PREFIX}"
or you could be explicit and handle the upload yourself if you want more control:
steps:
- env:
- PREFIX=the/place/to/build
command: |
make PREFIX="${PREFIX}"
buildkite-agent artifact upload "${PREFIX}/bin/*"
I reckon we'll add a recursive mode for artifact upload at some point. Those commands need a bit of a rethink for better ergonomics. Maybe something like:
steps:
- env:
- PREFIX=the/place/to/build
command: |
make PREFIX="${PREFIX}"
# Doesn't work right now, but might at some point in the future
buildkite-agent artifact upload -r "${PREFIX}/bin"
@sj26
The agent puts all the parameters required to run the job into the environment and then invokes a bootstrap process which consumes a lot of those variables to run hooks etc around your command to provide the surrounding behaviour.
$BUILDKITE_ARTIFACT_PATHSis one such env, and is consumed by thebuildkite-agent artifact uploadcommand during arifact upload after the command hook.
Ok... can you expand a bit on how that relates to this conversation? Is that an additional maintenance burden? Sorry I'm having trouble seeing the relation.
You can upload the binaries produced by make with something like this:
steps: - env: - PREFIX=the/place/to/build artifact_paths: ${PREFIX}/bin/* command: make PREFIX="${PREFIX}"
... but importantly not:
artifact_paths: ${PREFIX}/bin
I reckon we'll add a recursive mode for artifact upload at some point. Those commands need a bit of a rethink for better ergonomics.
That's what this pr does, no?
The artifact uploader is used here:
https://github.com/buildkite/agent/blob/c7abbcee8377cfe829de7c3b9b0ff5c70a917c85/clicommand/artifact_upload.go#L149
And this this pr takes care of the hypothetical --recursive flag for you:
- transparently
- requiring no extra effort on your part (except some maintenance burden?)
- improving the ergonomics
- reducing surprise factor
- lining the artifacts up with glob semantics
I'm still not seeing the downsides (though I'm open to hearing them!).
Note the current state of affairs actually thwarts the glob matching by explicitly ignoring directories! See here: https://github.com/buildkite/agent/blob/c7abbcee8377cfe829de7c3b9b0ff5c70a917c85/agent/artifact_uploader.go#L145
Maybe what would help is some background on why that was done, because it is so confusing! My best guess is the previous performance downsides of filepath.Walk (I think it is called, it's been some time since I've looked into it). But the std lib now has filepath.WalkDir which solves the stat-storm of Walk: https://cs.opensource.google/go/go/+/go1.20.1:src/path/filepath/path.go;l=530.
Thanks again.
Perhaps I've been a little too polite here. Not allowing directories in the artifact_paths list is a major usability and correctness bug. Let's use your example to motivate why. Suppose our default make target begins with a single binary output. We start with the example pipeline definition:
steps:
- env:
- PREFIX=the/place/to/build
artifact_paths:
- ${PREFIX}/bin/*
command: make PREFIX="${PREFIX}"
Then one day we decide to add a lib output. Now we would have to go change the pipeline to make sure the lib directory is uploaded:
steps:
- env:
- PREFIX=the/place/to/build
artifact_paths:
- ${PREFIX}/bin/*
+ - ${PREFIX}/lib/*
command: make PREFIX="${PREFIX}"
But wait! This can be simplified!
steps:
- env:
- PREFIX=the/place/to/build
artifact_paths:
- - ${PREFIX}/bin/*
- - ${PREFIX}/lib/*
+ - ${PREFIX}/*
command: make PREFIX="${PREFIX}"
... woops, now our job has this in the logs:
Skipping directory bin
Skipping directory lib
*grumble* well ok, I guess we can do this?
steps:
- env:
- PREFIX=the/place/to/build
artifact_paths:
- - ${PREFIX}/*
+ - ${PREFIX}/**/*
command: make PREFIX="${PREFIX}"
Sweet! We can even add libexec and share directories, no problem!
Well, suppose then we want to add a README.md to the root of ${PREFIX}, then we also have to do this:
steps:
- env:
- PREFIX=the/place/to/build
artifact_paths:
- ${PREFIX}/**/*
+ - ${PREFIX}/*
command: make PREFIX="${PREFIX}"
But we still have these annoying Skipping directory bin... logs. Plus my pipeline definition relies on the implementation of the Makefile and vice versa. The intermediate ${PREFIX} variable really isn't providing any "abstraction" here. The makefile and the pipeline.yaml are still coupled. What I would really like is:
steps:
- env:
- PREFIX=the/place/to/build
artifact_paths:
- ${PREFIX}
command: make PREFIX="${PREFIX}"
Which, as I noted in the original description, can be shorted to this (in the simplest cases):
steps:
artifact_paths:
- the/place/to/build
command: make PREFIX="${BUILDKITE_ARTIFACT_PATHS}"
Which is what this PR enables.