applicationset
applicationset copied to clipboard
feat: add exclude files when using the git file generator (#468)
This adds the ability to exclude specific paths when using the git file generator.
Closes #468
a/**/b.json and a/*/b.json will be identical in path.Match. That pattern would not be used currently because only the directory generator currently uses path.Match.
I think the only potential issues would occur if there were users that had ** in their directory path excludes because path.Match would treat that a single *. If you are concerned about this then we could have a two different filtering functions. One. that includes glob for files if there is a ** and another that just uses path.Match for directories.
@crenshaw-dev, I misunderstood what you meant about adding a new glob field. I think that would be more confusing. The only time glob will need to be used is when there is a ** and exclude is true. The ** works when generating the directory and file lists as under the covers is it just git ls-files and the path. The path.Match function does not translate ** the same way that the git ls-files does so the current setup is producing unexpected behaviour for anyone using ** in excludes. I think that adding the ** check could potentially break some existing config but the config that it 'breaks' is not working as it should anyway. Changing it will make the exclude path matching consistent with the file and directory path generation which will make troubleshooting issues with excludes a lot easier.
@adamjohnson01 thanks for the explanation! I don't think I'll be able to give this the review it deserves until Monday. Thanks for your patience!
After thinking some more, I'm still not comfortable inferring pattern type.
- We can't be sure of the intent of someone who is currently using
**in a pattern. It's possibly a typo. But if we change the behavior, it could break currently-working AppSets. (For example,a/b/**would now returna/b/c/d/along witha/b/c/. - Glob has other behavior differences. For example, a pattern containing
{or}will be interpreted differently. Someone currently usingpath/**/{with-literal-braces}will find that/path/a/{with-literal-braces}no longer matches. It's a far-edge case, but still possible. - Someone who wants to use glob's pattern-list feature (
{pattern,list}) will not be able to, unless their pattern also contains**. We'd have to broaden the feature detection, possibly breaking things for existing users. - The glob library hasn't been maintained in a while, and there are weird quirks which could be very confusing for a user who changes nothing but
*to**and suddenly finds that other parts of the pattern are broken.
@crenshaw-dev, those are valid points. How about using the doublestar library instead of glob. It implements support for double star ** matches for path.Match. The patterns are the same other than ** support.
I do like doublestar a lot more! Looks better-maintained. But I don't think it resolves the first three points.
I've been using these tests to compare behavior:
func TestGlobVsMatch(t *testing.T) {
cases := []struct {
name string
testPath string
pattern string
expected bool
}{
{
name: "double-star does not match separator",
testPath: "/a/b/c",
pattern: "/a/**",
expected: true,
},
{
name: "{ is a valid part of a path",
testPath: "/a/{filename}",
pattern: "/a/{filename}",
expected: true,
},
}
for _, testCase := range cases {
testCaseCopy := testCase
t.Run(testCaseCopy.name, func(t *testing.T) {
t.Parallel()
pathMatches, err := path.Match(testCaseCopy.pattern, testCaseCopy.testPath)
assert.NoError(t, err)
assert.Equal(t, testCaseCopy.expected, pathMatches, "path.Match returned %v instead of %v for pattern %v and path %v", pathMatches, testCaseCopy.expected, testCaseCopy.pattern, testCaseCopy.testPath)
doublestarMatches, err := doublestar.Match(testCaseCopy.pattern, testCaseCopy.testPath)
assert.NoError(t, err)
assert.Equal(t, testCaseCopy.expected, doublestarMatches, "doublestar.Match returned %v instead of %v for pattern %v and path %v", doublestarMatches, testCaseCopy.expected, testCaseCopy.pattern, testCaseCopy.testPath)
})
}
}
@crenshaw-dev
I do like doublestar a lot more! Looks better-maintained. But I don't think it resolves the first three points.
I think that 1 is the only point that it does not solve but this is limited to excludes and not the path generation as that is handled by git ls-files. The other points are no longer valid as we will no longer be using glob, we will be using path.Match again.
this is limited to excludes and not the path generation as that is handled by git ls-files
Can you clarify what you mean here? I'm sure I'm missing something obvious. But as far as I can tell, file listing is currently performed by cloning the repo and using filepath.Walk to generate a list of files.
So the behavior of this generator would change after merging this PR:
- git:
repoURL: https://github.com/argoproj/applicationset.git
revision: HEAD
files:
- path: "examples/git-generator-files-discovery/cluster-config/**/config.json"
Instead of matching only files one directory away from cluster-config, this would seem to now match config.json files in further-nested directories.
@crenshaw-dev
this is limited to excludes and not the path generation as that is handled by git ls-files
Can you clarify what you mean here? I'm sure I'm missing something obvious. But as far as I can tell, file listing is currently performed by cloning the repo and using
filepath.Walkto generate a list of files.
You are not missing anything. I am only half correct. filepath.Walk is used for directories generation and git ls-files is used for file generation.
So the behavior of this generator would change after merging this PR:
- git: repoURL: https://github.com/argoproj/applicationset.git revision: HEAD files: - path: "examples/git-generator-files-discovery/cluster-config/**/config.json"Instead of matching only files one directory away from
cluster-config, this would seem to now match config.json files in further-nested directories.
This would not change as git ls-files supports ** already and we are only adding adding exclude functionality not changing generation functionality. The exclude is being implemented to be inline with the way generation works for files.
I think that it is probably best if we implement two different filtering functions. One for directories and one for files. That way we can leave the current directory filtering which works the same way as directory generation alone. That way we will not break any existing config.
The new exclude config for files should work the same way that the generation for files works so we will need to implement ** functionality.
What do you think?
Ah! Thanks for the clarification. Had to look at the code for a few minutes to understand what's happening. Yep, your idea of two different filter functions makes a lot of sense.
@adamjohnson01 : Can u please move this PR into argocd as applicationset is moved. If u find it laborious, i can do it on your behalf with your consent
@rishabh625, I will give it a go
@adamjohnson01 let us know if you'd rather us move over the PR on your behalf. :-)
any progress on this PR? When it will be merged?
@yebolenko it's already merged and will be available in next release https://github.com/argoproj/argo-cd/pull/9150
@yebolenko it's already merged and will be available in next release argoproj/argo-cd#9150
AFAICS this merge is related to excluding entire repositories using the SCM generator, how would I use this to exclude only a single file using the Git file generator when generating all Applications from a single Git repository?
I have not had time to work on this but will try and get it moved across.
Has there been any progress on this? I would like to ignore certain directory patterns but keep the rest, which is impossible using only glob patterns for inclusion as it is now.
Hi folks,
Do we have some update about this? I would like to use exclude files when using git file generator.
I have not seen this feature attempted on the new repo