applicationset icon indicating copy to clipboard operation
applicationset copied to clipboard

feat: add exclude files when using the git file generator (#468)

Open adamjohnson01 opened this issue 2 years ago • 20 comments

This adds the ability to exclude specific paths when using the git file generator.

Closes #468

adamjohnson01 avatar Feb 20 '22 22:02 adamjohnson01

a/**/b.json and a/*/b.json will be identical in path.Match. That pattern would not be used currently because only the directory generator currently uses path.Match.

I think the only potential issues would occur if there were users that had ** in their directory path excludes because path.Match would treat that a single *. If you are concerned about this then we could have a two different filtering functions. One. that includes glob for files if there is a ** and another that just uses path.Match for directories.

adamjohnson01 avatar Feb 26 '22 22:02 adamjohnson01

@crenshaw-dev, I misunderstood what you meant about adding a new glob field. I think that would be more confusing. The only time glob will need to be used is when there is a ** and exclude is true. The ** works when generating the directory and file lists as under the covers is it just git ls-files and the path. The path.Match function does not translate ** the same way that the git ls-files does so the current setup is producing unexpected behaviour for anyone using ** in excludes. I think that adding the ** check could potentially break some existing config but the config that it 'breaks' is not working as it should anyway. Changing it will make the exclude path matching consistent with the file and directory path generation which will make troubleshooting issues with excludes a lot easier.

adamjohnson01 avatar Feb 27 '22 23:02 adamjohnson01

@adamjohnson01 thanks for the explanation! I don't think I'll be able to give this the review it deserves until Monday. Thanks for your patience!

crenshaw-dev avatar Mar 04 '22 14:03 crenshaw-dev

After thinking some more, I'm still not comfortable inferring pattern type.

  1. We can't be sure of the intent of someone who is currently using ** in a pattern. It's possibly a typo. But if we change the behavior, it could break currently-working AppSets. (For example, a/b/** would now return a/b/c/d/ along with a/b/c/.
  2. Glob has other behavior differences. For example, a pattern containing { or } will be interpreted differently. Someone currently using path/**/{with-literal-braces} will find that /path/a/{with-literal-braces} no longer matches. It's a far-edge case, but still possible.
  3. Someone who wants to use glob's pattern-list feature ({pattern,list}) will not be able to, unless their pattern also contains **. We'd have to broaden the feature detection, possibly breaking things for existing users.
  4. The glob library hasn't been maintained in a while, and there are weird quirks which could be very confusing for a user who changes nothing but * to ** and suddenly finds that other parts of the pattern are broken.

crenshaw-dev avatar Mar 07 '22 17:03 crenshaw-dev

@crenshaw-dev, those are valid points. How about using the doublestar library instead of glob. It implements support for double star ** matches for path.Match. The patterns are the same other than ** support.

adamjohnson01 avatar Mar 07 '22 20:03 adamjohnson01

I do like doublestar a lot more! Looks better-maintained. But I don't think it resolves the first three points.

I've been using these tests to compare behavior:

func TestGlobVsMatch(t *testing.T) {
	cases := []struct {
		name string
		testPath string
		pattern string
		expected bool
	}{
		{
			name: "double-star does not match separator",
			testPath: "/a/b/c",
			pattern: "/a/**",
			expected: true,
		},
		{
			name: "{ is a valid part of a path",
			testPath: "/a/{filename}",
			pattern: "/a/{filename}",
			expected: true,
		},
	}

	for _, testCase := range cases {
		testCaseCopy := testCase

		t.Run(testCaseCopy.name, func(t *testing.T) {
			t.Parallel()
			pathMatches, err := path.Match(testCaseCopy.pattern, testCaseCopy.testPath)
			assert.NoError(t, err)
			assert.Equal(t, testCaseCopy.expected, pathMatches, "path.Match returned %v instead of %v for pattern %v and path %v", pathMatches, testCaseCopy.expected, testCaseCopy.pattern, testCaseCopy.testPath)
			doublestarMatches, err := doublestar.Match(testCaseCopy.pattern, testCaseCopy.testPath)
			assert.NoError(t, err)
			assert.Equal(t, testCaseCopy.expected, doublestarMatches, "doublestar.Match returned %v instead of %v for pattern %v and path %v", doublestarMatches, testCaseCopy.expected, testCaseCopy.pattern, testCaseCopy.testPath)
		})
	}
}

crenshaw-dev avatar Mar 07 '22 21:03 crenshaw-dev

@crenshaw-dev

I do like doublestar a lot more! Looks better-maintained. But I don't think it resolves the first three points.

I think that 1 is the only point that it does not solve but this is limited to excludes and not the path generation as that is handled by git ls-files. The other points are no longer valid as we will no longer be using glob, we will be using path.Match again.

adamjohnson01 avatar Mar 07 '22 21:03 adamjohnson01

this is limited to excludes and not the path generation as that is handled by git ls-files

Can you clarify what you mean here? I'm sure I'm missing something obvious. But as far as I can tell, file listing is currently performed by cloning the repo and using filepath.Walk to generate a list of files.

So the behavior of this generator would change after merging this PR:

  - git:
      repoURL: https://github.com/argoproj/applicationset.git
      revision: HEAD
      files:
      - path: "examples/git-generator-files-discovery/cluster-config/**/config.json"

Instead of matching only files one directory away from cluster-config, this would seem to now match config.json files in further-nested directories.

crenshaw-dev avatar Mar 08 '22 19:03 crenshaw-dev

@crenshaw-dev

this is limited to excludes and not the path generation as that is handled by git ls-files

Can you clarify what you mean here? I'm sure I'm missing something obvious. But as far as I can tell, file listing is currently performed by cloning the repo and using filepath.Walk to generate a list of files.

You are not missing anything. I am only half correct. filepath.Walk is used for directories generation and git ls-files is used for file generation.

So the behavior of this generator would change after merging this PR:

  - git:
      repoURL: https://github.com/argoproj/applicationset.git
      revision: HEAD
      files:
      - path: "examples/git-generator-files-discovery/cluster-config/**/config.json"

Instead of matching only files one directory away from cluster-config, this would seem to now match config.json files in further-nested directories.

This would not change as git ls-files supports ** already and we are only adding adding exclude functionality not changing generation functionality. The exclude is being implemented to be inline with the way generation works for files.

I think that it is probably best if we implement two different filtering functions. One for directories and one for files. That way we can leave the current directory filtering which works the same way as directory generation alone. That way we will not break any existing config.

The new exclude config for files should work the same way that the generation for files works so we will need to implement ** functionality.

What do you think?

adamjohnson01 avatar Mar 08 '22 21:03 adamjohnson01

Ah! Thanks for the clarification. Had to look at the code for a few minutes to understand what's happening. Yep, your idea of two different filter functions makes a lot of sense.

crenshaw-dev avatar Mar 09 '22 14:03 crenshaw-dev

@adamjohnson01 : Can u please move this PR into argocd as applicationset is moved. If u find it laborious, i can do it on your behalf with your consent

rishabh625 avatar Apr 04 '22 05:04 rishabh625

@rishabh625, I will give it a go

adamjohnson01 avatar Apr 09 '22 21:04 adamjohnson01

@adamjohnson01 let us know if you'd rather us move over the PR on your behalf. :-)

crenshaw-dev avatar Jun 06 '22 13:06 crenshaw-dev

any progress on this PR? When it will be merged?

yebolenko avatar Jun 19 '22 19:06 yebolenko

@yebolenko it's already merged and will be available in next release https://github.com/argoproj/argo-cd/pull/9150

rishabh625 avatar Jun 19 '22 19:06 rishabh625

@yebolenko it's already merged and will be available in next release argoproj/argo-cd#9150

AFAICS this merge is related to excluding entire repositories using the SCM generator, how would I use this to exclude only a single file using the Git file generator when generating all Applications from a single Git repository?

I have not had time to work on this but will try and get it moved across.

adamjohnson01 avatar Jul 06 '22 16:07 adamjohnson01

Has there been any progress on this? I would like to ignore certain directory patterns but keep the rest, which is impossible using only glob patterns for inclusion as it is now.

Hi folks,

Do we have some update about this? I would like to use exclude files when using git file generator.

vinicius-deoliveira avatar Jun 20 '23 14:06 vinicius-deoliveira

I have not seen this feature attempted on the new repo

crenshaw-dev avatar Jun 20 '23 16:06 crenshaw-dev