traefik-helm-chart icon indicating copy to clipboard operation
traefik-helm-chart copied to clipboard

feat: topology spread constraints

Open faust64 opened this issue 4 years ago • 14 comments
trafficstars

What does this PR do?

Adds optional topologySpreadConstraints.

Motivation

As requested by @abatilo , in in https://github.com/traefik/traefik-helm-chart/issues/465

More

Additional Notes

N/A

faust64 avatar Aug 04 '21 18:08 faust64

Thank you for doing this!

abatilo avatar Aug 12 '21 16:08 abatilo

hello @faust64

Again, thank you a lot for contributing to the project.

Do you think that we should add a semvercompare to warn users who are going to set it on a cluster before 1.18?

jakubhajek avatar Aug 18 '21 13:08 jakubhajek

Sure, I'll look into it.

faust64 avatar Aug 18 '21 14:08 faust64

Looking into this, I'm confused I can't get Capabilities based checks working as expected.

I've been hitting with errors such as can't evaluate field Version in type *version.Info.

Eventually, I switched from rancher helm-unittest, to that of quintush, where they did implement a feature letting you set Capabilities while running tests: https://github.com/quintush/helm-unittest/issues/36

$ helm unittest --color --update-snapshot ./traefik

### Chart [ traefik ] ./traefik

 PASS  Main Container configuration	traefik/tests/container-config_test.yaml
 PASS  DaemonSet configuration	traefik/tests/daemonset-config_test.yaml
 PASS  default install	traefik/tests/default-install_test.yaml
 PASS  Deployment configuration	traefik/tests/deployment-config_test.yaml
 PASS  Gateway configuration	traefik/tests/gateway-config_test.yaml
 PASS  Gatewayclass configuration	traefik/tests/gatewayclass-config_test.yaml
 PASS  Pod configuration	traefik/tests/pod-config_test.yaml
 PASS  PodDisruptionBudget configuration	traefik/tests/poddisruptionbudget-config_test.yaml
 PASS  PodSecurityPolicy configuration	traefik/tests/podsecuritypolicy-config_test.yaml
 PASS  Traefik configuration	traefik/tests/ports-config_test.yaml
 PASS  RBAC configuration	traefik/tests/rbac-config_test.yaml
 PASS  Service configuration	traefik/tests/service-config_test.yaml
 PASS  Traefik configuration	traefik/tests/traefik-config_test.yaml

Charts:      1 passed, 1 total
Test Suites: 13 passed, 13 total
Tests:       90 passed, 90 total
Snapshot:    0 passed, 0 total
Time:        310.122105ms

That switch implies changes to Makefile and traefik/Chart.yaml:

diff --git a/Makefile b/Makefile
index 474f56e..4094913 100644
--- a/Makefile
+++ b/Makefile
@@ -92,7 +92,7 @@ endif
 
 helm-unittest: global-requirements
        @echo "== Checking that plugin helm-unittest is available..."
-       @helm plugin list 2>/dev/null | grep unittest >/dev/null || helm plugin install https://github.com/rancher/helm-unittest --debug
+       @helm plugin list 2>/dev/null | grep unittest >/dev/null || helm plugin install https://github.com/quintush/helm-unittest --debug
        @echo "== plugin helm-unittest is ready"
 
 .PHONY: all global-requirements lint-requirements helm-unittest lint build deploy clean
diff --git a/traefik/Chart.yaml b/traefik/Chart.yaml
index 6772a6b..17d4ebf 100644
--- a/traefik/Chart.yaml
+++ b/traefik/Chart.yaml
@@ -1,4 +1,4 @@
-apiVersion: v2
+apiVersion: v1
 name: traefik
 description: A Traefik based Kubernetes ingress controller
 type: application

Somehow, that version of the unittest plugin doesn't recognize the v2 apiVersion in chart, wants me to set it back to v1 ...

== Unit Testing Chart...

### Error:  apiVersion 'v2' is not valid. The value must be "v1"

Yet, when I run all the tests, linting doesn't like that v1, and asks for a v2.

==> Linting traefik
[ERROR] Chart.yaml: chart type is not valid in apiVersion 'v1'. It is valid in apiVersion 'v2'

Then, we can also fix the Makefile unit-test rule, adding some --helm3 option:

  @helm unittest --color --update-snapshot --helm3 ./traefik

Though this would probably require even more patches, regarding Travis.

I'm lost... I should probably get back to the previous unittest plugin. Though that would most likely end up without my testing those TopologySpreadConstraints were actually set.

I'll give it another look, later.

faust64 avatar Aug 19 '21 10:08 faust64

I didn't have much time to look at this. The unittest plugin from quintush may be the way to go, although I've had to "fix" tests to get them working.

You would notice in the current diffs, something like this:

--- a/traefik/tests/service-config_test.yaml
+++ b/traefik/tests/service-config_test.yaml
@@ -48,6 +48,8 @@ tests:
   - it: should have UDP only annotations when specified via values
     set:
       service:
+        annotations:
+          azure-load-balancer-internal: true
         annotationsUDP:
           dns-hostname: udp.example.com
       ports:
@@ -57,8 +59,10 @@ tests:
           exposedPort: 80
           protocol: UDP
     asserts:
-      - isNull:
+      - equal:
           path: items[0].metadata.annotations
+          value:
+            azure-load-balancer-internal: true
       - equal:
           path: items[1].metadata.annotations.dns-hostname
           value: udp.example.com

But... Travis is happy with this... At least, I'm sure switching over won't change anything specific to Travis.

Still, this should not be merged in that state. The original test would fail with the following message:

	- should have UDP only annotations when specified via values

		- asserts[0] `isNull` fail
			Template:	traefik/templates/service.yaml
			DocumentIndex:	0
			Path:	items[0].metadata.annotations
			Expected to be null, got:
				azure-load-balancer-internal: true

As far as I understand, there is no reason the copy of our Services evaluated in that test would actually include that annotation. Definitely sounds like a bug, in quintush's unit test plugin. Which would be a good reason to stick with rancher's copy... Though abandoning the idea of getting tests to check for kube version, using .Capabilities.KubeVersion

faust64 avatar Sep 04 '21 23:09 faust64

This would also be super useful to me. I rebased #371 to master in my fork and bumped the dependency on helm-unittest to 0.1.7-rancher3. This should support setting the KubeVersion in tests but I couldn't get it to work either:

Error: render error in "traefik/templates/deployment.yaml": template: traefik/templates/_podtemplate.tpl:30:56: executing "traefik.podTemplate" at <.Capabilities.KubeVersion.Version>: can't evaluate field Version in type *version.Info.

Any ideas?

CrawX avatar Sep 22 '21 11:09 CrawX

Rancher's helm-untt plugin just does not offer with that capability.

Remove and re-install that plugin, as I did there: https://github.com/traefik/traefik-helm-chart/pull/466/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52

faust64 avatar Sep 22 '21 21:09 faust64

Personally, I wouldn't mind not testing the kube version capability test if this causes so many problems with helm-unittest. What can we do to get this merged in?

CrawX avatar Oct 21 '21 09:10 CrawX

What are we waiting on for this PR at this point?

abatilo avatar Dec 04 '21 21:12 abatilo

@abatilo : revert tests, merge master in, bump chart version, re-test. I'm a bit overwhelmed right now ... I'll try to give it a look tomorrow.

faust64 avatar Dec 04 '21 22:12 faust64

Thank you for the work @faust64

abatilo avatar Dec 07 '21 16:12 abatilo

Can we have someone to review this PR ?

wigeric avatar Feb 07 '22 09:02 wigeric

I think the following is more useful as an example for people to build off of:

topologySpreadConstraints: []
# # This example topologySpreadConstraints will cause the scheduler to attempt to spread
# # traefik pods evenly through available topology zones (on public cloud deployments these
# # correspond to availability zones). 
#   - labelSelector:
#       matchExpressions:
#         - key: app.kubernetes.io/name
#           operator: In
#           values:
#             - {{ template "traefik.name" . }}
#     maxSkew: 1
#     topologyKey: topology.kubernetes.io/zone
#     whenUnsatisfiable: ScheduleAnyway

IMO podAntiAffinity is fine to prevent duplicate Traefik pods from being scheduled to a node, whereas topologySpreadConstraints is more useful to ensure that Traefik pods get evenly spread over available availability zones.

Other than that you should probably rebase on master and then squash all of your commits.

ReillyBrogan avatar May 27 '22 20:05 ReillyBrogan

Anything I can do to help push this forward?

abatilo avatar Sep 13 '22 15:09 abatilo

Hello,

I've tried to rebase this PR, in order to merge it, but it seems I cannot push to this PR.

@faust64 How do you want to proceed ? Should I open a new one or do you have a better option ?

mloiseleur avatar Oct 11 '22 08:10 mloiseleur

not sure what's wrong, but I kind of remember my filing PR from Worteks organization being an issue in the past, ... sure @mloiseleur , proceed however you can thanks for following up

faust64 avatar Oct 12 '22 16:10 faust64

@faust64 I was able to cherry-pick the main commit. You can review an up-to-date version on #663. I added templating capability inside the values file.

This PR is now superseded by #663

mloiseleur avatar Oct 13 '22 12:10 mloiseleur