CFP: Cilium CLI connectivity tests speedup.

Open viktor-kurchenko opened this issue 1 year ago • 7 comments

viktor-kurchenko, Jan 24 '24 08:01

Proposal sounds good to me. As mentioned offline by Andre, it would be good to see a POC of how this would work. Namely, the aspect I have concerns about is that many of the connectivity tests configure the cluster in a specific way that may conflict with other test runs (such as policies, etc.). It would be good to understand how you propose to approach that problem.

Thank you, @christarazi !

Yeah, it will be challenging, but I want to try it. Do you know what else, apart from cluster-wide network policies, can interfere across namespaces?

viktor-kurchenko, Jan 25 '24 09:01

In general, any policy, whether it's a CNP or a CCNP, can interfere, especially if the workloads it selects are also selected by other policies. One approach could be to completely separate the workloads of each "group" of connectivity tests into its own namespaces. The policies applied would then only have an effect within their own namespaces, so namespaces would be the separation barrier that allows parallelism.
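To make the isolation argument concrete, here is a minimal Python sketch of the scoping rule it relies on: a CiliumNetworkPolicy (CNP) is namespaced and only selects endpoints in its own namespace, while a CiliumClusterwideNetworkPolicy (CCNP) can select endpoints in any namespace. The `Policy` class, the helper functions, and the namespace names are hypothetical and exist only for this illustration; this is not how the Cilium CLI models policies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy model used only for this sketch."""
    kind: str                        # "CNP" (namespaced) or "CCNP" (cluster-wide)
    namespace: Optional[str] = None  # always None for a CCNP


def affected_namespaces(policy: Policy, all_namespaces: set) -> set:
    """Namespaces whose endpoints the policy may select."""
    if policy.kind == "CCNP":
        # A cluster-wide policy is not confined to any namespace.
        return set(all_namespaces)
    # A namespaced policy only selects endpoints in its own namespace.
    return {policy.namespace} & set(all_namespaces)


def can_interfere(policy: Policy, other_group_namespace: str, all_namespaces: set) -> bool:
    """True if the policy could affect a test group living in another namespace."""
    return other_group_namespace in affected_namespaces(policy, all_namespaces)


# Placeholder namespaces, one per test group.
namespaces = {"group-a", "group-b"}
cnp = Policy(kind="CNP", namespace="group-a")
ccnp = Policy(kind="CCNP")

print(can_interfere(cnp, "group-b", namespaces))   # namespaced policy stays in group-a
print(can_interfere(ccnp, "group-b", namespaces))  # cluster-wide policy reaches group-b
```

Under this simplified model, only the cluster-wide policy crosses the namespace barrier, which is why CCNP-based tests need separate consideration.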

christarazi, Jan 26 '24 00:01

It would be useful to people like me if the document included a high-level list of the strategies that have already been explored and why they fall short.

Yeah, it would be useful for me as well.

viktor-kurchenko, Jan 29 '24 14:01

I was thinking about the PoC plan and realized that the Cilium CLI can already be used to run multiple tests in parallel (at least for testing). I used the --test-namespace and --test parameters to validate the idea.

I selected 46 tests (from the EKS CNI conformance test workflow) and used the attached bash script to run them in parallel batches.

You can find results in the table: https://docs.google.com/spreadsheets/d/1csmszEtlohqPpgMV8N_aJUI4yCoW8mPke46k-rG-Uec/edit?usp=sharing
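The bash script itself is not reproduced in this thread. As a rough Python sketch of the same idea, each test in a batch could get its own CLI invocation with a distinct `--test-namespace`, and the invocations could run side by side. Only the `--test-namespace` and `--test` parameters come from the comment above; the helper names, the `cilium-test-N` namespace naming, and the test names in the usage note are assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(test_name: str, namespace: str) -> list:
    """Build one CLI invocation that runs a single test in its own namespace."""
    return [
        "cilium", "connectivity", "test",
        "--test-namespace", namespace,
        "--test", test_name,
    ]


def run_batch(tests: list, namespace_prefix: str = "cilium-test") -> dict:
    """Run all tests of one batch in parallel; return exit codes keyed by test name.

    The "<prefix>-<index>" namespace naming is an assumption of this sketch.
    """
    commands = {
        name: build_command(name, f"{namespace_prefix}-{index}")
        for index, name in enumerate(tests, start=1)
    }
    # One worker per test; max() guards against an empty batch.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        futures = {
            name: pool.submit(subprocess.run, cmd, capture_output=True)
            for name, cmd in commands.items()
        }
        return {name: future.result().returncode for name, future in futures.items()}
```

Calling `run_batch(["test-a", "test-b"])` (placeholder test names) would start two `cilium connectivity test` processes at once, one per namespace. It requires the `cilium` binary and a reachable cluster.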

Conclusions:

  • The Cilium CLI has no tests that use CiliumClusterwideNetworkPolicy yet.
  • At least the 46 selected tests can be run in parallel with no interference!
  • Practically, it won't be possible to run each test in a separate namespace, because of the number of pods and IP addresses that would have to be allocated.
  • Ideally, the CLI should create and verify all the required test namespaces/deployments only once, before any tests run.

Further steps (order might be different):

  1. Rename --test-namespace parameter to --test-namespace-prefix.
  2. Implement a new parameter: --test-parallel-runs with the default value: 1.
  3. Move test namespace/deployments creation and verification logic before the test run function.
  4. Implement tests grouping logic into batches with the --test-parallel-runs size.
  5. Think about how to collect and display output (considering GH runners that might have different behavior than a local terminal).
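Steps 1 to 4 can be sketched roughly as follows in Python. The parameters mirror the proposed `--test-namespace-prefix` and `--test-parallel-runs` flags, which are proposals from this comment rather than existing CLI flags. The function names and the `<prefix>-<slot>` namespace naming are assumptions for illustration, not the CLI's implementation.

```python
def make_namespaces(prefix: str, parallel_runs: int) -> list:
    """Namespaces to create and verify once, before any test runs (step 3)."""
    return [f"{prefix}-{slot}" for slot in range(1, parallel_runs + 1)]


def make_batches(tests: list, parallel_runs: int = 1) -> list:
    """Split tests into consecutive batches of at most parallel_runs tests (step 4)."""
    if parallel_runs < 1:
        raise ValueError("parallel_runs must be at least 1")
    return [
        tests[start:start + parallel_runs]
        for start in range(0, len(tests), parallel_runs)
    ]


def plan(tests: list, prefix: str, parallel_runs: int = 1) -> list:
    """Pair every test in each batch with the namespace slot it will run in."""
    namespaces = make_namespaces(prefix, parallel_runs)
    return [
        list(zip(batch, namespaces))  # zip stops at the shorter, possibly partial, batch
        for batch in make_batches(tests, parallel_runs)
    ]


# 46 placeholder test names, matching the number of tests in the experiment.
tests = [f"test-{n}" for n in range(1, 47)]
schedule = plan(tests, prefix="cilium-test", parallel_runs=4)
print(len(schedule))   # number of batches
print(schedule[-1])    # the last, partially filled batch
```

With the default of 1, every batch holds a single test and the schedule degenerates to today's sequential behavior, which is presumably why 1 is proposed as the default.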

Also, I was thinking about implementing this as a new CLI command (e.g.: cilium tests ..., maybe even hidden). That way, for some time we can have both the old and the new approach sharing the same test sources, and we will be able to test and compare them without any impact on existing users.

CC: @aanm @christarazi @fgiloux @brlbil @michi-covalent.

viktor-kurchenko, Feb 02 '24 18:02

That sounds good to me.

Just one thing on "Cilium CLI has no tests that use CiliumClusterwideNetworkPolicy yet.":

I imagine https://github.com/cilium/design-cfps/pull/16 will get merged soon, and we'll very likely have tests with CCNP, so it is something that we'll need to consider in this proposal.

christarazi, Feb 03 '24 00:02

Sounds great.

On "Think about how to collect and display output (considering GH runners that might have different behavior than a local terminal)": one thing that might be tricky is printing test logs correctly, given that the tests would run concurrently. The JUnit collection should also be considered.
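One common way to keep concurrent logs readable, sketched here in Python, is to give each test its own in-memory buffer and flush it to the shared output under a lock once the test finishes, so the lines of different tests never interleave. The same per-test results can then feed a JUnit-style report. The function names, the `run_one` callback, and the minimal report layout are assumptions for illustration. The thread does not say which approach the CLI uses.

```python
import io
import threading
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def run_buffered(tests, run_one, out, workers=4):
    """Run tests concurrently; write each test's log to `out` as one contiguous block.

    `run_one(name, log)` is a caller-supplied function (hypothetical here) that
    writes its log lines to `log` and returns True on success.
    """
    lock = threading.Lock()
    results = {}

    def worker(name):
        buffer = io.StringIO()       # private log buffer for this test
        ok = run_one(name, buffer)
        with lock:                   # flush atomically so blocks never interleave
            out.write(buffer.getvalue())
            results[name] = ok

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(worker, tests))  # list() re-raises any worker exception
    return results


def junit_report(results):
    """Render results as a minimal JUnit-style <testsuite> XML string."""
    failures = sum(1 for ok in results.values() if not ok)
    suite = ET.Element(
        "testsuite",
        name="connectivity",
        tests=str(len(results)),
        failures=str(failures),
    )
    for name in sorted(results):
        case = ET.SubElement(suite, "testcase", name=name)
        if not results[name]:
            ET.SubElement(case, "failure", message="test failed")
    return ET.tostring(suite, encoding="unicode")
```

The trade-off is that a test's output only appears once it has finished, so a long-running test stays silent until then. A live terminal might prefer a progress display instead, while CI runners usually cope well with block-wise output.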

brlbil, Feb 03 '24 07:02

Thanks! I've already tested this: cilium/images/conn-tests-concurrent-output.gif

viktor-kurchenko, Feb 03 '24 10:02

@viktor-kurchenko we just added statuses for CFPs. Where do you think this one currently falls? https://github.com/cilium/design-cfps#status

xmulligan, Aug 09 '24 09:08

@xmulligan I think the status should be: Released cilium/cilium-cli 0.16

viktor-kurchenko, Aug 09 '24 19:08