CFP: Cilium CLI connectivity tests speedup.
Proposal sounds good to me. As mentioned offline by Andre, it would be good to see a POC of how this would work. Namely, the aspect I have concerns about is that many of the connectivity tests configure the cluster in a specific way that may conflict with other test runs (such as policies, etc.). It would be good to understand how you propose to approach that problem.
Thank you, @christarazi !
Yeah, it will be challenging, but I want to try it.
Do you know what else, besides cluster-wide network policies, can interfere across namespaces?
In general, any policy, whether CNP or CCNP, can interfere, especially if the workloads it selects are also selected by other policies. One approach could be to completely separate the workloads of each "group" of connectivity tests into its own namespaces. That way the policies applied only take effect within the namespaces they are in, and namespaces become the separation barrier that allows parallelism.
It would be useful for people like me if the document included a high-level list of the strategies that have already been explored and why they fall short.
Yeah, it would be useful for me as well.
I was thinking about the PoC plan and realized that the Cilium CLI can already be used to run multiple tests in parallel (at least for testing).
The --test-namespace and --test parameters were used to validate the idea.
I've selected 46 tests (from the EKS CNI conformance test workflow) and used the attached bash script to run them in parallel batches.
You can find results in the table: https://docs.google.com/spreadsheets/d/1csmszEtlohqPpgMV8N_aJUI4yCoW8mPke46k-rG-Uec/edit?usp=sharing
Conclusions:
- Cilium CLI has no tests that use CiliumClusterwideNetworkPolicy yet.
- At least 46 selected tests can be run in parallel with no interference!!!
- In practice, it won't be possible to run each test in a separate namespace, because of the number of pods and IP allocations that would require.
- Ideally, the CLI should create and verify all the required test namespaces/deployments only once, up front.
Further steps (order might be different):
- Rename --test-namespace parameter to --test-namespace-prefix.
- Implement a new parameter: --test-parallel-runs with the default value: 1.
- Move test namespace/deployments creation and verification logic before the test run function.
- Implement tests grouping logic into batches with the --test-parallel-runs size.
- Think about how to collect and display output (considering GH runners that might have different behavior than a local terminal).
Also, I was thinking about implementing this as a new CLI command (e.g.: cilium tests ..., maybe even hidden).
So, for some time we can have both old and new approaches with shared test sources and will be able to test and compare them without any impact.
CC: @aanm @christarazi @fgiloux @brlbil @michi-covalent.
That sounds good to me.
Just one thing on "Cilium CLI has no tests that use CiliumClusterwideNetworkPolicy yet":
Soon I imagine https://github.com/cilium/design-cfps/pull/16 will get merged and we'll very likely have tests with CCNP, so it is something that we'll need to consider in this proposal.
Sounds great. Regarding "Think about how to collect and display output (considering GH runners that might have different behavior than a local terminal)":
One thing that might be tricky is printing test logs correctly, given that the tests would run concurrently. Also, JUnit collection should be considered.
Thanks! I've already tested this: cilium/images/conn-tests-concurrent-output.gif
@viktor-kurchenko we just added statuses for CFPs. Where do you think this one currently falls? https://github.com/cilium/design-cfps#status
@xmulligan I think the status should be: Released cilium/cilium-cli 0.16