
a declarative, expanded conformance suite for networking and storage

Open jayunit100 opened this issue 2 years ago • 6 comments

Describe the solution you'd like

  • Interesting that only 6 of the 300+ tests in the k8s conformance suite use the kube-proxy,
  • but 23 or so sig-network tests use the kube-proxy. Conformance isn't enough.

I think we're going to need to expand the definition of k8s conformance to "conformance++". I'd suggest adding

  • networkpolicy
  • sig-network
  • conformance
  • statefulset
  • basic pvc functionality

as an expanded sonobuoy run --expanded-coverage test plugin that's embedded in-tree.

Anything else you would like to add:

I've written up some of the subtleties of why this is important here: https://jayunit100.blogspot.com/2020/08/sig-network-diagnostic-tests-that-fail.html. In general, it's pretty easy to pass the conformance tests on a cluster that is severely missing functionality, i.e. services aren't working because of the statefulness of the kube-proxy after it dies.

jayunit100 avatar Nov 13 '21 20:11 jayunit100

That's clearly doable; all you're suggesting is to add another mode option, so we'd have quick, conformance, certified-conformance, expanded-conformance.

That bit of coverage is more important to you working in the networking space; I'm assuming that people working in different areas would have a slightly different set.

I think it may be a neat idea to not only add this but maybe others and surface them better. I'm thinking something along the lines of:

$ sonobuoy run -h 
...
...
-m, --mode     Configure the e2e plugin based on these pre-selected options. See 'sonobuoy modes' for option details
...

$ sonobuoy modes -h
quick:  Runs only a single e2e test. Effective as a smoke test for a new cluster.
    E2E_FOCUS=<....>
    E2E_SKIP=<....>
certified-conformance: Runs all tests with the Conformance tag. Required for CNCF certification.
    E2E_FOCUS=<....>
    E2E_SKIP=<....>
...
...
...

Mode used to also set the plugins, but that isn't the case anymore, so it really is just a convenience wrapper for setting the e2e args.
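To make the "convenience wrapper" idea concrete, here is a minimal Python sketch of a mode table that expands a mode name into the two e2e settings and renders help text in the shape shown above. It is not Sonobuoy's implementation; the `Mode` class, `render_modes`, and `env_for` are hypothetical names, and the focus/skip values are kept as the same `<....>` placeholders used in the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """One pre-selected option: a name, a description, and the e2e args it sets."""
    name: str
    description: str
    e2e_focus: str
    e2e_skip: str


# Assumption: focus/skip values are intentionally left as placeholders,
# exactly as in the proposal; real values would be filled in per mode.
MODES = [
    Mode("quick",
         "Runs only a single e2e test. Effective as a smoke test for a new cluster.",
         "<....>", "<....>"),
    Mode("certified-conformance",
         "Runs all tests with the Conformance tag. Required for CNCF certification.",
         "<....>", "<....>"),
]


def render_modes(modes):
    """Build help text listing each mode with the env values it would set."""
    lines = []
    for m in modes:
        lines.append(f"{m.name}: {m.description}")
        lines.append(f"    E2E_FOCUS={m.e2e_focus}")
        lines.append(f"    E2E_SKIP={m.e2e_skip}")
    return "\n".join(lines)


def env_for(mode_name, modes=MODES):
    """Expand a mode name into the e2e settings it stands for."""
    for m in modes:
        if m.name == mode_name:
            return {"E2E_FOCUS": m.e2e_focus, "E2E_SKIP": m.e2e_skip}
    raise KeyError(f"unknown mode: {mode_name}")


if __name__ == "__main__":
    print(render_modes(MODES))
```

Under this shape, adding an expanded-conformance mode would only mean appending one more entry to the table.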

johnSchnake avatar Nov 17 '21 19:11 johnSchnake

xref #1560 , just starts from the understanding that it will be a new --mode

johnSchnake avatar Dec 29 '21 21:12 johnSchnake

@jayunit100 I have the sonobuoy modes command all set. If you can give me an exact E2E_FOCUS and E2E_SKIP, and tell me whether the tests can be run in parallel (do they contain any tests marked as serial?), then I can add it.

We can tweak it slightly as time goes on since it's non-official, but, for instance, I don't know how to translate "basic pvc functionality" into a focus/skip value.
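As a rough illustration of what is being asked for, the Python sketch below applies a focus and a skip regular expression to a list of test names and then checks whether any selected test carries the `[Serial]` tag. The selection rule (run a test if it matches the focus and does not match the skip) mirrors how the e2e focus/skip settings are generally understood to work; the test names and the helper names `select_tests` and `has_serial` are hypothetical.

```python
import re


def select_tests(names, focus, skip=""):
    """Keep names that match the focus regex and do not match the skip regex."""
    focus_re = re.compile(focus)
    skip_re = re.compile(skip) if skip else None
    return [
        n for n in names
        if focus_re.search(n) and not (skip_re and skip_re.search(n))
    ]


def has_serial(names):
    """True if any selected test is tagged [Serial] and so should not run in parallel."""
    return any("[Serial]" in n for n in names)


# Hypothetical test names, for illustration only.
NAMES = [
    "[sig-network] Example service test [Conformance]",
    "[sig-network] Example proxy restart test [Serial] [Disruptive]",
    "[sig-storage] Example volume test",
]

if __name__ == "__main__":
    selected = select_tests(NAMES, r"\[sig-network\]", r"\[Disruptive\]")
    print(selected)
    print("parallel-safe:", not has_serial(selected))
```

Running the same check without the skip pattern would pull in the `[Serial]` test, which is the case where the mode could not safely be run in parallel.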

johnSchnake avatar Dec 30 '21 23:12 johnSchnake

Going to get a rough PR for this up so that after just a tad of feedback it is ready for merging. Expect that tonight or tomorrow.

johnSchnake avatar Jan 03 '22 22:01 johnSchnake

Each of the pvc functionality tests is tied to a particular driver, which would make them very cluster-specific. That's just based on my brief look at it; is there some amount of functionality that is expected to become part of conformance?

In addition, could you clarify any of the others? For instance, I saw Feature:NetworkPolicy but see that it overlaps with other features too (as in "[sig-network] Netpol [Feature:SCTPConnectivity][LinuxOnly][Disruptive] NetworkPolicy between server and client using SCTP should enforce policy based on Ports [Feature:NetworkPolicy]"). In that case I'd skip it because it's disruptive, but it also hits Feature:SCTPConnectivity. Is that a default choice for most clusters now?
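The overlap is easy to see by pulling the bracketed tags out of that test name. The following Python sketch does that and reports which tags would trigger a skip; the helper names `tags` and `skip_reasons` and the particular skip set are assumptions for illustration, not a recommended mode definition.

```python
import re

# The test name quoted above.
TEST_NAME = (
    "[sig-network] Netpol [Feature:SCTPConnectivity][LinuxOnly][Disruptive] "
    "NetworkPolicy between server and client using SCTP should enforce "
    "policy based on Ports [Feature:NetworkPolicy]"
)


def tags(name):
    """Return every bracketed tag in a test name, in order of appearance."""
    return re.findall(r"\[([^\]]+)\]", name)


def skip_reasons(name, skip_tags):
    """Return the tags on this test that appear in the given skip set."""
    return [t for t in tags(name) if t in skip_tags]


if __name__ == "__main__":
    print(tags(TEST_NAME))
    # Assumption: a mode that skips disruptive tests and SCTP-dependent tests.
    print(skip_reasons(TEST_NAME, {"Disruptive", "Feature:SCTPConnectivity"}))
```

A focus on Feature:NetworkPolicy alone would match this test, so whether it runs depends entirely on which of its other tags end up in the skip pattern.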

With an expansion of modes we don't have to be 100% generic, but I don't have the expertise to really understand how generic some of those features are in recent clusters.

I'll put up the PR so you can comment there though.

johnSchnake avatar Jan 04 '22 03:01 johnSchnake

There has not been much activity here. We'll be closing this issue if there are no follow-ups within 15 days.

stale[bot] avatar Jul 07 '22 00:07 stale[bot]

For the CNI/Service networking side of things, I like 'Network Test Mode' as a starting point.

On the CSI side, I think there are still quite a few conformance cases. I'd like to keep this issue open if possible. I think we can probably convince our sig-storage friends to get involved; especially when it comes to comparing features across CSI providers, this will be super useful.

For example, a lot of folks ask us from time to time things like

  • If IAM is not correct, what (if any) EKS CSI actions work? (https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html)
  • Does my storage provider work with "large" files? (e.g. Postgres read-aheads, such as this issue: https://github.com/gluster/glusterfs/issues/2056)
  • Does CSI provider "X" support inline "driver:" volumes (CSI ephemeral volumes), i.e. https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volumes
  • And so on.

I find it tricky in general to determine which of the "sig-storage" tests should or shouldn't be working on a cluster when I have a suspicion that there's a CSI issue.

And, since CSI volumes take a little time to create sometimes, this problem gets compounded pretty quickly when running diagnostics.

jayunit100 avatar Oct 17 '22 18:10 jayunit100

There has not been much activity here. We'll be closing this issue if there are no follow-ups within 15 days.

stale[bot] avatar Apr 25 '23 21:04 stale[bot]