Ensure correctness of OLAP telemetry for multi-extractor Classifiers
Describe the solution you'd like
- Double check whether the `flowcontrolv1.Classifier` message is being populated properly in the CheckResponse for multi-label Rego rules in populateFlowLabels.
- Should `appendNewClassifier` move inside the for loop on variables so that individual LabelKeys may be tracked?
- Can an error be attributed to a particular `LabelKey`? If not, we should change the Classifiers spec and the resulting OLAP telemetry.
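To make the second and third questions concrete, here is a minimal sketch of what per-LabelKey tracking could look like. It does not use the real Aperture types: `ClassifierInfo`, `collectClassifierInfo` and `ErrMissingInput` are hypothetical names for illustration only, and the actual fields of `flowcontrolv1.Classifier` may differ.

```go
package main

import "errors"

// ErrMissingInput is a hypothetical sentinel error used only in this sketch.
var ErrMissingInput = errors.New("missing input")

// ClassifierInfo is a hypothetical stand-in for flowcontrolv1.Classifier.
// It carries the (PolicyName, ClassifierIndex, LabelKey) triple from the issue
// plus an error string that is empty when the label was extracted successfully.
type ClassifierInfo struct {
	PolicyName      string
	ClassifierIndex int64
	LabelKey        string
	Err             string
}

// collectClassifierInfo appends one entry per label key inside the loop,
// so a failure is attributed to the label key that caused it rather than
// to the classifier as a whole.
func collectClassifierInfo(policyName string, classifierIndex int64, labelKeys []string, errs map[string]error) []ClassifierInfo {
	infos := make([]ClassifierInfo, 0, len(labelKeys))
	for _, key := range labelKeys {
		info := ClassifierInfo{
			PolicyName:      policyName,
			ClassifierIndex: classifierIndex,
			LabelKey:        key,
		}
		// A missing map entry yields a nil error, meaning the key succeeded.
		if err := errs[key]; err != nil {
			info.Err = err.Error()
		}
		infos = append(infos, info)
	}
	return infos
}
```

Whether this is feasible depends on whether the Rego evaluation can report errors per variable at all, which is the open question above.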
Additional context
- The purpose of `flowcontrolv1.Classifier` is to piece together the Classifiers (PolicyName, ClassifierIndex, LabelKey) that matched, along with any errors. This info is made available as attributes in OLAP telemetry.
Related: #534
After discussion with @DariaKunoichi – some more things to polish regarding classifier error-handling:
- Categorize errors into different kinds and perhaps treat them slightly differently:
  - context-timeout: we should just return early, without any attempt to log, etc. Perhaps bump some stats counter?
  - errors caused by "invalid input" (e.g. tried to extract a header that is missing)
  - errors caused by a problem with the Rego itself (not sure if we can differentiate these from the "invalid input" kind)
  - "internal errors" like https://github.com/fluxninja/aperture/blob/8228d34912ddafcd9b3725dae814485a9189b271/pkg/policies/dataplane/resources/classifier/classifier.go#L109: they signify a breakage of some internal invariant and are caused neither by policy nor by traffic; users should report an issue.

  Right now we treat all of them the same way ("log and add to CheckResponse"), which is not ideal.
- Double check whether a multi-extractor classifier can handle a "partial map", e.g. some extractors succeeded but some failed. If that's not possible, it's kinda sad, as an error in one extractor could basically "disable" the other extractors.
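
A rough sketch of how the two points above could fit together, assuming each extractor can be evaluated independently (which may not hold if all labels come out of a single Rego query). Everything here is hypothetical and not taken from the Aperture codebase: `ErrorKind`, the sentinel errors, `Extractor`, `classify` and `runExtractors` are made-up names, and the extractor signature is simplified to take no arguments.

```go
package main

import (
	"context"
	"errors"
)

// ErrorKind is a hypothetical categorization of classifier errors.
type ErrorKind int

const (
	KindNone         ErrorKind = iota // no error
	KindTimeout                       // context deadline exceeded or cancelled
	KindInvalidInput                  // e.g. a header to extract is missing
	KindRego                          // problem with the Rego itself
	KindInternal                      // broken internal invariant, should be reported
)

// Hypothetical sentinel errors; real code would need its own way to tell
// "invalid input" apart from Rego problems, which may not be possible.
var (
	ErrInvalidInput = errors.New("invalid input")
	ErrRegoEval     = errors.New("rego evaluation failed")
)

// classify maps an error to a kind; anything unrecognized is treated as internal.
func classify(err error) ErrorKind {
	switch {
	case err == nil:
		return KindNone
	case errors.Is(err, context.DeadlineExceeded), errors.Is(err, context.Canceled):
		return KindTimeout
	case errors.Is(err, ErrInvalidInput):
		return KindInvalidInput
	case errors.Is(err, ErrRegoEval):
		return KindRego
	default:
		return KindInternal
	}
}

// Extractor is a simplified, hypothetical extractor that yields one label value.
type Extractor func() (string, error)

// runExtractors returns a "partial map": labels from extractors that succeeded,
// plus the error kind for each label key that failed. On a timeout it returns
// early without recording anything further, and timedOut is set so the caller
// can bump a counter instead of logging.
func runExtractors(extractors map[string]Extractor) (labels map[string]string, failed map[string]ErrorKind, timedOut bool) {
	labels = make(map[string]string)
	failed = make(map[string]ErrorKind)
	for key, extract := range extractors {
		value, err := extract()
		switch kind := classify(err); kind {
		case KindNone:
			labels[key] = value
		case KindTimeout:
			return labels, failed, true
		default:
			failed[key] = kind
		}
	}
	return labels, failed, false
}
```

With this shape, a missing header in one extractor only lands in `failed` for its own label key, and the labels from the other extractors are still returned.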