envoy icon indicating copy to clipboard operation
envoy copied to clipboard

Add the sample() convenience function in CEL expression builder

Open wtzhang23 opened this issue 1 year ago • 30 comments

Commit Message: add random convenience function in CEL expression builder Additional Description: Add the random() function that allows CEL expressions to call the Random::RandomGenerator::random() function to generate a random 64-bit unsigned integer Risk Level: Medium Testing: unit tests Docs Changes: n/a Release Notes: n/a Platform Specific Features: n/a Fixes #34805 Fixes istio/istio#50404

wtzhang23 avatar Jul 14 '24 18:07 wtzhang23

Might be more convenient to pass the Bernoulli value as a float instead of providing the uint64 value.

wtzhang23 avatar Jul 14 '24 18:07 wtzhang23

Hmm, there was a bug where if constant folding is enabled, it will inline the function. Thinking of mitigations. Some things that can possibly be done:

  • Close this PR and simply provide the random number through an attribute
  • Change the function to be lazily evaluated (tried this initially but passing the Random::RandomGenerator at multiple locations became cumbersome. See my other branch)
  • Push a change upstream to allow marking functions to not be inlined Also, not sure if this function really aligns well with the cel-spec philosophy of functions being "side-effect free"

wtzhang23 avatar Jul 14 '24 22:07 wtzhang23

/retest

wtzhang23 avatar Jul 15 '24 01:07 wtzhang23

/assign @tyxia as codeowner / maintainer

KBaichoo avatar Jul 15 '24 12:07 KBaichoo

neither of as, codeowner, /, maintainer can be assigned to this issue.

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/35180#issuecomment-2228442920 was created by @KBaichoo.

see: more, trace.

/assign @tyxia

as codeowner / maintainer

KBaichoo avatar Jul 15 '24 12:07 KBaichoo

Hmm, there was a bug where if constant folding is enabled, it will inline the function. Thinking of mitigations. Some things that can possibly be done:

  • Close this PR and simply provide the random number through an attribute
  • Change the function to be lazily evaluated (tried this initially but passing the Random::RandomGenerator at multiple locations became cumbersome. See my other branch)
  • Push a change upstream to allow marking functions to not be inlined Also, not sure if this function really aligns well with the cel-spec philosophy of functions being "side-effect free"

I don't think we should just disable constant folding (which is performance optimization) just for this random() feature. If it conflicts here, then this is first thing of this PR that we want to discuss.

Add @kyessenov , Kuat's insights would be appreciated here.

tyxia avatar Jul 15 '24 19:07 tyxia

FWIW I do have another branch with lazy loading of functions that doesn't require disable constant folding.

wtzhang23 avatar Jul 15 '24 20:07 wtzhang23

I think either option (2) or (3) are reasonable. A new attribute only solves a subset of cases when you want the same consistent random value for all evaluations for a request.

@jcking might have a wider context on how randomness is added to CEL.

kyessenov avatar Jul 15 '24 23:07 kyessenov

It may still be useful to have (1), especially if it is based off of x-request-id in which case you can make a decision consistently across multiple envoy hops. Though, this is not related to this PR or the parent issue.

wtzhang23 avatar Jul 16 '24 15:07 wtzhang23

I'm looking into CEL solutions in other contexts - but I tend to agree that a Bernoulli value sample(p) that returns true with probability p might be easier to use. That solves some constant-folding issues but not entirely absolves non-deterministic evaluation. We can also make it integer-based using https://www.envoyproxy.io/docs/envoy/latest/api-v3/type/v3/percent.proto#envoy-v3-api-msg-type-v3-fractionalpercent, e.g. sample(p, 10000) where p is between 0 and 10000.

kyessenov avatar Jul 17 '24 22:07 kyessenov

I had an internal discussion with some CEL folks. We think a pure random() function is likely to cause issues in general that won't be solved by lazy functions. CEL expressions generally are pre-processed in multiple places, not just Envoy, and a compiler at any place in the pipeline is free to perform transformations to improve performance. CEL functions are pure and deterministic, so if you have multiple calls to random() they can be elided and/or combined in unexpected ways. So even if you make evaluation assume lazy function in Envoy, the control plane can still make a dangerous transformation.

What we think would work best is to turn this into a true pure function by adding parameters. One example would be a bool function sample(r, p, i) where:

  • r is an opaque attribute mapping to the random generator. User is expected to pass an attribute that is not known ahead of time, e.g. request.
  • p is a Bernoulli ratio. For simplicity we can make this a numerator over 10000.
  • i is a sampling count. This makes repeated sampling, e.g. in a for loop, explicit. If we only sample once per expression, we can elide this parameter.

kyessenov avatar Jul 18 '24 21:07 kyessenov

Ya, I was thinking a similar thing. To not be too computationally expensive, I was thinking under the hood, for each CEL expression, we generate one random number for each invocation of an expression (can be stored in the r attribute) that we use to seed some prng where we pass i. Edit: I ultimately lack the expertise in doing this securely; if I go ahead and try to implement this, I will need some assistance at auditing it.

wtzhang23 avatar Jul 18 '24 22:07 wtzhang23

Thinking of an implementation where we generate a random number as a seed before the execution of a CEL expression, apply a cryptographic hash with that seed concatenated with a index (e.g. sha256), then convert the hash into a probability. So sample(r, p, i) = (sha256(r.seed + i)[..2] / u64max) < p

wtzhang23 avatar Jul 20 '24 19:07 wtzhang23

As a user's point of view, I think this is a good proposal to implement percentage-based log sampling.

zirain avatar Jul 30 '24 02:07 zirain

/wait

soulxu avatar Aug 02 '24 07:08 soulxu

Probably some optimization should be considered? E.g.

  • [ ] Lazily generating a random number to serve as a seed to avoid generating a seed for every cel expression, even those that don't need it?
  • [ ] Keep a small buffer for a small range of indices to avoid using the SHA technique when not necessary?

wtzhang23 avatar Aug 05 '24 01:08 wtzhang23

/retest

wtzhang23 avatar Aug 05 '24 12:08 wtzhang23

/retest

wtzhang23 avatar Aug 09 '24 01:08 wtzhang23

Please merge main to fix conflicts. /wait

adisuissa avatar Aug 13 '24 12:08 adisuissa

/retest

wtzhang23 avatar Aug 20 '24 18:08 wtzhang23

@wtzhang23 coverage fail looks real

phlax avatar Sep 03 '24 16:09 phlax

Yep, need to get the coverage report locally to check. Unless it is already uploaded and I couldn't find the artifact.

wtzhang23 avatar Sep 03 '24 16:09 wtzhang23

/coverage

phlax avatar Sep 03 '24 16:09 phlax

Coverage for this Pull Request will be rendered here:

https://storage.googleapis.com/envoy-pr/35180/coverage/index.html

The coverage results are (re-)rendered each time the CI envoy-presubmit (check linux_x64 coverage) job completes.

:cat:

Caused by: a https://github.com/envoyproxy/envoy/pull/35180#issuecomment-2327001724 was created by @phlax.

see: more, trace.

Bump on this? I no longer need this feature and I'd like to get this off my hands. I'm ok with closing this if it is not a good approach (I'm slightly uncomfortable with it to be honest, though I don't know of another way to conform to the CEL spec).

wtzhang23 avatar Sep 07 '24 19:09 wtzhang23

Sorry about the delay. I'm not confident this is the right approach TBH. I would be more comfortable if this was an extension, but we haven't added an ability to extend CEL vocabulary. I am worried that adding this without a concrete use case would hurt us long term, especially since you added functions to the top-level request attribute. But I also don't want this work go to waste. Should we first work out a custom vocabulary in CEL and then provide this as a custom extensions instead?

kyessenov avatar Sep 12 '24 00:09 kyessenov

Yes, I agree with your assessment.

wtzhang23 avatar Sep 12 '24 00:09 wtzhang23

/wait

Need some way to make this opt-in.

kyessenov avatar Sep 16 '24 23:09 kyessenov

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Oct 17 '24 00:10 github-actions[bot]