envoy icon indicating copy to clipboard operation
envoy copied to clipboard

rbac: add delay_deny implementation in RBAC network filter

Open yangminzhu opened this issue 9 months ago • 5 comments

Commit Message: If specified, the RBAC network filter will delay the specified duration before actually closing the connection. Additional Description: This implements https://github.com/envoyproxy/envoy/issues/33771. Risk Level: Low Testing: Added Unit Test and Integration Test Manual Local Test Log:

[2024-04-30 11:03:14.080][22663692][debug][rbac] [source/extensions/filters/network/rbac/rbac_filter.cc:95] checking connection: requestedServerName: , sourceIP: 127.0.0.1:54006, directRemoteIP: 127.0.0.1:54006,remoteIP: 127.0.0.1:54006, localAddress: 127.0.0.1:10000, ssl: none, dynamicMetadata:
[2024-04-30 11:03:14.080][22663692][debug][rbac] [source/extensions/filters/network/rbac/rbac_filter.cc:201] enforced denied, matched policy none
[2024-04-30 11:03:14.080][22663692][debug][rbac] [source/extensions/filters/network/rbac/rbac_filter.cc:129] connection will be delay denied in 1000ms

Docs Changes: N/A Release Notes: Updated version history Platform Specific Features: N/A [Optional Runtime guard:] [Optional Fixes #Issue] [Optional Fixes commit #PR or SHA] [Optional Deprecated:] [Optional API Considerations:]

yangminzhu avatar Apr 30 '24 18:04 yangminzhu

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/). envoyproxy/api-shepherds assignee is @wbpcode CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

:cat:

Caused by: https://github.com/envoyproxy/envoy/pull/33875 was opened by yangminzhu.

see: more, trace.

cc @kyessenov , @lambdai

yangminzhu avatar Apr 30 '24 18:04 yangminzhu

@wbpcode , @lambdai could you take another look if the api looks good? also wondering who should be reviewing the actual code changes? thanks.

yangminzhu avatar May 02 '24 18:05 yangminzhu

@yangminzhu could you merge the main and resolve the conflict? Thanks.

wbpcode avatar May 07 '24 06:05 wbpcode

assign this to @yanavlasov for final review and because @yanavlasov is the code owner (and senior maintainer)

wbpcode avatar May 07 '24 06:05 wbpcode

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions[bot] avatar Jun 13 '24 04:06 github-actions[bot]

@yanavlasov , @wbpcode could you take another look? sorry about the delay, I was out and just got back. thank you.

yangminzhu avatar Jun 19 '24 23:06 yangminzhu

/wait

Seems like an RBAC test relevant to this change is timing out in CI @yangminzhu

KBaichoo avatar Jul 15 '24 13:07 KBaichoo

@KBaichoo yeah, but I think per @yanavlasov 's comment (https://github.com/envoyproxy/envoy/pull/33875#discussion_r1671166354), it is working on local workstataion but not in CI.

yangminzhu avatar Jul 18 '24 23:07 yangminzhu

This work for me on my workstation. I've opened #35128 to see what may be different.

@yanavlasov oh I'm a bit surprised it worked on your workstation (the test just always timeout in my local test). Also do you think if we can get this PR in without using the simulated timer (or we can get back to this once we found the root cause in https://github.com/envoyproxy/envoy/pull/35128 as this PR has been out for very long time)? the new partialWrite test should still provide pretty good coverage in the test. thanks.

yangminzhu avatar Jul 19 '24 23:07 yangminzhu

It worked both on workstation and CI. See #35128 Can you copy the test from my PR into yours and see if it will work? I would also remove the test with partialWrite It assumes a lot of things, such that Envoy can absorb such large write quickly. This looks to me like a recipe for a flaky test, especially in sanitized builds. And the test does not seem to validate that disconnect happens after 1 second, only that it happens at some point. Do you know for sure that it happened because of delayed disconnect, or it happened immediately and just it happens that loopback socket had large enough buffer to absorb 500K write? Or disconnected after 5 seconds because of inactivity timer?

/wait-any

yanavlasov avatar Jul 23 '24 00:07 yanavlasov

@yanavlasov ah thanks, updated the test and it works now, also removed the partialWrite in the test since now the simulated testing is working and should be good to cover the use cases now.

yangminzhu avatar Aug 06 '24 01:08 yangminzhu

@yangminzhu it looks like main merge is needed to resolve the coverage error.

yanavlasov avatar Aug 08 '24 00:08 yanavlasov

/retest

yangminzhu avatar Aug 08 '24 05:08 yangminzhu