envoy icon indicating copy to clipboard operation
envoy copied to clipboard

ext_proc: gcc test flakes: IpVersionsClientTypeDeferredProcessing/ExtProcIntegrationTest.GetAndSetTrailersIncorrectlyOnResponse/IPv6_GoogleGrpc_NoDeferredProcessing

Open jmarantz opened this issue 1 year ago • 7 comments

This flaked for me in a CI: https://dev.azure.com/cncf/envoy/_build/results?buildId=165561&view=logs&j=8bf29878-a4cc-50f7-4e84-2255e6fd4065&t=150a3762-112b-5713-0d28-af1eb6c2fbf4

2024-03-20T19:10:36.7571536Z [ RUN      ] IpVersionsClientTypeDeferredProcessing/ExtProcIntegrationTest.GetAndSetTrailersIncorrectlyOnResponse/IPv6_GoogleGrpc_NoDeferredProcessing
2024-03-20T19:10:36.7572847Z test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc:252: Failure
2024-03-20T19:10:36.7573591Z Value of: response.waitForEndStream()
2024-03-20T19:10:36.7574178Z   Actual: false (Timed out waiting for end stream
2024-03-20T19:10:36.7574832Z )
2024-03-20T19:10:36.7575165Z Expected: true
2024-03-20T19:10:36.7575524Z Stack trace:
2024-03-20T19:10:36.7576086Z   0x5594d9350370: Envoy::ExtProcIntegrationTest::verifyDownstreamResponse()
2024-03-20T19:10:36.7577124Z   0x5594d92c450e: Envoy::ExtProcIntegrationTest_GetAndSetTrailersIncorrectlyOnResponse_Test::TestBody()
2024-03-20T19:10:36.7578166Z   0x5594dd083bed: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
2024-03-20T19:10:36.7579047Z   0x5594dd07e89d: testing::internal::HandleExceptionsInMethodIfSupported<>()
2024-03-20T19:10:36.7579736Z   0x5594dd065128: testing::Test::Run()
2024-03-20T19:10:36.7580250Z   0x5594dd065b4b: testing::TestInfo::Run()
2024-03-20T19:10:36.7580762Z ... Google Test internal frames ...
2024-03-20T19:10:36.7581046Z 
2024-03-20T19:10:36.7582221Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_defer_processing_backedup_streams to: true
2024-03-20T19:10:36.7583968Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_no_extension_lookup_by_name to: true
2024-03-20T19:10:36.7585663Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_quic_always_support_server_preferred_address to: true
2024-03-20T19:10:36.7613937Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_immediate_response_use_filter_mutation_rule to: true
2024-03-20T19:10:36.7616082Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_send_header_raw_value to: true
2024-03-20T19:10:36.7617753Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_runtime_initialized to: false
2024-03-20T19:10:36.7619474Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_defer_processing_backedup_streams to: true
2024-03-20T19:10:36.7621225Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_no_extension_lookup_by_name to: true
2024-03-20T19:10:36.7622934Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_quic_always_support_server_preferred_address to: true
2024-03-20T19:10:36.7624732Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_immediate_response_use_filter_mutation_rule to: true
2024-03-20T19:10:36.7626586Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_send_header_raw_value to: true
2024-03-20T19:10:36.7628342Z [external/com_google_absl/absl/flags/internal/flag.cc : 113] RAW: Restore saved value of envoy_reloadable_features_runtime_initialized to: false
2024-03-20T19:10:36.7630916Z [  FAILED  ] IpVersionsClientTypeDeferredProcessing/ExtProcIntegrationTest.GetAndSetTrailersIncorrectlyOnResponse/IPv6_GoogleGrpc_NoDeferredProcessing, where GetParam() = ((4-byte object <01-00 00-00>, 4-byte object <01-00 00-00>), false) (10547 ms)

jmarantz avatar Mar 20 '24 20:03 jmarantz

Response timeouts. This test is flaky. Retest should be able to get the CI pass.

Overall, ext_proc_integration_test.cc is growing too big. We probably should start to convert some of the test cases to unit test and move them to filter_test.cc. @tyxia @stevenzzzz

yanjunxiang-google avatar Mar 20 '24 22:03 yanjunxiang-google

We should not consider this state to be acceptable. We should use simulated time in tests, where possible, to make tests not be brittle to time. If that is impossible we should allow much more generous timeouts.

jmarantz avatar Mar 21 '24 12:03 jmarantz

I am thinking we probably can first try to use timeout value larger than TestUtility::DefaultTimeout for the trailer case, to see if it helps to de-flake.

Regarding converting to unit test. I personally like integration test (and over unit test :)), as it tests the code in a closer-to-production and more E2E way. If we have concern with its size, we probably can refactor it into multiple integration test.

tyxia avatar Mar 25 '24 04:03 tyxia

Another case: https://dev.azure.com/cncf/envoy/_build/results?buildId=168324&view=logs&j=8c169225-0ae8-53bd-947f-07cb81846cb5&t=d1a98671-b7ba-5fbf-f06c-ff337c010df4&l=1021

adisuissa avatar Apr 18 '24 13:04 adisuissa

ext_proc has side stream process which adds round trip latency for a request. This probably is the reason why these ext_proc tests are frequently flaky due to timeout. As a first remedy, we can certainly increase the timeout to reduce the probability of the flakiness

Long term, for tests only need to verify ext_proc filter behavior/state machine, actually filter_test.cc: one example, https://github.com/envoyproxy/envoy/blob/313b6fb7cf0f806e74a2d42c93e7c1fcccce2391/test/extensions/filters/http/ext_proc/filter_test.cc#L587 can test most of them. Tests in this file using simulated timer. We very rarely see flakiness in this test.

yanjunxiang-google avatar Apr 18 '24 14:04 yanjunxiang-google

@tyxia yes, integration tests is end-to-end and close to production. The draw back is it has to create an Envoy instance, which is slow and expensive for tests not necessarily need it.

yanjunxiang-google avatar Apr 18 '24 14:04 yanjunxiang-google

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 18 '24 16:05 github-actions[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot] avatar May 25 '24 16:05 github-actions[bot]

Still happening (in tsan). https://github.com/envoyproxy/envoy/actions/runs/14583047690/job/40903497197

[ RUN      ] IpVersionsClientTypeDeferredProcessing/ExtProcIntegrationTest.GetAndCloseStream/3
test/integration/fake_upstream.cc:812: Failure
Failed
Please don't waitForHttpConnection with a 5s timeout if failure is expected

Stack trace:
  0x55a5c5965c38: Envoy::FakeUpstream::waitForHttpConnection()
  0x55a5c40f6c1b: Envoy::ExtProcIntegrationTest::handleUpstreamRequest()
  0x55a5c40f52e2: Envoy::ExtProcIntegrationTest::testGetAndCloseStream()
  0x55a5c3ec0130: Envoy::ExtProcIntegrationTest_GetAndCloseStream_Test::TestBody()
  0x55a5c3ec01dd: Envoy::ExtProcIntegrationTest_GetAndCloseStream_Test::TestBody()
  0x55a5cce1406d: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0x55a5ccde571b: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x55a5ccdc9c69: testing::Test::Run()
  0x55a5ccdcaab2: testing::TestInfo::Run()
... Google Test internal frames ...

test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc:340: Failure
Value of: fake_upstreams_[0]->waitForHttpConnection(*dispatcher_, fake_upstream_connection_)
  Actual: false (Timed out waiting for new connection.)
Expected: true

ravenblackx avatar Apr 22 '25 15:04 ravenblackx