oauth2: Fix panic when oauth upstream is unhealthy
As discussed on [email protected], Envoy currently panics when receiving an Oauth callback while the Oauth has no healthy upstream (e.g. because Envoy can't resolve its hostname), because the onSuccess callback is called immediately during the request dispatch in that case, before the oauth state is changed to pending.
The fix is to set the state before sending the request. Since the state isn't used anywhere else, there's no reason to only set it afterwards.
Unfortunately, I wasn't able to test this properly because I ran into various rather random-looking Bazel failures both building locally or using the CI Docker script (on a Mac).
Failure using the CI Docker script
``` ERROR: /build/bazel_root/base/external/envoy/bazel/foreign_cc/BUILD:11:15: no such package '@com_github_axboe_liburing//': error globbing [**] op=FILES: /build/bazel_root/base/external/com_github_axboe_liburing/man/IO_URING_VERSION_MAJOR.3 (Too many levels of symbolic links) and referenced by '@envoy//bazel/foreign_cc:liburing' ```Failure using the CI Docker script on the v1.30.1 tag
``` ERROR: /source/WORKSPACE:13:19: no such package '@v8//': error globbing [tools/testrunner/**/*.py] op=FILES: /build/bazel_root/base/external/v8/tools/testrunner/testdata/testroot6/out/build (No such file or directory) and referenced by '//external:wee8' ```Failure building directly on Mac OS
``` compiling lib/findprog-in.c... ./lib/findprog-in.c:149:25: error: call to undeclared function 'eaccess'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] if (eaccess (progpathname, X_OK) == 0) ^ ./lib/findprog-in.c:149:25: note: did you mean 'access'? /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:431:6: note: 'access' declared here int access(const char *, int); ^ ./lib/findprog-in.c:301:21: error: call to undeclared function 'eaccess'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] if (eaccess (progpathname, X_OK) == 0) ^ 2 errors generated. Compilation failed. _____ END BUILD LOGS _____ rules_foreign_cc: Build wrapper script location: bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains/make_tool_foreign_cc/wrapper_build_script.sh rules_foreign_cc: Build script location: bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains/make_tool_foreign_cc/build_script.sh rules_foreign_cc: Build log location: bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/rules_foreign_cc/toolchains/make_tool_foreign_cc/BootstrapGNUMake.log ```Commit Message: oauth2: Fix panic when oauth upstream is unhealthy Risk Level: Low Testing: unit test Docs Changes: N/A Release Notes: Added Platform Specific Features: N/A
Hi @benediktwerner, welcome and thank you for your contribution.
We will try to review your Pull Request as quickly as possible.
In the meantime, please take a look at the contribution guidelines if you have not done so already.
There is a build error: https://source.cloud.google.com/results/invocations/49bee7ae-a361-46e0-b9ce-9feec302bb92
/wait
Hm, now the test fails but I don't really understand why. I'm not sure I'm reading the stack trace correctly but it looks like it's failing in the destructor, I guess here? But in_flight_request_ should have been set to nullptr here 🤔
Hm, now the test fails but I don't really understand why. I'm not sure I'm reading the stack trace correctly but it looks like it's failing in the destructor, I guess here? But
in_flight_request_should have been set tonullptrhere 🤔
To fix the crash, please move the std::shared_ptr<OAuth2Client> client_; to be the last member of the OAuth2ClientTest class. Your test introduced order dependency in the cleanup.
/wait
Ok, now I see what you mean, request_ was dropped before client_ but accessed in client_'s destructor. But that's not quite the real issue, the order dependency shouldn't even be there since in_flight_request_ is supposed to be nullptr in the client_ destructor, and then it wouldn't access request_.
But when onSucess is called directly inside send, the in_flight_request_ = nullptr happens before in_flight_request_ is even set to the request. It looks like the real AsyncClientImpl::send returns nullptr in this case, instead of the request.
I guess that's almost the same kind of bug in the test as I was fixing in the real code 😅
Really makes me think an integration test would be better here but I didn't quite figure out a proper way to create an unhealthy cluster there and it's kinda difficult without being able to test it myself.
@yanavlasov anything else the author needs to do before another round of review?