quarkus
quarkus copied to clipboard
GraphQL spurious authentication failures
Describe the bug
After upgrading from 2.9.2.Final, requests to the GraphQL API for secured resources (queries / mutations) may spuriously fail with an "unauthorized" error-code.
I have created a simple Quarkus GraphQL project that uses the the https://quarkus.io/guides/microprofile-graphql as a base to exhibit / reproduce the issue. Specifically, the quarkus GraphQL example has been modified by adding an extra method to the API (allFilmsSecured
) that's secured using the standard @Authenticated
annotation, and leveraged the quarkus-elytron-security-properties-file
extension to configure a set of valid user credentials.
Expected behavior
When querying the secured endpoint with the correct credentials (HTTP Basic Auth) concurrenctly with / without other unsecured endpoints, one should not receive unauthorized
error-code responses.
Actual behavior
Given enough concurrent requests (tests are configured to 50 but failures occur with fewer too), some will fail with unauthorized
error-code. What's also suprising is that the unexpected behaviour is not exactly the same for all versions > 2.9.2.Final. Specifically, for versions up to 2.10.3.Final the bug seems to occur only when there are concurrent requests with different security credentials and / or a combination of resources with different security restrictions. However on 2.10.4.Final and 2.11.1.Final failures are observed even when loading the service with concurrent requests for the same resource with the exact same credentials.
How to Reproduce?
Executing the Junit tests in ConcurrentAuthTest reproduces the issue on two machines that I have tried it out (8 cores / 16 threads - Ryzen 58000x, and 4 cores / 8 threads intel i7 kaby lake). On the test class there are two tests, one that attempts concurrent requests on both secured and unsecured resources (that fails on all versions > 2.9.2.Final), and another that only performs concurrent requests on the secured resource that fails only on 2.10.4.Final and 2.11.1.Final.
A JMeter Test Plan is also included that reproduces the issue very reliably (JSON Assertions fail when receiving unauthorized
error codes).
Output of uname -a
or ver
Linux fedora-desktop 5.18.13-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Jul 22 14:03:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Output of java -version
openjdk version "17.0.2" 2022-01-18 OpenJDK Runtime Environment Temurin-17.0.2+8 (build 17.0.2+8) OpenJDK 64-Bit Server VM Temurin-17.0.2+8 (build 17.0.2+8, mixed mode, sharing)
GraalVM version (if different from Java)
No response
Quarkus version or git rev
No response
Build tool (ie. output of mvnw --version
or gradlew --version
)
Apache Maven 3.8.4
Additional information
No response
/cc @jmartisk, @phillip-kruger
This might be related to https://github.com/quarkusio/quarkus/issues/26748
Yeah that's definitely related to it. @geoand we might need your brain on this one, looks like we have trouble fixing this issue.
I already discussed with @geoand on Friday he gave me a few idea to go and look for. I'll continue tomorrow on this.
I also just thought of something else : maybe somewhere in the code a future / promise return value is being ignored?
@geoand I'll check, why would that be an issue ?
Just thinking it could create a race
I pinged @stuartwdouglas , he is helping out. I am hopeful we will get this fixed soon.
@nvrs could you try the latest snapshots to make sure we fixed it?
See https://github.com/quarkusio/quarkus/blob/main/CONTRIBUTING.md#using-snapshots for the instructions.
I'll release new versions once I get your feedback.
Thanks!
The issue is still not entirely fixed. If you're exposing GraphQL services, stay with 2.9.x for now. We will follow up once we are positive the issue is entirely fixed.
@nvrs could you try the latest snapshots to make sure we fixed it?
See https://github.com/quarkusio/quarkus/blob/main/CONTRIBUTING.md#using-snapshots for the instructions.
I'll release new versions once I get your feedback.
Thanks!
@gsmet Just saw your reply, sure I can, let me know when to give it another go.
@nvrs I think we got the latest fixes in today. I will ping you tomorrow when the new snapshots are online.
@nvrs the snapshots have been deployed so you can test them by following the instructions here: https://github.com/quarkusio/quarkus/blob/main/CONTRIBUTING.md#using-snapshots .
@gsmet, will do and report back
@gsmet I am going to build from source (main branch) instead of using snapshots.
@nvrs as you wish. We have instructions for that too: https://github.com/quarkusio/quarkus/blob/main/CONTRIBUTING.md#building-main :)
The jmeter test that so far was consistently reproducing the issue on the example code (100% rate within a couple of hundred requests up until 2.11.1.Final) doesn't do so with the latest build. However, I also tried 2.11.2.Final -- which according to your release notes still exhibits the issue -- and it also passes; Thus the specific test harness is not reliable.
I will try it on our staging env with real code and a more representative load test as soon as possible.
2.11.2 might have fixed your case, the PRs that followed fix similar cases but for the Graphql context
We cannot reproduce the issue as described in the ticket onward from 2.11.2.Final.
Since we do not use directly the GraphQL context we haven't observed any related issue, nor can create a reproducer. We can thus consider the issue fixed.
Should I close or is it preferable to wait / you do it?