quarkus
quarkus copied to clipboard
Quarkus OIDC makes first request very slow
Describe the bug
Hi,
We use Quarkus OIDC for multi tenancy. We do not use Keycloak, we use our own authentication system. Therefore we set quarkus.oidc.discovery-enabled to false, which results in our OIDC config not getting loaded on startup, but on the first request. This makes the first request that gets done to our container (512 vCPU, 1024 memory) very slow (around 4 seconds).
Is there a way to make a healthcheck manually load the OIDC providers before the first request gets done to the container? Or could there be added a new configuration property to load the providers on startup (such as proposed in https://github.com/quarkusio/quarkus/issues/34890)
quarkus.oidc.enabled=true
quarkus.oidc.discovery-enabled=false
quarkus.oidc.use-blocking-dns-lookup=true
quarkus.oidc.bca.public-key=${OUR_PUBLIC_KEY}
quarkus.oidc.bca.token.audience=bsp
quarkus.oidc.bca.token.issued-at-required=false
quarkus.oidc.bca.application-type=service
quarkus.oidc.bca.token.signature-algorithm=rs256
quarkus.oidc.auth-server-url=${JWKS_LOCATION}
quarkus.oidc.jwks-path=${JWKS_LOCATION}
quarkus.oidc.token.issued-at-required=false
quarkus.oidc.token.signature-algorithm=es256
quarkus.oidc.application-type=service
We also have our own TenantResolverConfig, to get the tenant config based on the tenant:
@ApplicationScoped
public class TokenTenantResolver implements TenantConfigResolver {
private static final String ISSUER_PATH = "iss";
@ConfigProperty(name = "quarkus.oidc.bca.token.issuer")
String bcaIssuer;
@ConfigProperty(name = "quarkus.oidc.bca.public-key")
String bcaPublicKey;
@SneakyThrows
public Uni<OidcTenantConfig> resolve(
RoutingContext context, OidcRequestContext<OidcTenantConfig> requestContext) {
final var jwtString = context.request().headers().get("Authorization");
if (jwtString == null) {
return Uni.createFrom().nullItem();
}
final var jwtHeaders = io.quarkus.oidc.runtime.OidcUtils.decodeJwtContent(jwtString);
if (jwtHeaders != null && jwtHeaders.getValue(ISSUER_PATH).equals(bcaIssuer)) {
OidcTenantConfig config = new OidcTenantConfig();
config.setTenantId("bca");
config.setPublicKey(bcaPublicKey);
config.getToken().setIssuedAtRequired(false);
config.setApplicationType(OidcTenantConfig.ApplicationType.SERVICE);
return Uni.createFrom().item(config);
} else {
// resolve to default tenant config
return Uni.createFrom().nullItem();
}
}
}
Expected behavior
All the fields in our config for our OIDC tenants are "deterministic" as in, they could easily be loaded on startup. So initializing them on startup would be our expected behavior
Actual behavior
The OIDC config does not get loaded on startup, resulting in intializing all OIDC classes etc on the first request, which makes the first request very slow.
How to Reproduce?
No response
Output of uname -a or ver
AWS ECS Fargate container
Output of java -version
openjdk 21.0.1 2023-10-17 LTS OpenJDK Runtime Environment Corretto-21.0.1.12.1 (build 21.0.1+12-LTS) OpenJDK 64-Bit Server VM Corretto-21.0.1.12.1 (build 21.0.1+12-LTS, mixed mode, sharing)
Quarkus version or git rev
3.11.0
Build tool (ie. output of mvnw --version or gradlew --version)
docker
Additional information
No response
/cc @pedroigor (oidc), @sberyozkin (oidc)
Any chance you could get some profiling done?
We have some guidance here: https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md .
I do not have the time right now to do any profiling, maybe these CloudWatch logs give you some insights? oidc-initalizing.xlsx
My "main" question right now is (as a quick hack): How can i manually initialize the OIDC client so it is ready for my first request (via a healthcheck or something).
So https://quarkus.io/guides/lifecycle#startup_annotation is what should be used to execute something at startup but I have no idea what you should initialize.
Question for @michalvavrik or @sberyozkin .
@wesleysalimansdvb Hi, 4 secs is a very long period of time, to get JWKS fetched at runtime should take less than a second.
Why do you think that disabling the discovery postpones pulling the keys until the 1st request's time ? It only skips a step which involves fetching the discovery metadata doc pointing to the JWKS, etc endpoint, but JWKS, by default, will be fetched immediately: https://github.com/quarkusio/quarkus/blob/main/extensions/oidc/runtime/src/main/java/io/quarkus/oidc/runtime/OidcRecorder.java#L404
I wonder if quarkus.oidc.use-blocking-dns-lookup=true is a culprit here ? It is only required for slow systems to avoid a blocked IO thread problem when a connection to the provider is being established.
As a side note, you may want to experiment with https://quarkus.io/version/main/guides/security-openid-connect-multitenancy#issuer-based-tenant-resolver, to avoid having to create a custom resolver above which parses the token itself...
quarkus.oidc.use-blocking-dns-lookup=true was added because we kept getting Vert.x blocked thread exceptions.
2024-06-24 07:13:24,827 WARN [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 3160 ms, time limit is 2000 ms: io.vertx.core.VertxException: Thread blocked
After enabling this property those exceptions are gone. 4 secs indeed is a long time, but that is what our logging says how long it takes to intialize all classes, algorithms and parse the token.
Thanks for the sidenote, this is something we will definitely try out!
@wesleysalimansdvb
So that stack trace says that it takes 3.16+ seconds to establish a connection with your provider. Sounds like this is what is happening, you get something close to 4 secs to have it established.
You can try any of the quickstarts with live Keycloak and with the disabled discovery and I can guarantee you it won't take 4 seconds to handle a first request
IMHO it is about your dev env's setup than anything else, a lot of users have the discovery disabled for endpoints, most of the OAuth2 social providers don't support it. Please prove it otherwise with the profiling if you'd like to have some progress on this issue
@wesleysalimansdvb
Thanks for the sidenote, this is something we will definitely try out!
Sure, give it a try please and let Michal and myself know how it goes
@wesleysalimansdvb Can you temporarily register a test client application and share the details with me offline (at Zulip, or post them to sbiarozk at redhat dot com with CC to someone else in your team) ? I can try it from my laptop to check what is going on there. I'd be worried even about 2 secs if I ever saw it happening in such cases, but I did not see it...
Hi @sberyozkin, I want to give you a headsup that @wesleysalimansdvb is out of office until next week. He will come back to you when he returns.
Hi @bartm-dvb, thanks for the heads up, sure, lets continue next week or so and see what we can realistically do at the Quakus level
Is there anything new regarding this issue?
I understood further communication is happening through Zulip for now
We have touched on it with @wesleysalimansdvb who spent more time on looking at the issue and could confirm about 4 secs are spent. However, what I said to @wesleysalimansdvb, that we can't fix anything with the images like the one attached to this issue.
So, there are two things Quarkus OIDC does when the discovery is disabled but the JWK endpoint is configured directly:
- It performs a networking operation to fetch JWK key set
- It process this key set
The team assumes Quarkus OIDC spends this time on step 2.
My bet is, given that you had to enable the blocking DNS to overcome a slow DNS resolution, it is spent on step 1, which is I believe is in fact confirmed by the stackstrace @wesleysalimansdvb pasted above, where, without this option, Quarkus times out after 3.16 secs and the actual time required to perform the networking op is higher.
I'd like to encourage you to run quarkus-quickstarts/security-openid-connect-web-authentication with the disabled discovery. I can assure you won't notice a difference, with and without the discovery.
I also suggested @wesleysalimansdvb to have a breakpoint set in OidcRecorder where JWKs are resolved, @wesleysalimansdvb has already done one PR so it would not be a problem to get the source etc.
IMHO this issue is not really a Quarkus issue and I'd not mind closing it. But I know this is an important one for you so let's keep discussing what can be done about it. Thanks
FYI, I am trying out the quarkus-quickstarts to reproduce the issue. I'll be back with my findings later
@sberyozkin So what seems to most time in step 2 (It process this key set) is initialising jose4j. Both in the profiling output and logs, it seems to take a lot of time. On my local machine it takes 64ms, in the logs @wesleysalimansdvb attached it takes 298ms. Is initialising jose4j something that could be moved to build time?
2024-06-24 09:23:17,817 DEBUG [org.jos.jwa.AlgorithmFactoryFactory] (vert.x-eventloop-thread-0) Initialized jose4j in 298ms
@sberyozkin could you please look at this thread?
@wesleysalimansdvb @bartm-dvb Sure, we can try to do that. Even in the worst case though, 0.298ms, is still 0.33 of a sec. In any case though, it is worth optimizing. Would you be interested in a small PR ? Like adding a build step that will call out to OIDC recorder which would make some neural Jose4j call causing its initialization ?
@wesleysalimansdvb @bartm-dvb, in general the OIDC provider initialization must be runtime as the keys are fetched at runtime. But having some neutral Jose call which can be recorded can certainly be done. For example, AlgorithmFactoryFactory.getInstance() should do.
Let me know if you'll be up to it
Hi @sberyozkin, we will be proposing a PR any of these days :)
Hi @wesleysalimansdvb, thanks very much, I just thought I'd create a quick one today, #42765, as I got a few minutes, I did not notice your comment today.
I'm sure there will be more opportunities for OIDC PRs :-)
Note I've marked #42765 to close this issue, I know it might not answer all your questions, but I feel we should rather focus on more specific issues related to possible first slow requests, if you notice them. Please create them as necessary going forward
Hahah thanks @sberyozkin for your help & proposing a fix :) I might open a separate recommendation to make it optional to fetch the JWKS at boot time, so we can also initialize the algorithms that are used before the first request :)
Hi @wesleysalimansdvb There is a quarkus.oidc.jwks.resolve-early flag which you can set to false, give that a try, but yeah, please keep opening issues/PRs
Thanks to you and @bartm-dvb, cheers
@sberyozkin With the 3.16-RC1 I am seeing a ~30% improvement in first request latency with the oidc quick-start, thanks for your work!