quarkus icon indicating copy to clipboard operation
quarkus copied to clipboard

Quarkus OIDC makes first request very slow

Open wesleysalimansdvb opened this issue 1 year ago • 11 comments

Describe the bug

Hi,

We use Quarkus OIDC for multi tenancy. We do not use Keycloak, we use our own authentication system. Therefore we set quarkus.oidc.discovery-enabled to false, which results in our OIDC config not getting loaded on startup, but on the first request. This makes the first request that gets done to our container (512 vCPU, 1024 memory) very slow (around 4 seconds).

Is there a way to make a healthcheck manually load the OIDC providers before the first request gets done to the container? Or could there be added a new configuration property to load the providers on startup (such as proposed in https://github.com/quarkusio/quarkus/issues/34890)

quarkus.oidc.enabled=true
quarkus.oidc.discovery-enabled=false
quarkus.oidc.use-blocking-dns-lookup=true
quarkus.oidc.bca.public-key=${OUR_PUBLIC_KEY}
quarkus.oidc.bca.token.audience=bsp
quarkus.oidc.bca.token.issued-at-required=false
quarkus.oidc.bca.application-type=service
quarkus.oidc.bca.token.signature-algorithm=rs256
quarkus.oidc.auth-server-url=${JWKS_LOCATION}
quarkus.oidc.jwks-path=${JWKS_LOCATION}
quarkus.oidc.token.issued-at-required=false
quarkus.oidc.token.signature-algorithm=es256
quarkus.oidc.application-type=service

We also have our own TenantResolverConfig, to get the tenant config based on the tenant:

@ApplicationScoped
public class TokenTenantResolver implements TenantConfigResolver {
  private static final String ISSUER_PATH = "iss";

  @ConfigProperty(name = "quarkus.oidc.bca.token.issuer")
  String bcaIssuer;

  @ConfigProperty(name = "quarkus.oidc.bca.public-key")
  String bcaPublicKey;

  @SneakyThrows
  public Uni<OidcTenantConfig> resolve(
      RoutingContext context, OidcRequestContext<OidcTenantConfig> requestContext) {
    final var jwtString = context.request().headers().get("Authorization");

    if (jwtString == null) {
      return Uni.createFrom().nullItem();
    }

    final var jwtHeaders = io.quarkus.oidc.runtime.OidcUtils.decodeJwtContent(jwtString);

    if (jwtHeaders != null && jwtHeaders.getValue(ISSUER_PATH).equals(bcaIssuer)) {
      OidcTenantConfig config = new OidcTenantConfig();
      config.setTenantId("bca");
      config.setPublicKey(bcaPublicKey);
      config.getToken().setIssuedAtRequired(false);
      config.setApplicationType(OidcTenantConfig.ApplicationType.SERVICE);
      return Uni.createFrom().item(config);
    } else {
      // resolve to default tenant config
      return Uni.createFrom().nullItem();
    }
  }
}

Expected behavior

All the fields in our config for our OIDC tenants are "deterministic" as in, they could easily be loaded on startup. So initializing them on startup would be our expected behavior

Actual behavior

The OIDC config does not get loaded on startup, resulting in intializing all OIDC classes etc on the first request, which makes the first request very slow.

How to Reproduce?

No response

Output of uname -a or ver

AWS ECS Fargate container

Output of java -version

openjdk 21.0.1 2023-10-17 LTS OpenJDK Runtime Environment Corretto-21.0.1.12.1 (build 21.0.1+12-LTS) OpenJDK 64-Bit Server VM Corretto-21.0.1.12.1 (build 21.0.1+12-LTS, mixed mode, sharing)

Quarkus version or git rev

3.11.0

Build tool (ie. output of mvnw --version or gradlew --version)

docker

Additional information

No response

wesleysalimansdvb avatar Jun 26 '24 08:06 wesleysalimansdvb

/cc @pedroigor (oidc), @sberyozkin (oidc)

quarkus-bot[bot] avatar Jun 26 '24 08:06 quarkus-bot[bot]

Any chance you could get some profiling done?

We have some guidance here: https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md .

gsmet avatar Jun 26 '24 08:06 gsmet

I do not have the time right now to do any profiling, maybe these CloudWatch logs give you some insights? oidc-initalizing.xlsx

wesleysalimansdvb avatar Jun 26 '24 08:06 wesleysalimansdvb

My "main" question right now is (as a quick hack): How can i manually initialize the OIDC client so it is ready for my first request (via a healthcheck or something).

wesleysalimansdvb avatar Jun 26 '24 08:06 wesleysalimansdvb

So https://quarkus.io/guides/lifecycle#startup_annotation is what should be used to execute something at startup but I have no idea what you should initialize.

Question for @michalvavrik or @sberyozkin .

gsmet avatar Jun 26 '24 09:06 gsmet

@wesleysalimansdvb Hi, 4 secs is a very long period of time, to get JWKS fetched at runtime should take less than a second.

Why do you think that disabling the discovery postpones pulling the keys until the 1st request's time ? It only skips a step which involves fetching the discovery metadata doc pointing to the JWKS, etc endpoint, but JWKS, by default, will be fetched immediately: https://github.com/quarkusio/quarkus/blob/main/extensions/oidc/runtime/src/main/java/io/quarkus/oidc/runtime/OidcRecorder.java#L404

I wonder if quarkus.oidc.use-blocking-dns-lookup=true is a culprit here ? It is only required for slow systems to avoid a blocked IO thread problem when a connection to the provider is being established.

As a side note, you may want to experiment with https://quarkus.io/version/main/guides/security-openid-connect-multitenancy#issuer-based-tenant-resolver, to avoid having to create a custom resolver above which parses the token itself...

sberyozkin avatar Jun 26 '24 09:06 sberyozkin

quarkus.oidc.use-blocking-dns-lookup=true was added because we kept getting Vert.x blocked thread exceptions.

2024-06-24 07:13:24,827 WARN [io.ver.cor.imp.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-eventloop-thread-1,5,main] has been blocked for 3160 ms, time limit is 2000 ms: io.vertx.core.VertxException: Thread blocked

After enabling this property those exceptions are gone. 4 secs indeed is a long time, but that is what our logging says how long it takes to intialize all classes, algorithms and parse the token.

Thanks for the sidenote, this is something we will definitely try out!

wesleysalimansdvb avatar Jun 26 '24 09:06 wesleysalimansdvb

@wesleysalimansdvb So that stack trace says that it takes 3.16+ seconds to establish a connection with your provider. Sounds like this is what is happening, you get something close to 4 secs to have it established. You can try any of the quickstarts with live Keycloak and with the disabled discovery and I can guarantee you it won't take 4 seconds to handle a first request

sberyozkin avatar Jun 26 '24 10:06 sberyozkin

IMHO it is about your dev env's setup than anything else, a lot of users have the discovery disabled for endpoints, most of the OAuth2 social providers don't support it. Please prove it otherwise with the profiling if you'd like to have some progress on this issue

sberyozkin avatar Jun 26 '24 10:06 sberyozkin

@wesleysalimansdvb

Thanks for the sidenote, this is something we will definitely try out!

Sure, give it a try please and let Michal and myself know how it goes

sberyozkin avatar Jun 26 '24 16:06 sberyozkin

@wesleysalimansdvb Can you temporarily register a test client application and share the details with me offline (at Zulip, or post them to sbiarozk at redhat dot com with CC to someone else in your team) ? I can try it from my laptop to check what is going on there. I'd be worried even about 2 secs if I ever saw it happening in such cases, but I did not see it...

sberyozkin avatar Jun 28 '24 10:06 sberyozkin

Hi @sberyozkin, I want to give you a headsup that @wesleysalimansdvb is out of office until next week. He will come back to you when he returns.

bartm-dvb avatar Jul 01 '24 08:07 bartm-dvb

Hi @bartm-dvb, thanks for the heads up, sure, lets continue next week or so and see what we can realistically do at the Quakus level

sberyozkin avatar Jul 01 '24 10:07 sberyozkin

Is there anything new regarding this issue?

geoand avatar Jul 18 '24 13:07 geoand

I understood further communication is happening through Zulip for now

bartm-dvb avatar Jul 18 '24 13:07 bartm-dvb

We have touched on it with @wesleysalimansdvb who spent more time on looking at the issue and could confirm about 4 secs are spent. However, what I said to @wesleysalimansdvb, that we can't fix anything with the images like the one attached to this issue.

So, there are two things Quarkus OIDC does when the discovery is disabled but the JWK endpoint is configured directly:

  1. It performs a networking operation to fetch JWK key set
  2. It process this key set

The team assumes Quarkus OIDC spends this time on step 2.

My bet is, given that you had to enable the blocking DNS to overcome a slow DNS resolution, it is spent on step 1, which is I believe is in fact confirmed by the stackstrace @wesleysalimansdvb pasted above, where, without this option, Quarkus times out after 3.16 secs and the actual time required to perform the networking op is higher.

I'd like to encourage you to run quarkus-quickstarts/security-openid-connect-web-authentication with the disabled discovery. I can assure you won't notice a difference, with and without the discovery.

I also suggested @wesleysalimansdvb to have a breakpoint set in OidcRecorder where JWKs are resolved, @wesleysalimansdvb has already done one PR so it would not be a problem to get the source etc.

IMHO this issue is not really a Quarkus issue and I'd not mind closing it. But I know this is an important one for you so let's keep discussing what can be done about it. Thanks

sberyozkin avatar Jul 21 '24 12:07 sberyozkin

FYI, I am trying out the quarkus-quickstarts to reproduce the issue. I'll be back with my findings later

bartm-dvb avatar Jul 31 '24 15:07 bartm-dvb

@sberyozkin So what seems to most time in step 2 (It process this key set) is initialising jose4j. Both in the profiling output and logs, it seems to take a lot of time. On my local machine it takes 64ms, in the logs @wesleysalimansdvb attached it takes 298ms. Is initialising jose4j something that could be moved to build time?

2024-06-24 09:23:17,817 DEBUG [org.jos.jwa.AlgorithmFactoryFactory] (vert.x-eventloop-thread-0) Initialized jose4j in 298ms

bartm-dvb avatar Aug 01 '24 12:08 bartm-dvb

@sberyozkin could you please look at this thread?

wesleysalimansdvb avatar Aug 15 '24 11:08 wesleysalimansdvb

@wesleysalimansdvb @bartm-dvb Sure, we can try to do that. Even in the worst case though, 0.298ms, is still 0.33 of a sec. In any case though, it is worth optimizing. Would you be interested in a small PR ? Like adding a build step that will call out to OIDC recorder which would make some neural Jose4j call causing its initialization ?

sberyozkin avatar Aug 22 '24 13:08 sberyozkin

@wesleysalimansdvb @bartm-dvb, in general the OIDC provider initialization must be runtime as the keys are fetched at runtime. But having some neutral Jose call which can be recorded can certainly be done. For example, AlgorithmFactoryFactory.getInstance() should do.

Let me know if you'll be up to it

sberyozkin avatar Aug 22 '24 16:08 sberyozkin

Hi @sberyozkin, we will be proposing a PR any of these days :)

wesleysalimansdvb avatar Aug 26 '24 07:08 wesleysalimansdvb

Hi @wesleysalimansdvb, thanks very much, I just thought I'd create a quick one today, #42765, as I got a few minutes, I did not notice your comment today.

I'm sure there will be more opportunities for OIDC PRs :-)

Note I've marked #42765 to close this issue, I know it might not answer all your questions, but I feel we should rather focus on more specific issues related to possible first slow requests, if you notice them. Please create them as necessary going forward

sberyozkin avatar Aug 26 '24 17:08 sberyozkin

Hahah thanks @sberyozkin for your help & proposing a fix :) I might open a separate recommendation to make it optional to fetch the JWKS at boot time, so we can also initialize the algorithms that are used before the first request :)

wesleysalimansdvb avatar Aug 27 '24 08:08 wesleysalimansdvb

Hi @wesleysalimansdvb There is a quarkus.oidc.jwks.resolve-early flag which you can set to false, give that a try, but yeah, please keep opening issues/PRs

Thanks to you and @bartm-dvb, cheers

sberyozkin avatar Aug 27 '24 10:08 sberyozkin

@sberyozkin With the 3.16-RC1 I am seeing a ~30% improvement in first request latency with the oidc quick-start, thanks for your work!

bartm-dvb avatar Oct 23 '24 15:10 bartm-dvb