azure-kusto-java icon indicating copy to clipboard operation
azure-kusto-java copied to clipboard

Asynchronous HTTP Client

Open The-Funk opened this issue 2 years ago • 24 comments

Describe the solution you'd like Utilize an asynchronous HTTP client instead of Apache's CloseableHttpClient in order to provide a nonblocking transport of data from the ADX database to my Java application.

Describe alternatives you've considered NA

Additional context The current implementation of the ADX library uses the Apache HTTP Client. This client hasn't received an update in 2 years. A good asynchronous alternative might be to use the vert.x web client. https://vertx.io/docs/vertx-web-client/java/ https://mvnrepository.com/artifact/io.vertx/vertx-web-client

The-Funk avatar Jun 08 '22 19:06 The-Funk

Side note, you might want to offer multiple "flavors" for http client since several asynchronous clients exist that are rather opinionated and provide their own "future" or async types. vert,x is one of these.

The-Funk avatar Jun 08 '22 19:06 The-Funk

I submitted a PR to bump the Apache HTTP Client to version 5 which has async support. For the most part the code from version 4 could be reused. Very few breaking changes. Might be worth looking into.

I did also bump the minimum Java version to 11 since Java 8 premier support ended earlier this year.

The-Funk avatar Jun 17 '22 12:06 The-Funk

Any update on this? I am running into issues where trying to query data fast enough is not possible due to ADX' 64MB query limit and the fact that there is still no async support.

The-Funk avatar Sep 21 '22 12:09 The-Funk

Just checking in again to see if anything is being done on this. CompletableFuture support from Apache HTTPClient 5 would be enough to resolve most of my issues I think.

The-Funk avatar Oct 03 '22 17:10 The-Funk

Sadly this feature will be postponed for now. When we will add it, we will probably move to new standard http client in java 11.

AsafMah avatar Nov 23 '22 13:11 AsafMah

I noticed the Java 11 implementation has gone through, would you like help refactoring to use the Java 11 client?

The-Funk avatar Dec 02 '22 18:12 The-Funk

@AsafMah do you know if this is still in the pipeline along with the native Java client?

I'm also noticing that this library using msal4j, are there plans to switch to azure-identity at some point?

The-Funk avatar Mar 14 '23 19:03 The-Funk

Regarding use of MSAL, we do not plan on rewriting all the auth code using Azure.Identity, however we should provide a WithAzureTokenCredential() API for customers to work with.

@AsafMah , consider adding this to Ron's task list so this gets done sooner then later.

yogilad avatar Mar 19 '23 11:03 yogilad

@yogilad do you know if async operation is still in the pipeline for the library? I would like to reduce the number of threads that must be allocated to kusto-java without impacting performance.

The-Funk avatar May 08 '23 15:05 The-Funk

This work not is planned anytime soon. The main reason is, it requires rearchitecting the SDK to avoid code duplications which comes at a high cost.

yogilad avatar May 14 '23 11:05 yogilad

@yogilad @AsafMah @ohadbitt I have extra availability in the coming weeks, would you like me to take a shot at upgrading the project to use httpclient5 again, or potentially switching to the Java 11 built-in HTTP Client?

Impact of moving to the Java 11 client:

  • potential for major rework of specific classes/methods
  • requires moving target to Java 11 minimum
  • removes several dependencies (by my count, at least 4)

Impact of moving to httpclient5:

  • potential for major rework of specific classes/methods

Note: The last time I tried this, I moved to httpclient5. For the most part the existing code could be kept/reused however client 5 does get rid of connection pooling features, instead using async io under the hood to handle concurrent connections. This is something that might need tested.

If you have any interest, let me know.

The-Funk avatar Nov 28 '23 15:11 The-Funk

Hi @The-Funk Cole, We decided that any future change in our httpClient should move to align with other Azure SDKs httpClient usage which is using projectReactor and the underlying reactor-netty client.

API shouldn't change much besides adding an Async interface as the Mono returned by storage async client can be translated to CompleteableFuture

We need to make sure this would work with the factory methods where user provides the client himself - so that we can support proxy configurations in both msal authentication and Blob storage operations.

ohadbitt avatar Dec 03 '23 14:12 ohadbitt

@The-Funk , are the thumbs up meaning you want to implement it our way or wait for us to complete this task some time in the future?

ohadbitt avatar Dec 05 '23 20:12 ohadbitt

@ohadbitt I can give it the old college try!

I am wondering how much effort it will be.

The use of entity is what concerns me most. The concept of an HTTP entity is basically all but gone in every modern HTTP client. I think Apache httpclient4 was the last one holding onto it.

I am thinking of easy ways to work around its usage in the existing codebase. An entity is just a few specific headers and the message body....

I am also wondering about how the HTTP client is currently passed around in the code.

If you move to using async under the hood like the Netty client does, you won't need client pools, so you could just use a single HTTP client and hold open connections. I think that would be a lot more efficient. The question is how to implement it.

I would guess you wouldn't want to lock the users into only having one client, so a singleton is out. What if a user wants to connect to two disparate ADX databases? But I am thinking that you could store the inner HTTP client as an instance property of the KustoClient abstraction layer. Then you could create as many (or as few) Kusto Clients as you want, to connect to as many (or as few) ADX clusters as you want, and each KustoClient only maintains 1 HTTP client.

Does that sound like a decent gameplan to y'all?

The-Funk avatar Dec 05 '23 21:12 The-Funk

Yea this sounds good - we don't do client pools - you can see the httpclient is already a member of the KustoClient - so no work here You might want to think about starting with a small PR just to moving using the netty client with current impl - see how it works. Second big PR will be to make async API

ohadbitt avatar Dec 10 '23 13:12 ohadbitt

@AsafMah @ohadbitt @yogilad

With the holidays approaching I've taken a significant amount of PTO to work on projects I find interesting and that are useful to me. If you would like, I can begin work on a second PR to address adding async options with the new HTTP client.

I was thinking something along the lines of just mirroring the sync APIs in a new interface. I would use the same names like execute and executeToJson but the return types in the new interface would be CompletableFutures or Monos depending on your preference.

From there I could create a new client implementation of said interface and add it to the factory class.

I'll try to make use of when() and proper chaining with the Netty async API.

The-Funk avatar Dec 19 '23 18:12 The-Funk

Hey @The-Funk As i said today in the PR - main issue here would be to test it well would you like to take this effort as well ? If so its better to do it first as its relevant to both PRs We can discuss it here or on the PR

ohadbitt avatar Dec 20 '23 14:12 ohadbitt

@ohadbitt @AsafMah

I've given this some thought. There's a few directions we could go for the async client.

As is, there are 12 methods in the Client interface for the sync client. However in the existing client implementation all of these methods chain to just 2 implementations. One for execute() and one for executeToJsonResult(). The rest are just helper methods that chain to one of these 2.

A few directions we could go:

  • Create an entirely separate Client implementation, call it AsyncClientImpl for instance, and give it 12 methods, 6 of which return a Mono<String> (async executeToJsonResult) and 6 a Mono<KustoOperationResult> (async execute). We could have an AsyncClient interface to back this.

  • Alternatively we could add those 12 methods to the existing Client interface/implementation, and then perform the same kind of operation chaining as in the sync versions of the methods.

  • A 3rd path that I like the most involves a PR I opened yesterday, #358

That PR from yesterday introduces a KustoQuery object and if it is something that interests y'all, we could use that PR to deprecate the method chaining from V5 and below (without deleting them at this time). If we went the route of introducing the wrapper object first, we could use the existing Client interface and Client implementation and add just 2 additional methods that would look like the following:

   public Mono<KustoOperationResult> execute(KustoQuery kq) {
      // implementation
   }

   public Mono<String> executeToJsonResult(KustoQuery kq) {
      // implementation
   }

Let me know what y'all think. I'm open to any of these approaches.

Disclaimer: Because the azure-core HTTP Client's default async return type is Flux/Mono, I figure it would be best to just stick with these types for the async results. These are reactive types, which gives them significantly more utility by default than Java Futures/CompleteableFutures and there are adapters for these reactor types for just about every other reactive library. I myself will probably convert them into Mutiny-API equivalents in my projects.

The-Funk avatar Mar 26 '24 22:03 The-Funk

For your question, we want to decrease the number of overloads in general:

  • Remove the "execute" method entirely, make executeQuery or executeMgmt explicit.
  • For executeFromJson, only leave the one overload - with all of the parameters. As this method is rarely used.

We can use KustoQuery internally if it helps.

So the amount is much more manageable.

As for the API methods - let's have them in the same object as new methods (that end with async).

At the end, we want all of our internals to be async, and the sync methods will be simple wrappers that block.

As for the order of work, I think we should start with data, then ingest, in a bottom-up manner:

  • Start with CloudSettings, make it internally async and expose an async interface.
  • Then authentication - all of the token providers. In other SDKs we got rid of the msal library in favor of using azure-identity completely, if we'll see it's possible here we might do this transition ourselves.
  • Then the client itself

We can talk over it on a call if you're available.

AsafMah avatar Apr 14 '24 09:04 AsafMah

@AsafMah I should have availability for a call sometime this week. What is the best way to get in contact with y'all? Teams? Discord?

Regarding the order of operations, I think the above sounds good.

The-Funk avatar Apr 15 '24 15:04 The-Funk

@AsafMah I have started working on this locally. Will try to push some changes to my existing KustoQuery branch to show some PoC work.

The-Funk avatar Apr 23 '24 17:04 The-Funk

Update: I am still working in my branch. Haven't had a chance to push anything yet but I am hoping to push code this weekend to show progress.

The-Funk avatar Apr 30 '24 23:04 The-Funk

Hello, we have been on a holiday for the last two weeks so we haven't been able to keep in touch.

Let's arrange a team meeting, how can we reach you?

AsafMah avatar May 05 '24 11:05 AsafMah

If you email [email protected] that should work. I can also accept Teams/Zoom invitations there.

The-Funk avatar May 05 '24 15:05 The-Funk