quarkus icon indicating copy to clipboard operation
quarkus copied to clipboard

[RestEasy Reactive Client] Use AsyncInputStream for Posting InputStream

Open fredericBregier opened this issue 2 years ago • 19 comments

InputStream was not taking into account when using sending it with ReastEasy client to a Rest service.

Why: The previous implementation was putting all bytes in one buffer, leading either to OOME or limit on acceptable Buffer size.

Change: Add InputStream Async support such that it will not fill the memory and respect back pressure.

Note that, to work, quarkus.http.limits.max-body-size shall be set to a correct value (ex. 0), since currently VertxInputStream has soft limit on this value. The behavior is not changed here but could if necessary since it is no more in memory (not one buffer but chunked mode with back pressure). But as setting the value to 0 or enough is working.

Add a test to check correctness (speed and memory) but this can be improved since it only checks at the end the status and/or the size of the inputstream.

fredericBregier avatar Nov 24 '23 21:11 fredericBregier

Note 2: POST is about 4 times slower than GET, but still better than nothing. Note 3: Using Netty as low level client side, performances are 4 times better on GET, 50% better on POST. But that's another story.

POST: up to 950 MB/s, using Netty as client only up to 1250 MB/s GET: up to 950 MB/s, using Netty as client only up to 4000 MB/s

fredericBregier avatar Nov 25 '23 15:11 fredericBregier

@vietj can you have a look to the Vert.x Input Stream to Read Stream class?

cescoffier avatar Nov 27 '23 07:11 cescoffier


:waning_crescent_moon: This workflow status is outdated as a new workflow run has been triggered.


Failing Jobs - Building d34c42a898bc766dba688b3576324c5de980f3e8

Status Name Step Failures Logs Raw logs Build scan
Initial JDK 11 Build Build Failures Logs Raw logs :mag:

You can also consult the Develocity build scans.

Failures

:gear: Initial JDK 11 Build #

- Failing: independent-projects/resteasy-reactive/client/runtime 
! Skipped: devtools/bom-descriptor-json docs extensions/agroal/deployment and 330 more

:package: independent-projects/resteasy-reactive/client/runtime

Failed to execute goal net.revelc.code:impsort-maven-plugin:1.9.0:check (check-imports) on project resteasy-reactive-client: Imports are not sorted in /home/runner/work/quarkus/quarkus/independent-projects/resteasy-reactive/client/runtime/src/main/java/org/jboss/resteasy/reactive/client/handlers/AsyncInputStream.java

quarkus-bot[bot] avatar Nov 27 '23 08:11 quarkus-bot[bot]

Very nice. This will take some time to review, more that the few minutes I currently have on my phone 😎

geoand avatar Nov 27 '23 09:11 geoand

Just a side note: when using direct netty client, out of vertx or quarkus, I used HttpChunkedInput with a chunkWriteHandler to get max performance, but I didn't find a way to use it within reactive client, even if under there is netty...

fredericBregier avatar Nov 27 '23 19:11 fredericBregier

I remember @vietj saying we should use pipe instead of pump :)

geoand avatar Dec 06 '23 14:12 geoand

A bit improvement by fetching 3 instead of 1 at startup (so little memory consumption) and fetching each time a buffer is consumed (this seems to enable better pipelining between consumption and network):

POST: from 950 MB/s to 1200 MB/s, compare to Netty as client only up to 3600 MB/s GET: from 950 MB/s to 2000 MB/s, compare to Netty as client only up to 4400 MB/s

Note that increasing the startup fetching from 2 to 5 still increases the result, with still no memory impact (not noticeable):

  • POST : 1) 950 MB/s 2) 1100 MB/s 3) 1200 MB/s 4) 1250 MB/s 5) 1300 MB/s
  • GET: 1) 950 MB/s 2) 1500 MB/s 3) 2000 MB/s 4) 2250 MB/s 5) 2300 MB/s

My conviction is that 3 shall be the conservative option to get the best average performance while minimizing the memory impact (even if not noticeable). 4 might be the best choice however. (maybe an option for that ?)

fredericBregier avatar Dec 12 '23 18:12 fredericBregier

@vietj can you have a look at this as it very intense on the Vert.x side?

Thanks

geoand avatar Dec 14 '23 15:12 geoand

@fredericBregier I had a quick look: as far as I understood the AsyncInputStream will always be using the VertxBlockingInput so we have the guarantee that using the input stream never blocks ?

vietj avatar Dec 21 '23 08:12 vietj

@vietj This implementation is inspired from various projects (not mines). Of course, I cannot be certain 100%. I tried several ways. I have an issue sometimes, never in simple POST (one client -> one server), but when chaining (client -> (server -> internal client) -> another server). I tried to find out why, but didn't get the reason yet.

However, if there is another way, I can try if you have any idea.

The issue right now is that sending an inputstream (POST) from a client leads to all bytes in memory, which leads to OOME. On the reverse way, getting InputStream (GET) is working (modulo some optimizations that I proposed too, but can be split in 2 MR).

fredericBregier avatar Dec 21 '23 09:12 fredericBregier

@vietj And indeed, only in double POST (client to server to another server), it is indeed VertxBlockingInput that seems to block. But as it is under the wood, it seems difficult to bypass it. (Note again: in single POST, no chaining, no issue)

Any idea ?

fredericBregier avatar Dec 21 '23 10:12 fredericBregier

Sorry for the spam alike...

I tried the following changes. I added a test with concurrent client (10 threads). With previous implementation based on VertxBlockingStream, it was always in error (length not ok). With this version, 8/10 threads are ok, still 2 not. So I hope I'm not so far from a correct answer.

No blocking but early closing of the InputStream... Still investigating

(Applied on all 3 versions with small adaptations to context)

fredericBregier avatar Dec 21 '23 17:12 fredericBregier

Stopping research for now: debugging gives me that it seems Http1xServerConnection.onEnd is called too early, such that there are missing chunks, but I don't get why. Sending with 1 thread, even multiple times is OK. But when several threads in function, it turns wrong (6/10 in average), using QuarkusRestClientBuilder or native Quarkus Rest client.

fredericBregier avatar Dec 21 '23 19:12 fredericBregier

@vietj I ended up with a new version which seems far more stable, both in sequential and concurrent usages.

  • The VertBlockingInput is now far less blocking, only on necessary parts.
  • AsyncInputStream is inlining the dependence from Vertx (InboundBuffer) with light modification to simplify it

fredericBregier avatar Dec 22 '23 11:12 fredericBregier

@vietj Hi, I know the end and begiining of year are not the most favorable time to review ;-) Any chance that this time my proposal is closer to your thought ?

fredericBregier avatar Jan 09 '24 08:01 fredericBregier

@fredericBregier I need to dedicate time to this, now seems more favorable indeed!

vietj avatar Jan 09 '24 20:01 vietj

Any news? I guess next quarkus release is probably taken all available time...

fredericBregier avatar Jan 30 '24 19:01 fredericBregier

Hi @vietj I just rebase the code such that to make it easier (no issue with rebase at all).

fredericBregier avatar Feb 17 '24 22:02 fredericBregier

Hi @vietj Any news? Maybe you're thinking to another way (vertx native way?) or is this proposal not ok?

fredericBregier avatar Mar 02 '24 09:03 fredericBregier

@fredericBregier sorry no time for that, I am very overloaded with the recent releases we have done I will try to have a look soon (hint: this code is quite complex)

vietj avatar Mar 03 '24 10:03 vietj

I understand. I know the recent releases were huge!! No problem

fredericBregier avatar Mar 03 '24 19:03 fredericBregier

Hi @vietj Just a short notice on last Quarkus "main" version on ClientSendRequerstHandler with another implementation (far more simpler than mine): I've tried this implementation and it seems to fit. I've got some new errors in my tests but not related (headers issue). I need to investigate. I think there is still some space for optimization on VertxXXXInputStream implementation, but not that mandatory yet.

fredericBregier avatar Mar 16 '24 11:03 fredericBregier

@vietj OK, I found out that the next commit was fixing the header issue (I was on the commit level, so missing this one). I have to test however.

Do you now when those commits will be put in an official release such that we can close easily this MR?

fredericBregier avatar Mar 16 '24 12:03 fredericBregier

Hi @fredericBregier

Those commits are already part of Quarkus 3.8.3.

I am sorry that I had not realized my PR was essentially doing the same thing as yours.

geoand avatar Mar 16 '24 12:03 geoand

@geoand No issue : you've found another way far more readable! That's perfect ! Thanks a lot! The goal was to be able to handle InputStream not fully in memory (or on disk) since container cannot handle such things for huge stream.

I check again using last commits from main (partially imported to not get everything). This works fine!

I think we can close this MR then, and I will probably later on propose some optimizations on VertxXXXInputStream, and you''l see if it is interesting.

fredericBregier avatar Mar 16 '24 12:03 fredericBregier

Glad to hear it works for you!

Improvements are absolutely welcome 😁

geoand avatar Mar 16 '24 13:03 geoand

@geoand The commit #39282 is not in the 3.8.3 release which turns into bad header mapping (no header at all). I believe a quick fix shall be done (3.8.4) since it breaks inputstream handling (I believe however this does not touch other cases)

fredericBregier avatar Mar 23 '24 17:03 fredericBregier

You are right, that was my fault for not properly labeling the PR.

It should be part of the next 3.8 release

geoand avatar Mar 23 '24 17:03 geoand

No problem. As soon as I saw it, I sent this message in order to fix it in next release. Thank you again !

fredericBregier avatar Mar 23 '24 18:03 fredericBregier

Thank you for raising the issue!

geoand avatar Mar 23 '24 18:03 geoand