pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Upgrade http5 Client to 5.4 and using Http 2 Protocol

Open abhioncbr opened this issue 1 year ago • 6 comments

As per the issue, In this PR we are upgrading the httpclient5 from 5.3.1 to 5.4.

Details

  • We observed test failures in the Dependabot PR because of the default connection reuse strategy. In httpclient5 version 5.4-alpha2, team fixed the issue, and because of this, we started seeing the failures with the error log
Connection error . Details: org.apache.hc.core5.http.NoHttpResponseException: localhost:22000 failed to respond
10:03:41.417 ERROR [CompletionServiceHelper] [grizzly-http-server-16] Connection error . Details: org.apache.hc.core5.http.NoHttpResponseException: localhost:22000 failed to respond

~~To fix this issue, In this PR we are setting the ConnectionReuseStrategy to false to resolve all the errors.~~

On further investigation, I am unable to determine whether the failures are purely due to the above-mentioned issue or to changes in caching with connections. I need help from someone to pin down the exact reason.

While investigating, I noticed that using the HTTP 2 protocol instead of HTTP 1.1 resulted in no issues in our entire test suite. Can we accept this as a solution?

abhioncbr avatar Oct 09 '24 02:10 abhioncbr

Codecov Report

Attention: Patch coverage is 56.81818% with 19 lines in your changes missing coverage. Please review.

Project coverage is 63.83%. Comparing base (59551e4) to head (1e7cd92). Report is 1273 commits behind head on master.

Files with missing lines Patch % Lines
...e/pinot/common/utils/FileUploadDownloadClient.java 36.00% 16 Missing :warning:
...org/apache/pinot/common/utils/http/HttpClient.java 85.71% 1 Missing :warning:
.../org/apache/pinot/common/utils/http/HttpUtils.java 90.00% 0 Missing and 1 partial :warning:
...ontroller/api/resources/PinotControllerLogger.java 0.00% 1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14191      +/-   ##
============================================
+ Coverage     61.75%   63.83%   +2.08%     
- Complexity      207     1556    +1349     
============================================
  Files          2436     2661     +225     
  Lines        133233   145904   +12671     
  Branches      20636    22333    +1697     
============================================
+ Hits          82274    93138   +10864     
- Misses        44911    45885     +974     
- Partials       6048     6881     +833     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) :arrow_up:
integration 100.00% <ø> (+99.99%) :arrow_up:
integration1 100.00% <ø> (+99.99%) :arrow_up:
integration2 0.00% <ø> (ø)
java-11 63.81% <56.81%> (+2.10%) :arrow_up:
java-21 63.67% <56.81%> (+2.04%) :arrow_up:
skip-bytebuffers-false 63.83% <56.81%> (+2.08%) :arrow_up:
skip-bytebuffers-true 63.64% <56.81%> (+35.91%) :arrow_up:
temurin 63.83% <56.81%> (+2.08%) :arrow_up:
unittests 63.83% <56.81%> (+2.08%) :arrow_up:
unittests1 55.48% <32.55%> (+8.59%) :arrow_up:
unittests2 34.20% <34.09%> (+6.47%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Oct 09 '24 02:10 codecov-commenter

Thanks for debugging this!

Can you elaborate more on the root cause? Since the bug was fixed in the new version, why would our test fail? Is there extra overhead if we do not reuse connection?

Jackie-Jiang avatar Oct 10 '24 21:10 Jackie-Jiang

Thanks for debugging this!

Can you elaborate more on the root cause? Since the bug was fixed in the new version, why would our test fail? Is there extra overhead if we do not reuse connection?

IMO, connection creation is a time-consuming process, and hence, not reusing it would degrade the performance. I have reverted the changes. I am not sure if switching to HTTP2 is an acceptable solution or not.

abhioncbr avatar Oct 24 '24 13:10 abhioncbr

For reference: 5.4 release note

Jackie-Jiang avatar Oct 24 '24 18:10 Jackie-Jiang

This is the highlighted release note, not sure if related to the issue we encountered:

IMPORTANT! The new cache entry serialization format is incompatible with earlier
versions of HttpClient Cache. Persistent caches (file system based, Memcached, EhCAche
with object serialization) created with any earlier version MUST be flushed and re-populated
or the cache backend MUST be configured to use the old, deprecated cache entry serializer.

Jackie-Jiang avatar Oct 24 '24 18:10 Jackie-Jiang

This is the highlighted release note, not sure if related to the issue we encountered:

IMPORTANT! The new cache entry serialization format is incompatible with earlier
versions of HttpClient Cache. Persistent caches (file system based, Memcached, EhCAche
with object serialization) created with any earlier version MUST be flushed and re-populated
or the cache backend MUST be configured to use the old, deprecated cache entry serializer.

Yes, I am following the release notes and saw this. They made the cache change in release 5.4.2-alpha, and we are seeing these issues from that version. In my understanding, we aren't using HTTP caching, and unit test cases that are not cache-based are failing. I am unable to pinpoint the exact issue.

It seems like the HTTP 2 protocol version looks good as per our test suites, and it's more performant than HTTP 1.1, maybe we can give this solution a try?

abhioncbr avatar Oct 24 '24 22:10 abhioncbr

We will wait for http5-client 5.4.2 version, to check if the fix is there for Http1.1

abhioncbr avatar Nov 01 '24 23:11 abhioncbr