grpc-go icon indicating copy to clipboard operation
grpc-go copied to clipboard

mem: ReadAll for more efficient `io.Reader` consumption

Open ash2k opened this issue 1 year ago • 1 comments

I moved my project to gRPC 1.66.2 and saw a good reduction in RAM consumption. Now the hot spot is decompress(), where io.Copy() allocates a temporary buffer, reads from the reader into it, copies the read data into another buffer it got from the pool. This is an unnecessary allocation, an unnecessary copy, and underutilized buffers from the pool.

This PR adds mem.ReadAll() (like io.ReadAll()) to efficiently consume a reader into buffers from the pool.

Screenshot 2024-09-19 at 8 11 15 AM

I found https://github.com/grpc/grpc-go/issues/7631 while working on this code (I have similar code in my project, but decided to contribute it upstream and replace this io.Copy with it).

ash2k avatar Sep 20 '24 05:09 ash2k

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 81.86%. Comparing base (a3a8657) to head (eef8fb4). Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7653      +/-   ##
==========================================
- Coverage   81.87%   81.86%   -0.02%     
==========================================
  Files         373      373              
  Lines       37822    37880      +58     
==========================================
+ Hits        30967    31009      +42     
- Misses       5563     5579      +16     
  Partials     1292     1292              
Files with missing lines Coverage Δ
mem/buffer_slice.go 96.26% <100.00%> (+1.37%) :arrow_up:
rpc_util.go 79.48% <100.00%> (-0.06%) :arrow_down:

... and 30 files with indirect coverage changes

codecov[bot] avatar Sep 20 '24 05:09 codecov[bot]

@ash2k are you working on this actively?

purnesh42H avatar Oct 22 '24 09:10 purnesh42H

@purnesh42H No, I'm not. I think I've answered the questions and it's ready to be merged. Is something off still? Let me know what needs to be changed.

ash2k avatar Oct 22 '24 11:10 ash2k

@PapaCharlie PTAL

ash2k avatar Oct 28 '24 02:10 ash2k

There are still a whole lot of comments which are not wrapped at 80-cols. Could you please take care of that.

Also, please don't mark comments as resolved. It is the responsibility of the person making the comment to mark it as resolved when they think that the comment has been sufficiently addressed.

easwars avatar Oct 29 '24 22:10 easwars

@easwars

There are still a whole lot of comments which are not wrapped at 80-cols. Could you please take care of that.

Wrapped. Let me know if I missed something.

Also, please don't mark comments as resolved. It is the responsibility of the person making the comment to mark it as resolved when they think that the comment has been sufficiently addressed.

Ok, fair point. I used that as a way to track what I have addressed, but I see why that's not the best idea.

ash2k avatar Oct 29 '24 22:10 ash2k

Related question: I see quite a few calls with a nil pool - mem.NewBuffer(&someDataSlice, nil). Why not swap this with a type cast like this mem.SliceBuffer(someDataSlice)? No need to call a function.

ash2k avatar Oct 29 '24 23:10 ash2k

Related question: I see quite a few calls with a nil pool - mem.NewBuffer(&someDataSlice, nil). Why not swap this with a type cast like this mem.SliceBuffer(someDataSlice)? No need to call a function.

IIRC, the SliceBuffer type was added a little later on during the review process on the PR where all the buffering functionality was added, and there is a good chance some callsites were not fixed. I don't see a reason why we should be opposed to doing that change and would happy review a PR with that change. Thanks.

easwars avatar Oct 30 '24 17:10 easwars

@ash2k Could we have some benchmarks like what we have on https://github.com/grpc/grpc-go/pull/7786.

easwars avatar Nov 08 '24 20:11 easwars

@easwars

This is current master vs this branch (just rebased on master to compare apples vs apples) with the benchmark from #7786.

goos: darwin
goarch: arm64
pkg: google.golang.org/grpc
                                              │   ./old.txt   │               ./new.txt               │
                                              │    sec/op     │    sec/op     vs base                 │
RPCCompressor/comp=gzip,payloadSize=1024-10      170.9µ ± ∞ ¹   148.7µ ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=10240-10     205.0µ ± ∞ ¹   179.2µ ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=512000-10    1.515m ± ∞ ¹   1.538m ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=1024-10     102.57µ ± ∞ ¹   76.36µ ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=10240-10    111.73µ ± ∞ ¹   84.42µ ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=512000-10    431.3µ ± ∞ ¹   413.9µ ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                          253.0µ         218.7µ        -13.56%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                              │   ./old.txt    │               ./new.txt                │
                                              │      B/op      │     B/op       vs base                 │
RPCCompressor/comp=gzip,payloadSize=1024-10     146.96Ki ± ∞ ¹   25.98Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=10240-10    185.43Ki ± ∞ ¹   43.04Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=512000-10   1103.0Ki ± ∞ ¹   994.4Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=1024-10      78.25Ki ± ∞ ¹   14.00Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=10240-10     89.34Ki ± ∞ ¹   23.77Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=512000-10   1137.5Ki ± ∞ ¹   986.3Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                                          249.1Ki         84.54Ki        -66.07%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                              │  ./old.txt  │              ./new.txt              │
                                              │  allocs/op  │  allocs/op   vs base                │
RPCCompressor/comp=gzip,payloadSize=1024-10     252.0 ± ∞ ¹   244.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=10240-10    253.0 ± ∞ ¹   244.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=gzip,payloadSize=512000-10   294.0 ± ∞ ¹   288.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=1024-10     231.0 ± ∞ ¹   223.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=10240-10    231.0 ± ∞ ¹   223.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
RPCCompressor/comp=noop,payloadSize=512000-10   294.0 ± ∞ ¹   279.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
geomean                                         257.9         248.9        -3.47%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

ash2k avatar Nov 09 '24 01:11 ash2k

Thanks for the PR! That's a nice performance gain.

dfawley avatar Nov 12 '24 00:11 dfawley

@dfawley Thanks for the review and merging.

I'll wait for the release and post new RAM usage once I get this deployed.

ash2k avatar Nov 12 '24 00:11 ash2k