
Move the initial call.request

Open taer opened this issue 4 years ago • 7 comments

I was doing some performance testing of a streaming Flow interface and found that requests.collect was taking quite some time from the initial invocation of the method, up to 15ms at times. The code looks something like this:

    fun exampleStream(requests: Flow<Int>): Flow<Int> {
        val start = System.nanoTime()
        return flow {
            requests.collect { value ->
                val collected = System.nanoTime()
                val thisTime = collected - start // elapsed since method invocation
                emit(value)
            }
        }
    }

The p99 of thisTime is what we're seeing vary quite dramatically. I compared the flow version with the observer version and found that the observer one makes a call.request(1) during startCall, while the flow version only makes that initial request after the call to collect.
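To make the behavior described above concrete, here is a minimal, self-contained sketch (no gRPC involved, just kotlinx.coroutines) of why the flow-based path issues its initial request late: a cold Flow's producer, which is where call.request(1) would be triggered, runs only once a downstream collector creates demand. The event names are illustrative only.

```kotlin
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

fun main() = runBlocking {
    val events = mutableListOf<String>()

    // Stands in for the request stream: the producer side (where call.request(1)
    // would be issued) runs only once someone collects.
    val requests = flow {
        events += "initial request issued"
        emit(1)
    }

    // Mirrors the shape of the flow-based server stub: it wraps the request
    // flow but creates no demand by itself.
    val response = flow {
        requests.collect { emit(it) }
    }

    events += "handler invoked"
    response.collect { } // demand is created only here

    println(events) // prints [handler invoked, initial request issued]
}
```

Note that "handler invoked" is recorded before "initial request issued": nothing asks the transport for a message until collect runs, which is the gap being measured.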

I made a few attempts at perf testing this locally, and the numbers seem about the same from the client side. In a prod environment, thisTime is slightly better.

taer avatar Aug 12 '21 19:08 taer


This pairs w/ my new PR https://github.com/grpc/grpc-kotlin/pull/360

I can't think of a server that would want to avoid collecting the requests from the stream.

I'm not certain this will add any speed, but it moves the timing into the gRPC layer rather than the flow.

taer avatar Sep 28 '22 02:09 taer

@lowasser Just checking in to see if this project is still alive or if you have any questions/concerns on this?

taer avatar Mar 10 '23 14:03 taer

I'm not @lowasser, but I have the impression that the PR would take away the application's ability to control the timing of the initial request, i.e. it takes away the application's control over initial flow control. That would seem a bad idea in general.

For example, a server under load might incur additional buffering cost for each connecting client without the ability to refuse the call straight away.

bubenheimer avatar Mar 10 '23 15:03 bubenheimer

That is something a server could do with the current implementation, but only for the streaming-request style. The unary versions already behave the way this PR proposes.

Also, for that use case of rejecting for load, an interceptor would be a much more likely path.
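For reference, here is a rough sketch of what that interceptor path looks like against the grpc-java API that grpc-kotlin sits on. The isOverloaded lambda is a hypothetical load signal, not anything from the library; the point is only that the call can be refused in interceptCall, before any request message is asked for.

```kotlin
import io.grpc.Metadata
import io.grpc.ServerCall
import io.grpc.ServerCallHandler
import io.grpc.ServerInterceptor
import io.grpc.Status

// isOverloaded is a hypothetical load signal; swap in your own metric.
class LoadSheddingInterceptor(private val isOverloaded: () -> Boolean) : ServerInterceptor {
    override fun <ReqT, RespT> interceptCall(
        call: ServerCall<ReqT, RespT>,
        headers: Metadata,
        next: ServerCallHandler<ReqT, RespT>
    ): ServerCall.Listener<ReqT> {
        if (isOverloaded()) {
            // Refuse the call before any request message has been requested.
            call.close(Status.RESOURCE_EXHAUSTED.withDescription("shedding load"), Metadata())
            return object : ServerCall.Listener<ReqT>() {}
        }
        return next.startCall(call, headers)
    }
}
```

Because this runs before the call handler starts, it rejects the call regardless of when the initial call.request(1) would have happened.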

From my testing, this really didn't make much of a difference to be honest. I still had quite a large amount of time between the RPC method getting "invoked" and the first element appearing from the collect.

    suspend fun rpcMethod(requests: Flow<Request>) {
        val preRequestCollect = System.nanoTime()
        requests.collectIndexed { index, value ->
            val postRequestCollect = System.nanoTime()
            // on the first iteration, postRequestCollect - preRequestCollect
            // is the wait for the first request to arrive
        }
    }
The p99 of the measured time under high load is about 5-6ms. That is what I was attempting to mitigate. The unary versions of the RPC all looked about 5-6ms faster because the library collected the request for us and "hid" that time.

I might make one more pass at testing to see if there is anything noticeable. It's largely just moving chairs around to give us more visibility, since we use the aforementioned interceptor to manage load.

taer avatar Mar 13 '23 14:03 taer

@taer With your change I suspect that even an interceptor will not have the chance to reject before the server requests.

I would not want to trade flow control capabilities in my streaming calls for a minimal improvement to call startup latency. That is why I am chiming in here.

I generally expect some call startup latency, and use streaming calls rather than unary to minimize overhead.

bubenheimer avatar Mar 13 '23 15:03 bubenheimer

The interceptor code runs well before any of the code in question; all of the interceptors get a chance to kill the request before anything affected by this PR runs.

And again, the point is moot, since this is about accounting. The gRPC libraries call the method. Is the first request available? In the unary scenario, the library ate the "delay", and the RPC method has the request instantly. In the streaming case, the RPC method is called before the data is available and is responsible for "gathering the request".

So if you wanted to kill the request via interceptor, there is absolutely no delta here. None of this code gets called if an interceptor kills the request. This is just making the unary and streaming semi-equivalent.

An opposite approach would give the unary requests a Deferred instead of the request itself, which would also be a valid alternative, but it would be super breaking.

In essence, the problem is that unary requests spend "the time" inside the library, giving a false sense of speed to the unary RPC. The streaming versions can't do this. So if you're measuring your RPC time from start to end, a streaming call carrying a single message will ALWAYS look slower than the equivalent unary version.
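The accounting argument above can be shown with a toy model in plain Kotlin (no gRPC at all): the only difference between the two functions below is whether message delivery happens before or after the handler's clock starts, which is exactly the unary vs. streaming difference being described. All names here are illustrative.

```kotlin
// Toy model: where is the delivery time charged?
fun <T> timeUnaryStyle(deliver: () -> T, handler: (T) -> Unit): Long {
    val msg = deliver()              // the "library" eats the delivery time
    val start = System.nanoTime()    // handler timing starts with the message in hand
    handler(msg)
    return System.nanoTime() - start
}

fun <T> timeStreamingStyle(deliver: () -> T, handler: (T) -> Unit): Long {
    val start = System.nanoTime()    // handler timing starts at invocation
    val msg = deliver()              // delivery time is charged to the handler
    handler(msg)
    return System.nanoTime() - start
}

fun main() {
    val slowDelivery = { Thread.sleep(5); 42 } // stand-in for ~5ms of transport wait
    val unaryNs = timeUnaryStyle(slowDelivery) { }
    val streamingNs = timeStreamingStyle(slowDelivery) { }
    println(streamingNs > unaryNs) // prints true
}
```

Identical work is done in both cases; only the measurement boundary moves, which is why the streaming handler "looks" 5-6ms slower.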

taer avatar Mar 14 '23 00:03 taer

Closing so this stops showing on my list.

taer avatar Apr 30 '24 13:04 taer