yarpc-go
inbound decode path: investigate high memory usage
Recently, I was investigating a high memory usage issue in production and came across the attached heap profile on some of the hosts. These hosts weren't doing much other than handling about 100-200 requests/second, yet each was consuming about 4-5 GB of memory and climbing.
Without digging deeper, the root cause appears to be that the decode buffer is held for the duration of the RPC call: https://github.com/yarpc/yarpc-go/blob/ff8d19f1accd1b94e829b698d205a7258f82ddc9/encoding/thrift/inbound.go#L50
Based on an initial discussion with Prashant, the fix probably isn't simple because of the lazy decoding of the Thrift payload. Creating this issue for further investigation.
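For context, here is a minimal sketch of the retention pattern, assuming a fully buffered request body handed to ThriftRW's lazy binary decoder. The function name and exact calls are illustrative, not the actual inbound.go code:

```go
package main

import (
	"bytes"
	"io"

	"go.uber.org/thriftrw/protocol"
	"go.uber.org/thriftrw/wire"
)

// decodeRequest is an illustrative stand-in for the inbound decode path.
func decodeRequest(body io.Reader) (wire.Value, error) {
	// The whole payload is buffered up front.
	buf, err := io.ReadAll(body)
	if err != nil {
		return wire.Value{}, err
	}
	// Because decoding is lazy, the returned wire.Value keeps references
	// into buf (e.g. for lists and binary fields), so the GC must retain
	// buf until the handler is done with the request (potentially the
	// full RPC deadline).
	return protocol.Binary.Decode(bytes.NewReader(buf), wire.TStruct)
}
```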
Took a look at this with Prashant and Abhinav offline; this would be pretty complicated to fix, unfortunately. Thrift does lazy decoding, as you mentioned, and there's no way to signal from the encoding layer back up to the transport layer that the encoding layer is "finished" with the bytes.
A super hacky solution would be to pass a "finisher" function from the TChannel layer onto the context, and pull that function out in the encoding layer. Unfortunately, given this interface, that is the only way to expose more information to the encoding layer.
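Roughly, the shape would be something like this. This is a purely hypothetical sketch; WithFinisher and Finish don't exist in yarpc-go, and the names are made up:

```go
package main

import "context"

type finisherKey struct{}

// WithFinisher would be called by the transport (e.g. the TChannel inbound)
// to attach a buffer-release callback to the request context.
func WithFinisher(ctx context.Context, release func()) context.Context {
	return context.WithValue(ctx, finisherKey{}, release)
}

// Finish would be called by the encoding layer once it no longer needs the
// raw bytes, letting the transport return its buffer to a pool immediately
// instead of waiting for the RPC to complete.
func Finish(ctx context.Context) {
	if release, ok := ctx.Value(finisherKey{}).(func()); ok {
		release()
	}
}
```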
This would take quite some effort to fix and verify, which we don't have the cycles for right now (as we focus on gRPC/Protobuf general availability). I'm willing to leave this ticket open, however, in case this issue comes up with more services and we end up having more cycles to make a proper fix.
Alright, let's keep this open in that case. If this ever becomes high priority, we can get back to it.
@AlexAPB Hi, this issue is happening again on our production cluster, causing our frontend host to periodically restart due to OOM. profile007.pdf
Is there any plan to fix this issue soon?
Referred to internal tracking for now. I've asked @wxing1292 to file a ticket.
Discussed off-ticket. The profile shows about 100 RPS with 2-minute deadlines. There are intermediate buffers held between TChannel and Thrift: TChannel retains at least one 64 KB frame buffer regardless of payload size, which it releases only when it is closed. ThriftRW could possibly be induced to release that buffer earlier and keep only its own copy of the data structures, such as lists, that it reads lazily.
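For illustration, that suggested direction would look roughly like the sketch below. detachPayload and releaseFrame are made-up names for this example, not existing TChannel or ThriftRW APIs:

```go
package main

// detachPayload copies the request payload out of the pooled 64 KB frame so
// the frame can go back to TChannel's pool immediately, leaving only
// payload-sized memory pinned for the life of the call. The cost is one
// extra copy per request, which matters less when deadlines are long.
func detachPayload(frame []byte, releaseFrame func()) []byte {
	payload := make([]byte, len(frame))
	copy(payload, frame)
	releaseFrame() // hypothetical hook: return the 64 KB frame to its pool now
	return payload
}
```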
However, for now, we are not going to staff a solution to optimize for the case of long polling. Contributions to optimize TChannel and ThriftRW are welcome.