pants icon indicating copy to clipboard operation
pants copied to clipboard

[internal] add network-level gRPC timeouts

Open tdyas opened this issue 2 years ago • 7 comments

Add network level gRPC timeouts starting with the check_action_cache function (which maps to the REAPI GetActionResult API).

Fixes #16188.

tdyas avatar Jul 15 '22 20:07 tdyas

This PR currently vendors the Tower TimeoutLayer in order to allow the addition of a timeout metric. I am not enamored with this approach. I am going to experiment with writing a custom layer that can detect the timeout condition from Tower's TimeoutLayer (by sitting above it in the stack) and just emit the metric that way. Thoughts?

tdyas avatar Jul 15 '22 20:07 tdyas

(Forgot to mark this for review so I could ask the question in earlier comment. Doing so now. Doh.)

tdyas avatar Jul 21 '22 00:07 tdyas

Thanks!

This PR currently vendors the Tower TimeoutLayer in order to allow the addition of a timeout metric.

I'm not following why we need to vendor, rather than importing the code?

Either way, should probably add comments that this is vendored, and what source it comes from.

Eric-Arellano avatar Jul 21 '22 15:07 Eric-Arellano

I'm not following why we need to vendor, rather than importing the code?

Just look at the Pants-specific code in the file that accesses workunit store ...

tdyas avatar Jul 21 '22 15:07 tdyas

... which relates to my comment at https://github.com/pantsbuild/pants/pull/16196#issuecomment-1185897383 about finding a different way to be able to increment that counter.

tdyas avatar Jul 21 '22 15:07 tdyas

ping

tdyas avatar Jul 28 '22 17:07 tdyas

Latest commit ditches the vendored timeout layer and just uses a layer to count errors (which will be timeouts) coming from the Tower TimeoutLayer.

tdyas avatar Jul 28 '22 21:07 tdyas

Awesome, thank you!

Eric-Arellano avatar Dec 09 '22 04:12 Eric-Arellano