gapic-generator-python icon indicating copy to clipboard operation
gapic-generator-python copied to clipboard

determine whether default_timeout=None is reasonable

Open vchudnov-g opened this issue 1 year ago • 3 comments

In the course of investigating a client issue, I discovered that the transports/base.py wraps some methods with default_timeout=None, as reflected in the golden files here.

Is this reasonable? If the user does not explicitly specify a timeout when invoking the method, the default_timeout value gets passed all the way through the Python requests library and, as currently set, would not time out RPC calls. Should we change the default?

In the current customer issue I'm investigating (Googlers: b/318069394), for some other reason (still unclear) the server appears to be taking too long to respond, but the client's request method only yields control (by throwing an exception) when the connection is finally dropped, not earlier.

[We should also check that retries are enabled correctly with reasonable values.]

vchudnov-g avatar Jan 03 '24 23:01 vchudnov-g

Marking this as P2 because there does appear to be a workaround: explicitly setting the kwarg timeout=N where N is the number of seconds when invoking the GAPIC method for the RPC. That appears to be propagated to the underlying requests library (checked by code inspection, not by running it).

vchudnov-g avatar Jan 03 '24 23:01 vchudnov-g

We need to be careful with this, as retrying on client-side timeout is a kind of "request hedging" as far as the backend is concerned. In BigQuery, we had troubles (https://github.com/googleapis/python-bigquery/issues/970) where our default client-side timeout was a full minute+ beyond what was documented as the server-side request timeout, but some operations that used to be possible became impossible (I believe due to differences in when that server-side timeout clock starts and the client-side request timeout).

If we were to do this, it'd be better to do proper hedging where we start another request in parallel but keep the original request open for a bit to avoid that kind of problem.

tswast avatar Jan 19 '24 16:01 tswast

This is still on our backlog to investigate.

vchudnov-g avatar May 03 '24 22:05 vchudnov-g