terrasnek
terrasnek copied to clipboard
Add support for retries with exponential backoff when encountering certain responses (429s, etc)
We are running into some 429s while using teams.list_all
, would you be open to adding retry with exponential backoff support to terrasnek?
I assume we are hitting 429s because _list_all
(https://github.com/dahlke/terrasnek/blob/master/terrasnek/endpoint.py#L374-L383) code doesn't have a sleep between the self._list
calls so it is sending the API requests as fast as it can. That would be fine as long as retries with exponential backoff were enabled.
Also thank you for creating terrasnek, it is great!
👋 @frielp . It looks like Requests through urllib3
supports adaptive retries
A question for the project is since any endpoint is subject to this request limit, does it make sense only create a Retry
object for the Requests.get
call behind list
, to add to all calls of requests.[method]
, or reorganize around a requests.Session
object so as to only set this value, and others in the future, only once?
I would lean towards a requests.Session
object as it also would add connection pooling and re-use TCP connections which should give a nice performance increase when making multiple calls back to back like is done for the list_all
functionality.
I think if this gets added it would also be nice to expose the ability to retry and some of the basic options like number of retries, amount of backoff, etc to the user as well.
I think this is a great idea. I'm gonna be too busy in personal life til around Christmas to build out and test this for you all, but if you code it up and submit a PR that broadly supports endpoints and is tested, I'll review and approve. If you don't want to do this, I will do it in the new year most likely.
Edit: I can add cheap shims in the short term as well, but I suspect we'd all prefer the route I suggested above.
@dahlke I'm happy to try. Any guidance on the broad direction of unit tests? Would you be open to bringing Pytest into the project? My rough idea so far:
https://gist.github.com/evilensky/4cc7959231aa48cb410429fdfb24b605
- pick an end point like
workspaces
and/orworkspace_vars
and/orteam
(etc) - record the expected calls.
- re-factor endpoint.py around
requests.Session
- see if assertions still pass.
I'm not sure I can unit test every single operation across all classes but 2 consumers of endpoint
would be enough confidence? One thing I'd like to avoid is mocking TFC endpoint responses; it's something I used to commonly do because having a full test felt so nice, but this post was very informative.
Thanks for the great work here!
Just a heads up, that I am working on the above as described in a branch with several enhancements, somewhat as described above, here with a small focus on typing enhancements.
Building on top of that, a Session()
is a drop in replacement and will just need some parameters exposed to configure the backoff.
I am not currently open to adding an additional testing dependency in Pytest as this runs orthogonal to all existing testing in the project. Just FYI. I'm happy to consider this longer term, but do not want to have a massive churn of the testing codebase at this time. Feel free to submit what you plan to submit and I'll see how I can work it into the current testing, but I just want to be straightforward that I do not want to make large changes around the entire project to accommodate what is a pretty simple ask (in terms of backing off retries).
While I was looking through the current code in https://github.com/dahlke/terrasnek/blob/master/terrasnek/endpoint.py I also noticed that a timeout is not currently being set when the calls are sent using requests. By default requests won't set a timeout at all which can lead to code hanging indefinitely waiting for a response (https://requests.readthedocs.io/en/latest/user/advanced/#timeouts).
@evilensky / @dahlke since the connection code is already being re-worked in this issue thoughts on adding timeout support in this work as well? Or would you all rather I open a separate issue for that so it can be done after this re-work?