
Throttle request rate to one per second.

Open · ConradHughes opened this issue 4 years ago · 3 comments

New Pull Request

Hi Jess, thanks for maintaining this SDK! I may be misunderstanding something, but it looks to me as if the SDK as is doesn't enforce rate limiting against Goodreads' site; this tiny patch fixes that. It seems almost unnecessary since my experiments so far take several seconds per request anyway, but this way you can do whatever you want with a clear conscience.

UPDATE: I've also added a bugfix for no results from search: resp["search"]["results"] is None when there are no results, so client.py:110 would fail with TypeError: 'NoneType' object is not subscriptable:

works = resp["search"]["results"]["work"]
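A minimal guard for this case might look like the following. The helper name `extract_works` is hypothetical (the actual PR patches `client.py` in place); the response shape is taken from the line above.

```python
def extract_works(resp):
    """Return the list of works from a parsed Goodreads search response.

    resp["search"]["results"] is None when the search matched nothing;
    subscripting it directly raises TypeError, so guard first.
    """
    results = resp["search"]["results"]
    if results is None:
        return []  # no matches: return an empty list instead of crashing
    return results["work"]
```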

UPDATE 2: another small bugfix: if a book is on exactly one shelf, then _book_dict['popular_shelves']['shelf'] is the dict for that one shelf rather than a one-element list (with more than one shelf it is a list), so this case needs special handling. I suspect that a book on no shelves at all will behave differently again, but I haven't come across that yet.

UPDATE 3: yep, it malfunctioned if a book was on zero shelves too; fixed by the latest commit.
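Taken together, the two shelf fixes amount to normalizing three possible shapes into one. A sketch, assuming a standalone helper (the function name is hypothetical; the real patch lives inside the SDK):

```python
def normalize_shelves(book_dict):
    """Return popular shelves as a list of dicts, whatever the XML gave us.

    The XML-to-dict conversion yields three shapes:
      - a list of dicts when the book is on several shelves,
      - a single dict when it is on exactly one shelf,
      - a missing/None value when it is on no shelves at all.
    """
    shelves = (book_dict.get("popular_shelves") or {}).get("shelf")
    if shelves is None:
        return []          # zero shelves
    if isinstance(shelves, dict):
        return [shelves]   # one shelf: wrap the lone dict in a list
    return shelves         # two or more shelves: already a list
```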

UPDATE 4: added an optional diskcache argument to the GoodreadsClient constructor; for cleanliness this involved moving some of the rate throttling code in there too.

What is in this PR?

Select all that apply

  • [x] Bugfix

  • [x] New feature

  • [ ] Documentation improvements

  • [ ] Improved test coverage

  • [ ] Other: (Please specify)

Issue

  1. Goodreads want API users to limit requests to one per second, but the existing betterreads code doesn't seem to enforce this.
  2. A search with no results causes a failure.
  3. A book that's on zero or one shelves returns bad data via popular_shelves.
  4. While developing a script against this API it seems wasteful and slow to keep firing the same requests at Goodreads: the option of a cache would be nice.

Abstract

  1. Solve by using the ratelimit package to limit calls to the request() method to one per second.
  2. Fix.
  3. Fix.
  4. Solve by adding optional support for such a cache.

Technical Description

  1. Very small change: just decorate request() to not be callable more than once a second, and (rather than throwing an exception if called too frequently) to sleep and retry if it is.
  2. Catch the special case and handle it.
  3. Catch the special case and handle it.
  4. Added an optional diskcache parameter to the GoodreadsClient constructor. The caller should configure it appropriately (e.g. 24h limit to comply with Goodreads Ts&Cs).
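Points 1 and 4 together can be sketched as below. This is a stdlib-only illustration, not the PR's code: the PR uses the ratelimit package's @sleep_and_retry and @limits(calls=1, period=1) decorators, and a diskcache.Cache (configured by the caller, e.g. with a 24h expiry) rather than a plain dict. All names here are hypothetical.

```python
import functools
import time

def one_per_period(period=1.0):
    """Sleep-and-retry throttle: at most one call per `period` seconds.

    Like ratelimit's @sleep_and_retry + @limits, a too-early call blocks
    until the period has elapsed instead of raising an exception.
    """
    def decorator(func):
        last = [float("-inf")]  # time of the previous call (closure cell)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = last[0] + period - time.monotonic()
            if wait > 0:
                time.sleep(wait)  # block rather than raise
            last[0] = time.monotonic()
            return func(*args, **kwargs)

        return wrapper
    return decorator


class ThrottledCachingClient:
    """Hypothetical client: an optional cache is consulted before any
    (throttled) round trip, so cache hits skip both the request and the
    one-second delay."""

    def __init__(self, cache=None):
        self.cache = cache    # e.g. diskcache.Cache(...); a dict works here
        self.fetch_count = 0  # for illustration only

    @one_per_period(period=0.2)  # short period so the example runs quickly
    def _fetch(self, path):
        self.fetch_count += 1
        return f"response for {path}"  # stand-in for the real HTTP call

    def request(self, path):
        if self.cache is not None and path in self.cache:
            return self.cache[path]  # cache hit: no request, no delay
        resp = self._fetch(path)
        if self.cache is not None:
            self.cache[path] = resp
        return resp
```

Note the decorator ordering matters in the real code too: the cache check sits outside the throttled call, so only genuine network requests are rate-limited.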

Dependency updates

Added a dependency on 'ratelimit', which makes this whole job easy.

Good Citizenship

I'm not very familiar with Python, so I don't know how to do the first and last of these; running check returns instantly with no obvious tests run.

  • [x] I have updated any relevant tests and the test suite is passing

  • [x] I have updated any relevant documentation

  • [ ] I have run the pre-commit hooks for this repository's designated code styles

ConradHughes avatar May 14 '20 14:05 ConradHughes

Is the project alive?

scastillo avatar Aug 19 '20 22:08 scastillo

Hi Sebastian, I don't think it is live (Jess hasn't responded to my pull requests at least), but it does still work. I think the tests are failing because they need working GoodReads API keys in environment variables GOODREADS_KEY and GOODREADS_SECRET to run. When I supply these on my machine, the tests all still pass.

ConradHughes avatar Aug 27 '20 10:08 ConradHughes

Hi! I'm so sorry for the delay. I took some time off of open source stuff due to work and personal reasons. I hope to start working through backlog PRs this month.

thejessleigh avatar Oct 01 '20 20:10 thejessleigh