
Cache get_avatar_url on /followers page to avoid fetching every user full object

mekarpeles opened this issue 1 year ago · 6 comments

Currently, the My Followers & Following pages make individual requests for every patron account listed on the page in order to fetch the internetarchive identifier required to craft their avatar URL. Ideally, we'd be able to fetch these in bulk. At minimum, we should cache these values so they are more easily accessible.
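For illustration, here's a minimal sketch contrasting the current per-user fetch pattern with a bulk fetch. `fetch_user`, `fetch_users_bulk`, and the fake user store are hypothetical stand-ins, not Open Library APIs:

```python
# Hypothetical sketch: N individual requests vs. one bulk request.
# The store and fetch functions below are stand-ins for illustration.

USERS = {
    f"user{i}": {"username": f"user{i}", "itemname": f"@user{i}"}
    for i in range(3)
}

request_count = 0

def fetch_user(username):
    """One backend request per follower -- what the page does today."""
    global request_count
    request_count += 1
    return USERS[username]

def fetch_users_bulk(usernames):
    """One backend request for the whole page of followers."""
    global request_count
    request_count += 1
    return [USERS[u] for u in usernames]

# Current behaviour: N requests for N followers.
ids = [fetch_user(u)["itemname"] for u in sorted(USERS)]
assert request_count == 3

# Desired behaviour: a single batched request.
request_count = 0
ids = [doc["itemname"] for doc in fetch_users_bulk(sorted(USERS))]
assert request_count == 1
```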

https://openlibrary.org/people/mekBot/followers https://openlibrary.org/people/mekBot/following

[Screenshot 2024-05-16 at 6:25:34 AM]

See: https://github.com/internetarchive/openlibrary/pull/8607/files#r1496013460

mekarpeles avatar May 16 '24 13:05 mekarpeles

@mekarpeles I don't want to scope-creep this, but I've also talked with @cdrini about a bulk API for the works merge page. We need bulk ratings/bookshelves/lists/editions (so we can make fewer requests when doing a big merge). Making so many requests currently seems to cause rate-limiting problems for librarians and puts them in the "slow lane".

Perhaps a more general solution should be considered when approaching this.

RayBB avatar May 16 '24 16:05 RayBB

@RayBB We already have ways of doing this, e.g. using ctx fetch_many. This is a specific case where we just need to change the code powering this API to make a single call for a batch of objects rather than a handful of individual calls.
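A sketch of what the batched call could look like, assuming an infogami-style site object that exposes a `get_many(keys)` method. `FakeSite` and `get_follower_docs` are stand-ins so the example runs on its own; in Open Library the site object would come from the request context:

```python
# Hedged sketch: replace N individual lookups with one get_many(keys) call.
# FakeSite is illustrative only.

class FakeSite:
    def __init__(self, docs):
        self.docs = docs

    def get_many(self, keys):
        # One round trip for the whole batch of keys.
        return [self.docs[k] for k in keys]

def get_follower_docs(site, usernames):
    """Fetch all follower user objects in one call instead of N calls."""
    keys = [f"/people/{username}" for username in usernames]
    return site.get_many(keys)

site = FakeSite({
    "/people/mekBot": {"key": "/people/mekBot"},
    "/people/RayBB": {"key": "/people/RayBB"},
})
docs = get_follower_docs(site, ["mekBot", "RayBB"])
assert [d["key"] for d in docs] == ["/people/mekBot", "/people/RayBB"]
```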

mekarpeles avatar May 18 '24 16:05 mekarpeles

@RayBB could you please assign this to me?

Vidyap23 avatar May 22 '24 15:05 Vidyap23

@Vidyap23 you're assigned. If you have questions, be sure to tag Mek, as he has the context on this one.

RayBB avatar May 22 '24 15:05 RayBB

@mekarpeles I am currently working on this, and after going through a few files where values are cached, I am using memcache_memoize to cache the response. Test code below:

def GET(self, username):
    def fetch_cached_avatar(username):
        print(username, "inside function avatar", flush=True)
        url = 'https://picsum.photos/id/237/200/300'  # placeholder image for testing
        # url = User.get_avatar_url(username)
        return url

    url = cache.memcache_memoize(
        fetch_cached_avatar,
        'user.avatar',
        timeout=5 * dateutil.HOUR_SECS,
    )(username)
    raise web.seeother(url)

Is this the right approach? I am pretty new to the codebase and not sure how caching works with respect to individual requests in this backend. I was able to test this by adding followers, and it looked like it was working even with the browser cache disabled. Is there any other way to test this?
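To make the call pattern concrete, here's a tiny in-process stand-in for `cache.memcache_memoize` (the real helper stores results in memcached keyed by the prefix plus the encoded arguments; `memoize_with_timeout` and the avatar URL shape below are illustrative assumptions, not Open Library code):

```python
import time
from functools import wraps

def memoize_with_timeout(f, key_prefix, timeout):
    """In-process stand-in for cache.memcache_memoize, for illustration only."""
    store = {}

    @wraps(f)
    def wrapper(*args):
        key = (key_prefix,) + args
        hit = store.get(key)
        if hit is not None and time.time() - hit[1] < timeout:
            return hit[0]  # cache hit: skip the underlying fetch
        value = f(*args)
        store[key] = (value, time.time())
        return value

    return wrapper

calls = []

def fetch_avatar_url(username):
    calls.append(username)
    # Illustrative URL shape only; the real value comes from the user object.
    return f"https://example.org/avatar/{username}"

cached_avatar_url = memoize_with_timeout(fetch_avatar_url, "user.avatar", 60)

first = cached_avatar_url("mekBot")
second = cached_avatar_url("mekBot")  # served from the cache
assert first == second
assert calls == ["mekBot"]  # underlying fetch ran only once
```

One design note: because the memoized wrapper keys entries by prefix plus arguments, repeated requests for the same username within the timeout share a single cached value, which is exactly what the /followers page needs.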

Vidyap23 avatar May 31 '24 02:05 Vidyap23

@Vidyap23 Yes! This is a reasonable approach.

mekarpeles avatar Jun 10 '24 18:06 mekarpeles

@Vidyap23 did you have any other questions about this one?

mekarpeles avatar Jul 12 '24 02:07 mekarpeles