
Optimize `topics/list`

Open dsjen opened this issue 3 years ago • 3 comments

The frontend calls `topics/list` repeatedly until the list of topics is exhausted, then checks whether any topics are running or queued in order to tell non-admin users whether they can create a topic (non-admins can only create one topic at a time). This can take a long time, resulting in a timeout.
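For readers following along, the pattern described above looks roughly like the sketch below. It is an illustration only, not the actual web-tools code: `fetch_page`, the `next` token, and the `"running"`/`"queued"` state strings are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed page shape: {"topics": [...], "next": <token or None>}.
Page = Dict[str, Any]


def find_active_topics(fetch_page: Callable[[Optional[str]], Page]) -> List[dict]:
    """Fetch every page first, then filter client-side for active topics."""
    all_topics: List[dict] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)          # one request per page
        all_topics.extend(page["topics"])
        token = page.get("next")
        if token is None:                 # list exhausted
            break
    # Filtering happens only after every topic has been downloaded.
    return [t for t in all_topics if t.get("state") in ("running", "queued")]


def make_fake_pager(pages: List[List[dict]]) -> Callable[[Optional[str]], Page]:
    """Hypothetical in-memory stand-in for the real API, for demonstration."""
    def fetch_page(token: Optional[str]) -> Page:
        index = 0 if token is None else int(token)
        next_token = str(index + 1) if index + 1 < len(pages) else None
        return {"topics": pages[index], "next": next_token}
    return fetch_page
```

The cost grows with the total number of topics, even when the user has no active topic at all.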

Are there optimizations that can be made to this call? One idea is that it might be more efficient to filter via a param so that only topics whose state is "in progress" or "running" are returned.

Any ideas?

dsjen avatar Mar 08 '21 19:03 dsjen

As always with issues like this, would you be able to post a specific API call, complete with the host that you're calling (is it the frontend cache or the backend directly), arguments, limits, expected result (e.g. "should return in x / in x s") and actual result ("doesn't return at all") to make it easier for us to look into it?

Alternatively, a link for us to click on to observe the call's behavior would be tremendously useful and speed up debugging.

More details would potentially reduce the number of "dunno, works for me" responses from us :)

pypt avatar Mar 11 '21 20:03 pypt

Here's the frontend code in question: https://github.com/mediacloud/web-tools/blob/45422d7be1f5e766fe0f865982c70014932451ed/server/views/topics/topiclist.py#L69. What's happening is that topics/list is called in an attempt to find out whether a user has a "running" or a "queued" topic.

dsjen avatar Apr 12 '21 20:04 dsjen

So which user (auth_users_id) is it slow for specifically?

If I read it correctly, does_user_have_a_running_topic() fetches all (user's?) topics, filters them afterwards and returns the list (despite the name, which would suggest that the function returns only a boolean). Currently we have 4056 topics, which normally wouldn't be that many, but the code that fetches the whole list also runs a bunch of other things for each of the 4056 or so topics, hence the slowness.
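As an aside on the boolean point: if the caller only needs a yes/no answer, paging could stop at the first match. A minimal sketch, assuming a hypothetical `fetch_page` callable that returns `{"topics": [...], "next": token_or_None}` and assuming the state strings `"running"` and `"queued"`; none of these names come from the real code.

```python
from typing import Any, Callable, Dict, Iterator, Optional

ACTIVE_STATES = {"running", "queued"}  # assumed state values


def iter_topics(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]]
) -> Iterator[dict]:
    """Lazily yield topics page by page; the next page is fetched on demand."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page["topics"]
        token = page.get("next")
        if token is None:
            return


def user_has_active_topic(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]]
) -> bool:
    """Return True as soon as one running or queued topic is seen."""
    # any() stops consuming the generator at the first match,
    # so later pages are never requested.
    return any(t.get("state") in ACTIVE_STATES for t in iter_topics(fetch_page))
```

This does not help in the worst case (no active topic means every page is still fetched), so it complements backend-side filtering rather than replacing it.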

A natural solution would be to add some filtering capability on the backend; for example, one should be able to call topics/list?state=running and get only the running topics. As a more immediate, hacky workaround (less a fix than a way to slightly improve performance), you can call topicList() with a bigger limit (10000) to reduce the number of SQL queries made on the backend while fetching the list of topics.
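To make the filtering idea concrete, here is a rough sketch of pushing the state filter into the query instead of filtering after the fact. It uses an in-memory SQLite table purely for illustration; the table layout, the column names, and the `list_topics` helper are hypothetical and are not the backend's real schema or handler.

```python
import sqlite3
from typing import List, Optional, Tuple


def list_topics(
    conn: sqlite3.Connection,
    state: Optional[str] = None,
    limit: int = 20,
) -> List[Tuple[int, str, str]]:
    """Return up to `limit` topics, optionally restricted to one state."""
    sql = "SELECT topics_id, name, state FROM topics"
    params: list = []
    if state is not None:
        # The database does the filtering, so non-matching rows
        # are never loaded or post-processed by the caller.
        sql += " WHERE state = ?"
        params.append(state)
    sql += " ORDER BY topics_id LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

With something like this behind a `state` query parameter, the frontend check would need a single small request rather than a walk over every topic.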

It may also be worth reviewing what does_user_have_a_running_topic() does and whether it's needed at all: if the user has admin privileges, the function returns an empty list (instead of a list of all 4056 topics, perhaps?), but I don't know enough JavaScript to trace what the caller does with this (empty?) list.

pypt avatar Apr 13 '21 12:04 pypt