backend
Optimize `topics/list`
The frontend calls `topics/list` until the list of topics is exhausted, and then checks whether any topics are running or queued in order to tell non-admin users whether they can create a topic (non-admins can only create one topic at a time). This can take a long time, resulting in a timeout.
Are there optimizations that can be made to this call? One idea is that perhaps it would be more efficient to filter via a param so that only topics that matched "in progress" or "running" are returned.
Any ideas?
As always with issues like this, would you be able to post a specific API call, complete with the host that you're calling (is it the frontend cache or the backend directly), arguments, limits, expected result (e.g. "should return in x / in x s") and actual result ("doesn't return at all") to make it easier for us to look into it?
Alternatively, a link for us to click on to observe the call's behavior would be tremendously useful and speed up debugging.
More details would potentially reduce the number of "dunno, works for me" responses from us :)
Here's the frontend code in question: https://github.com/mediacloud/web-tools/blob/45422d7be1f5e766fe0f865982c70014932451ed/server/views/topics/topiclist.py#L69. What's happening is that `topics/list` is called in an attempt to find out whether a user has a "running" or a "queued" topic.
So which user (`auth_users_id`) is it slow for specifically?
If I read it correctly, `does_user_have_a_running_topic()` fetches all (user's?) topics, filters them afterwards and returns the list (despite the name, which would suggest that the function returns only a boolean). We currently have 4056 topics, which normally wouldn't be that many, but the code that fetches the whole list also runs a bunch of other things for each of those 4056 or so topics, hence the slowness.
A natural solution would be to add some sort of filtering capability on the backend; for example, one should be able to call `topics/list?state=running` and get only the running topics. As for a more immediate hacky fix (not so much a fix as a way to slightly improve performance), you can call `topicList()` with a bigger `limit` (`10000`) to reduce the number of SQL queries made on the backend while fetching the list of topics.
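To illustrate why a bigger `limit` helps, here is a rough sketch of paging through a topic list and counting round trips. Everything in it is hypothetical: `fetch_page`, `next_link_id` and the fake backend are stand-ins made up for the example, not the real client or API response shape.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes (limit, link_id) and returns a dict with
# "topics" (list of topic dicts) and "next_link_id" (None on the last page).
FetchPage = Callable[[int, Optional[int]], Dict]


def iter_topics(fetch_page: FetchPage, limit: int) -> Iterator[Dict]:
    """Yield topics page by page until the backend reports no next page."""
    link_id: Optional[int] = None
    while True:
        page = fetch_page(limit, link_id)
        yield from page["topics"]
        link_id = page["next_link_id"]
        if link_id is None:
            return


def make_fake_backend(topics: List[Dict]) -> Tuple[FetchPage, Dict[str, int]]:
    """Build an in-memory stand-in for the backend that counts requests."""
    calls = {"count": 0}

    def fetch_page(limit: int, link_id: Optional[int]) -> Dict:
        calls["count"] += 1
        start = link_id or 0
        end = start + limit
        return {
            "topics": topics[start:end],
            "next_link_id": end if end < len(topics) else None,
        }

    return fetch_page, calls


def count_requests(total_topics: int, limit: int) -> int:
    """Return how many page requests it takes to read the whole list."""
    topics = [{"topics_id": i, "state": "completed"} for i in range(total_topics)]
    fetch_page, calls = make_fake_backend(topics)
    for _ in iter_topics(fetch_page, limit):
        pass
    return calls["count"]
```

With 4056 topics, a page size of 20 (an arbitrary example value) needs 203 requests in this sketch, while a `limit` of 10000 needs one. It only models the number of round trips, not whatever per-topic work the backend does.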
Also, it may be worth reviewing what `does_user_have_a_running_topic()` does and whether it's needed at all, because if the user has admin privileges, the function returns an empty list (instead of a list of all 4056 topics, perhaps?), but I don't know enough JavaScript to trace what is then done with this (empty?) list.
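For what it's worth, a check that returns an actual boolean could stop reading topics as soon as it finds a match. This is only a sketch under assumptions: the `owner_id` and `state` field names, and the function name itself, are invented for the example and may not match the real topic structure.

```python
from typing import Dict, Iterable

# States that block a non-admin user from creating another topic.
ACTIVE_STATES = {"running", "queued"}


def user_has_active_topic(topics: Iterable[Dict], user_id: int, is_admin: bool) -> bool:
    """Return True if the user owns at least one running or queued topic.

    Admins are never blocked, so the topic list is not read at all for them.
    `any()` stops consuming `topics` at the first match, so a lazily paged
    iterable would not be read to the end when a match appears early.
    """
    if is_admin:
        return False
    return any(
        t["owner_id"] == user_id and t["state"] in ACTIVE_STATES
        for t in topics
    )
```

If `topics` is a generator that fetches pages on demand, this avoids pulling the whole list whenever the user's active topic shows up early; in the worst case (no active topic) it still reads everything, so a backend-side state filter would remain the better fix.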