tasking-manager
tasking-manager copied to clipboard
Review Task Submission and Validation api query performance
Users on slack have noticed that there can be some latency when trying to submit a task as complete or validated. Last week we reduced the instance tier of the database service- it is still performing well from a CPU/Mem standpoint but these issues were not noticable before. From an infrastructure side we can increase some postgresql server parameters (on RDS some parameters are dependent on current instance memory), but we should also examine the queries behind these api calls. In particular, DB Insights noticed these long-running queries:
UPDATE users SET tasks_mapped=? WHERE users.id = ?;
UPDATE users SET last_validation_date=?::timestamp WHERE users.id = ?;
SELECT count(*) AS count_1 FROM tasks WHERE tasks.task_status IN (?, ?);
From New Relic, about 60% of the slow database transactions are coming from backend.api.tasks.actions:tasksactionsmappingunlockapi
I was curious so I took a little dive into this. I do see some optimization opportunities in the codes that update the stats (which seems to be correlated to the long running queries and originate from tasksactionsmappingunlockapi
).
In unlock_task_after_mapping
, we call StatsService.update_stats_after_task_state_change()
. This function takes in a project_id so that it can fetch an entire Project object; this includes fetching the geometry of the project when all the function ends up doing is updating a few stats for the project. I think this also aligns with a comment you made in slack @dakotabenjamin about the project in question having a complex geometry.
The Project object has always been a speed bump (literally) due to the sheer size of the object. Perhaps it's time to take look at Project again and see where we can clean some things up. For this issue, it may be best to break the stats columns out of the projects
table and into its own table (project_stats
maybe). Task unlocking is arguably (but I'd put money on it) one of if not the most common action in the TM. There's no real reason to be fetching an entire Project object just to update stats. Overall, I think the projects
table can be simplified and we can just handle everything with existing or new DTO's that may just need to query an extra table. But for most cases (especially task actions) it should improve performance as well as simplify the codebase. Like I said, the Project object has always been a bit bloated IMO.
I hope this helps and isn't just noisy ramblings 😃 . Also, this was the most glaring thing I noticed when I followed the call stack. I'm not claiming this is the sole reason for potential latency issues and there are likely other optimizations elsewhere in the code.
I am also getting messages about commenting and invalidating tasks:
Loading back to normal now, not by uploading after validation with or without a comment but this is happening in 1 on 3 cases for weeks now. Waiting doesn't help. Reload TM helps sometimes but of course than you have to rewrite the comment.
few days back (10th of May 2022 around 11PM CEST) I wasn't able to upload task as validated. When I clicked "Submit task" it was loading for half a second and then nothing happened. I tried few times with the same result. Then I tooked screenshot of my console and refreshed the page. It logged me off so I logged back in, get back to validating page, pressed "Submit task" and all worked.
Not a big issue, not sure if it is related to this. My shot would be that it randomly logged me off while I was working on the task.
.
Also related to #2003