
CVAT Backend slowing down as number of tasks in project increases

Open · stykm opened this issue 2 years ago · 5 comments

My actions before raising this issue

  • [x] Read/searched the docs
  • [x] Searched past issues

Steps to Reproduce (for bugs)

  1. Run a CVAT instance at version 2.5.2 (or 2.6.0; we have confirmed the issue in both).
  2. Upload one task to an example project and measure the response time for changing its assignee (~480 ms in my case).
  3. Upload 100+ tasks to the same project and measure the response time again (~960 ms in my case, with 100 video tasks).
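To make steps 2 and 3 repeatable, the timing can be scripted. The sketch below uses only the Python standard library. It assumes that the assignee is changed via `PATCH /api/tasks/<id>` with an `assignee_id` JSON field and that the server accepts HTTP Basic authentication; the base URL, task ID, assignee ID and credentials you pass in are placeholders for your own setup.

```python
import base64
import json
import statistics
import time
import urllib.request


def build_assignee_patch(base_url, task_id, assignee_id, username, password):
    """Build (but do not send) a PATCH request that changes a task's assignee.

    Assumption: the endpoint is /api/tasks/<id>, it accepts an "assignee_id"
    JSON field, and the server allows HTTP Basic authentication.
    """
    body = json.dumps({"assignee_id": assignee_id}).encode("utf-8")
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/tasks/{task_id}",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


def send(request, timeout_s=60.0):
    """Send the request and read the full response body (raises on HTTP errors)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        response.read()


def median_latency_ms(action, repeats=5):
    """Run action() `repeats` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Usage would look like `median_latency_ms(lambda: send(build_assignee_patch("http://localhost:8080", 1, 2, "admin", "secret")))`, run once with a single task in the project and again after adding 100+ tasks (all arguments here are hypothetical). Taking the median of several calls makes a single slow outlier less likely to skew the comparison.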

Expected Behaviour

The response time for changing a task's assignee or deleting a task should stay roughly constant (or grow only slightly) as more tasks are added to the same project.

Current Behaviour

With a few thousand video tasks in the same project of our CVAT instance, changing the assignee of a single task takes about 17 seconds. Deleting a task shows the same slowdown.
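As a back-of-the-envelope check, the measurements from the reproduction steps (1 task at ~480 ms, 100 tasks at ~960 ms) can be extrapolated under the assumption that latency grows linearly with the number of tasks in the project. The report does not establish linearity, and "a few thousand" is not an exact count, so 3000 tasks below is a hypothetical stand-in.

```python
def per_task_slope_ms(n1, t1_ms, n2, t2_ms):
    """Extra latency per additional task, assuming latency grows linearly."""
    return (t2_ms - t1_ms) / (n2 - n1)


def predict_latency_ms(n, n1, t1_ms, n2, t2_ms):
    """Linearly extrapolate the latency to a project containing n tasks."""
    return t1_ms + per_task_slope_ms(n1, t1_ms, n2, t2_ms) * (n - n1)


# Reported GKE measurements: 1 task -> ~480 ms, 100 tasks -> ~960 ms.
slope = per_task_slope_ms(1, 480, 100, 960)
# "A few thousand" is not exact; 3000 tasks is a hypothetical stand-in.
estimate = predict_latency_ms(3000, 1, 480, 100, 960)
print(f"{slope:.2f} ms per extra task, about {estimate / 1000:.1f} s at 3000 tasks")
# → 4.85 ms per extra task, about 15.0 s at 3000 tasks
```

An estimate of roughly 15 s is in the same range as the ~17 s observed, which is at least consistent with each additional task in the project adding a roughly fixed cost to every request.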

Possible Solution

We suspect the problem lies in how tasks are loaded when they belong to a project: loading a single task appears to load every other task in that project as well.
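To illustrate why this hypothesis would make per-request time scale with project size, here is a toy model using Python's built-in `sqlite3`. It is not CVAT's Django code: the schema, table names and both loader functions are hypothetical, and it assumes the sibling tasks are loaded with one extra query each (a classic N+1 pattern), which the report does not confirm. `Connection.set_trace_callback` is used to count the SELECT statements each path issues.

```python
import sqlite3


def make_db(n_tasks, jobs_per_task=2):
    """Toy schema (hypothetical, not CVAT's): one project holding n_tasks tasks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, project_id INTEGER, assignee_id INTEGER)")
    conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, task_id INTEGER)")
    conn.executemany("INSERT INTO task (id, project_id) VALUES (?, 1)",
                     [(i,) for i in range(1, n_tasks + 1)])
    conn.executemany("INSERT INTO job (task_id) VALUES (?)",
                     [(i,) for i in range(1, n_tasks + 1) for _ in range(jobs_per_task)])
    conn.commit()
    return conn


def count_selects(conn, action):
    """Run action(conn) and count the SELECT statements it sends to the database."""
    selects = []

    def trace(sql):
        if sql.lstrip().upper().startswith("SELECT"):
            selects.append(sql)

    conn.set_trace_callback(trace)
    try:
        action(conn)
    finally:
        conn.set_trace_callback(None)
    return len(selects)


def load_task_with_siblings(conn, task_id=1):
    """Hypothesised slow path: loading one task also loads every task in its
    project, issuing one extra query per sibling (an N+1 pattern)."""
    (project_id,) = conn.execute("SELECT project_id FROM task WHERE id = ?", (task_id,)).fetchone()
    sibling_ids = [row[0] for row in conn.execute(
        "SELECT id FROM task WHERE project_id = ?", (project_id,))]
    for sibling_id in sibling_ids:
        conn.execute("SELECT id FROM job WHERE task_id = ?", (sibling_id,)).fetchall()


def load_task_only(conn, task_id=1):
    """Targeted path: touch only the requested task and its own jobs."""
    conn.execute("SELECT project_id FROM task WHERE id = ?", (task_id,)).fetchone()
    conn.execute("SELECT id FROM job WHERE task_id = ?", (task_id,)).fetchall()


for n in (1, 100, 1000):
    db = make_db(n)
    print(f"{n:>5} tasks: {count_selects(db, load_task_with_siblings):>5} vs "
          f"{count_selects(db, load_task_only)} SELECTs")
```

With this toy data the sibling-loading path issues 3, 102 and 1002 SELECTs for 1, 100 and 1000 tasks, while the targeted path stays at 2. Against a database reached over the network, each of those extra queries would also pay a round trip, so a pattern like this would grow linearly with the size of the project.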

Context

We used CVAT version 2.4.3 for a couple of months. All API endpoints worked as intended, quickly and without issues. After upgrading to v2.5.2, we noticed that using the PATCH method on tasks to change the assignee was significantly slower, to the point where our GKE load balancer and the CVAT Nginx were hitting their 30-second timeouts. The DELETE method for deleting tasks is affected as well. Since version 2.6.0 included some improvements, we tried it, but it did not change anything. We also found an existing issue suggesting that increasing the NUMPROCS environment variable would speed things up; thumbnails did load slightly faster, but the API calls mentioned above were unaffected. I tried to replicate the problem locally using docker-compose: ~~there was no such issue whatsoever.~~ I managed to create 200 tasks and did get the slowdown in response time (~480 ms vs ~780 ms). Notably, we use an external PostgreSQL database hosted on Google Cloud SQL.
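Under the same linear-growth assumption, one can estimate the project size at which a request crosses the 30-second timeout, and compare the per-task cost in the two environments. The sketch assumes a 1-task baseline of ~480 ms for the local docker-compose setup as well; the report only gives "~480 ms vs ~780 ms" there.

```python
TIMEOUT_MS = 30_000  # GKE load balancer / CVAT Nginx timeout reported above


def slope_ms_per_task(n1, t1_ms, n2, t2_ms):
    """Latency added by each extra task, assuming linear growth."""
    return (t2_ms - t1_ms) / (n2 - n1)


def tasks_until_timeout(n1, t1_ms, n2, t2_ms, timeout_ms=TIMEOUT_MS):
    """Solve t1 + slope * (n - n1) = timeout for n."""
    return n1 + (timeout_ms - t1_ms) / slope_ms_per_task(n1, t1_ms, n2, t2_ms)


# GKE + Cloud SQL: 1 task -> ~480 ms, 100 tasks -> ~960 ms.
gke = tasks_until_timeout(1, 480, 100, 960)
# Local docker-compose: 200 tasks -> ~780 ms; the 1-task baseline of ~480 ms is assumed.
local = tasks_until_timeout(1, 480, 200, 780)
print(f"GKE: ~{gke:,.0f} tasks, local: ~{local:,.0f} tasks before a 30 s timeout")
```

This gives roughly 6,100 tasks for the GKE numbers and close to 19,600 for the local ones, because the per-task cost is about 4.8 ms on GKE with Cloud SQL versus about 1.5 ms locally. The numbers alone do not say why the remote setup pays about three times more per task.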

Your Environment

  • Are you using Docker Swarm or Kubernetes?
    • Kubernetes (GKE)
  • Operating System and version (e.g. Linux, Windows, MacOS):
    • Linux (Ubuntu 20.04)

stykm, Aug 24 '23

UPDATE:

While testing with a new project, we created about 300 video tasks and then started getting 502 errors from the API when trying to create more. It seems the POST method for creating tasks is affected too.

stykm, Aug 29 '23

Hi, just want to ask: when you did the server migration, was the new version compatible with the old one? That is, is it possible to attach, say, CVAT 2.4.3 and 2.5.2 to the same DB and Redis? @stykm

Lee4396, Aug 29 '23

@Lee4396 Apart from the issues mentioned above, the upgrade was pretty much plug and play (both the DB and Redis worked out of the box).

stykm, Aug 29 '23

Hello,

I'm bumping this issue: I tested version 2.11.0 and the problem is still present.

stykm, Mar 06 '24

I'm seeing the same issue. It also affects the annotation PATCH endpoint, making saves take an extraordinary amount of time: with a few thousand jobs, I currently see response times of 15-30 seconds on this endpoint.

alexyao2015, Jun 18 '24