
[EPIC] Performance problems

Open maltem-za opened this issue 3 years ago • 1 comments

A number of factors are known to contribute to the performance problems that many users are experiencing:

  • First and foremost, the potentially high number of individual repositories (submodules) per project. This number can grow quickly in collaborative projects that have multiple documents, especially where each member has their own annotation collection per document (e.g. 10 documents × 10 members is already 100 separate annotation collection repositories).
  • The number of projects per account. This is most likely related to the way that we are currently fetching the resource permissions from GitLab, but there may be other causes.
  • The number of annotations in a project, as these are all loaded into the relevant data structures and the graph DB when a project is opened. Documents and tags also matter, but generally occur in relatively low numbers by comparison.
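
To make the first point concrete, here is a rough sketch of how the repository count scales when every member has their own annotation collection for every document. The function name and the sample project sizes are made up for illustration; only the 10 × 10 case comes from the description above.

```python
def annotation_collection_repos(documents: int, members: int) -> int:
    """Number of annotation collection repositories, assuming one
    collection per member per document (the worst case described above)."""
    if documents < 0 or members < 0:
        raise ValueError("documents and members must be non-negative")
    return documents * members


# Hypothetical project sizes; only (10, 10) is taken from the issue text.
for documents, members in [(10, 10), (20, 15), (50, 30)]:
    repos = annotation_collection_repos(documents, members)
    print(f"{documents} documents x {members} members = {repos} repositories")
```

The count grows with the product of the two numbers, so adding either documents or members to an already large project adds many repositories at once.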

The high number of individual elements also causes migrations of larger projects from CATMA 5 to be very slow, or fail completely.

Options that have been discussed / to consider:

  • Drastically reducing the number of individual repositories/submodules (we have already agreed that we definitely want to do this). There are different ways to achieve this, and we should evaluate them carefully. Among them: making heavier use of branches (and potentially protected branches) and/or GitLab subgroups. Whichever path we ultimately choose, it will likely have a significant impact on the roles and permissions system in CATMA, with some reduction in flexibility for added performance.
  • How resource permissions are fetched will depend largely on the changes resulting from the previous point. We should also review what has changed in the GitLab API that we could potentially benefit from.
  • Relating to reduced permissions flexibility: concentrating on making it easy for users to undo certain actions, rather than preventing those actions from happening, might be better and more intuitive anyway.
  • More lazy-loading
  • Re-evaluating the pros & cons of storing tags and annotations in individual files. Part of the reason for doing this was to reduce the likelihood of merge conflicts. However, we could potentially save ourselves a lot of effort and complexity by trying to offload conflict resolution entirely onto the existing GitLab mechanisms (branches & merge requests) and educating users on the use of these. There are also other ways that we could prevent tricky conflict scenarios, such as not allowing multiple users to write into the same annotation collection.
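
As a rough illustration of the lazy-loading option, the sketch below defers loading a collection's annotations until they are first accessed, instead of loading everything when the project is opened. This is not CATMA code: `LazyAnnotationCollection`, `fake_loader` and the collection IDs are hypothetical names, and the loader stands in for whatever would actually read annotations from the repository.

```python
from typing import Callable, List, Optional


class LazyAnnotationCollection:
    """Hypothetical wrapper that loads annotations on first access only."""

    def __init__(self, collection_id: str, loader: Callable[[str], List[str]]):
        self.collection_id = collection_id
        self._loader = loader  # assumed: reads annotations from storage
        self._annotations: Optional[List[str]] = None

    @property
    def is_loaded(self) -> bool:
        return self._annotations is not None

    @property
    def annotations(self) -> List[str]:
        # Load once, then serve the cached list on later accesses.
        if self._annotations is None:
            self._annotations = self._loader(self.collection_id)
        return self._annotations


load_calls: List[str] = []


def fake_loader(collection_id: str) -> List[str]:
    """Stand-in for an expensive read; records which collections were loaded."""
    load_calls.append(collection_id)
    return [f"{collection_id}-annotation-{i}" for i in range(3)]


# "Opening" the project only creates cheap placeholders.
collections = [LazyAnnotationCollection(f"c{i}", fake_loader) for i in range(100)]
print(len(load_calls))  # nothing has been loaded yet

# Only the collection the user actually opens is loaded.
first = collections[0].annotations
print(len(load_calls), len(first))
```

The trade-off is that the first access to each collection pays the loading cost, so this mainly helps when users work with a small subset of a project's collections per session.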

This issue is marked EPIC as it should be broken down into concrete tasks once we have decided on a way forward. Please link related issues below.

maltem-za, Sep 09 '21 10:09

Issues #303, #304, #305 and #306 are meant to address the performance problems described above.

mpetris, Jan 19 '22 16:01

Completed with the release of 7.0.0

See child issues #303, #304, #305 and #306 for further details.

maltem-za, Jun 01 '23 09:06