
Cache diagram cache from GitHub

Open Paula-Kli opened this issue 1 year ago • 1 comments

Artifacts can only be downloaded as a single zip file. At the moment, the zip file is downloaded completely for the index, then thrown away and downloaded again for every single file in the diagram cache. The downloaded artifact should be cached to minimize the number of requests and speed up loading of the diagram cache (possibly using Redis?).
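A minimal sketch of the idea, assuming an in-memory dictionary instead of Redis: the zip is downloaded once per job and individual files are then read from the cached bytes. `ArtifactZipCache` and the `download` callable are hypothetical names for illustration and not part of the existing code base.

```python
import io
import zipfile
from typing import Callable, Dict


class ArtifactZipCache:
    """Keeps one downloaded artifact zip per job id in memory (illustrative only)."""

    def __init__(self, download: Callable[[str], bytes]) -> None:
        # Hypothetical callable that fetches the whole artifact zip for a job id.
        self._download = download
        self._zips: Dict[str, bytes] = {}

    def get_file(self, job_id: str, path: str) -> bytes:
        # Download the zip only if this job's artifacts are not cached yet.
        if job_id not in self._zips:
            self._zips[job_id] = self._download(job_id)
        # Read the requested member from the cached bytes without re-downloading.
        with zipfile.ZipFile(io.BytesIO(self._zips[job_id])) as archive:
            return archive.read(path)
```

With this shape, fetching the index and every diagram of one job triggers a single download. A shared store such as Redis would replace the dictionary if the cache has to be shared between processes.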

Paula-Kli avatar Aug 02 '23 08:08 Paula-Kli

@MoritzWeber0 and I decided to implement this as follows:

  • We store the unique repository identifier (the project ID on GitLab, owner/repository on GitHub) in the database git model, since there is a one-to-one relationship. The current plan is to set this id the first time we need it; in the future we may be able to set it directly during model creation. It is important to reset the repository id, i.e. set it to None, whenever the repository path is updated.
  • To improve performance and readability, we will split get_file_from_repository_or_artifacts into two separate functions, get_file_from_repository and get_file_from_artifacts. This should also make it easier to introduce specific handling for each type.
  • The new get_file_from_artifacts returns the job id in addition to the file content, so that the job id is retrieved only once per (diagram cache) request. More specifically, when the diagram cache metadata is requested and served from artifacts, we return the job id, which can then be used for subsequent diagram requests. In addition to reducing the number of API calls, this ensures that all diagrams are loaded from a single pipeline. Currently, in the rare case that a pipeline finishes while diagrams are being fetched, newly fetched diagrams come from the new pipeline's artifacts.
  • We introduce a content cache in which we store the fetched diagrams, using the combination of git model id, repository id, and file path as the key. To ensure that we only load data from the cache when there is no newer data, we use a second metadata cache in which we store either the job id, if the diagram was fetched from artifacts, or the last update time otherwise. We still have to decide on the key for the metadata cache: use only the git model id and repository id (which should be sufficient for the artifact case, but may cause problems for the repository case), use the git model id, repository id, and file path, or use the first approach for artifacts and the second for repository files.
  • Investigate whether any handler-specific caching is required. This would be the case, for example, for the situation in the problem description, where we download the entire zip for each diagram.
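The content and metadata caches described above could be sketched roughly as follows. This is only an illustration under assumptions: it uses plain dictionaries, keys the metadata cache by (git model id, repository id, file path), which is just one of the options still under discussion, and treats the job id or last update time as an opaque version string. `DiagramCache` and `fetch` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# (git model id, repository id, file path)
CacheKey = Tuple[int, str, str]


@dataclass
class DiagramCache:
    """Content cache plus metadata cache used for invalidation (illustrative only)."""

    content: Dict[CacheKey, bytes] = field(default_factory=dict)
    # Stores the job id (artifacts) or last update time (repository) per key.
    metadata: Dict[CacheKey, str] = field(default_factory=dict)

    def get(self, key: CacheKey, current_version: str, fetch: Callable[[], bytes]) -> bytes:
        # Serve from the cache only if the stored version matches the current one.
        if key in self.content and self.metadata.get(key) == current_version:
            return self.content[key]
        # Otherwise fetch fresh data and record the version it belongs to.
        data = fetch()
        self.content[key] = data
        self.metadata[key] = current_version
        return data
```

A caller would pass the job id returned by get_file_from_artifacts (or the last update time of the file) as `current_version`. A new pipeline then changes the version and forces a refetch, while repeated requests against the same job are served from the cache.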

dominik003 avatar Aug 12 '24 16:08 dominik003