capella-collab-manager
Cache diagram cache from GitHub
Artifacts can only be downloaded as a single zip file. At the moment, the zip file is downloaded completely for the index, then thrown away and downloaded again for every single file in the diagram cache. The archive should be cached to minimize the number of requests and speed up loading of the diagram cache (possibly using Redis?).
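To illustrate the idea, here is a minimal sketch that keeps the downloaded archive in memory and serves the index and every diagram from it. The class name `ArtifactZipCache` and the `download` callable are hypothetical and not part of the existing code base; an in-process dict stands in for whatever store (for example Redis) is eventually chosen.

```python
import io
import zipfile
from typing import Callable, Dict


class ArtifactZipCache:
    """Keeps downloaded artifact archives in memory, keyed by job id."""

    def __init__(self, download: Callable[[str], bytes]) -> None:
        # `download` is a hypothetical callable that fetches the complete
        # artifact zip for a given job id from the Git hosting API.
        self._download = download
        self._archives: Dict[str, bytes] = {}

    def get_file(self, job_id: str, path: str) -> bytes:
        # Download the archive only on the first request for this job.
        if job_id not in self._archives:
            self._archives[job_id] = self._download(job_id)
        # Read the requested member from the cached archive bytes.
        with zipfile.ZipFile(io.BytesIO(self._archives[job_id])) as archive:
            return archive.read(path)
```

With this, fetching the index and then N diagrams for the same job costs one download instead of N + 1. The sketch never evicts archives, so a real implementation would need a size limit or an expiry time.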
@MoritzWeber0 and I decided to implement this as follows:
- We store the unique repository identifier (project ID on GitLab and `owner/repository` on GitHub) in the database git model, since there is a one-to-one relationship. The current plan is to set this id the first time we need it, but in the future we may be able to set it directly during model creation. It is important to reset the repository id, i.e. set it to `None`, when the repository path is updated.
- To improve performance and readability, we will split `get_file_from_repository_or_artifacts` into two separate functions, `get_file_from_repository` and `get_file_from_artifacts`. This should also make it easier to introduce specific handling for each type.
- The new `get_file_from_artifacts` returns the job id in addition to the file content, so that the job id is retrieved only once per (diagram cache) request. More specifically, when requesting the diagram cache metadata, we return the job id when retrieving from artifacts, and it can then be used for subsequent diagram requests. In addition to reducing the number of API calls, this ensures that all diagrams are loaded from a single pipeline. Currently, in the rare case that a pipeline finishes while diagrams are being fetched, newly fetched diagrams come from the new pipeline's artifacts.
- We introduce a content cache where we store the fetched diagrams, using a combination of the git model id, repository id, and file path as the key. To ensure that we only load data from the cache when there is no new data, we use a second metadata cache where we store either the job id (for diagrams fetched from artifacts) or the last update time (for files fetched from the repository). We still have to decide between three options for the metadata cache key: use only the git model id and repository id (which should be sufficient for the artifact case, but may cause problems for the repository case); use the git model id, repository id, and file path; or use the first approach for artifacts and the second for repository files.
- Investigate whether any handler-specific caching is required. For example, this would be the case for the situation in the problem description where we download the entire zip for each diagram.
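The content and metadata caches described above could look roughly like the following sketch. It picks the variant that uses the git model id, repository id, and file path as the key for both caches, which is only one of the options still under discussion. `DiagramCache`, its method names, and the use of a plain string as the revision (job id or last update time) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (git model id, repository id, file path)
ContentKey = Tuple[int, str, str]


@dataclass
class DiagramCache:
    # Content cache: the fetched file bytes.
    content: Dict[ContentKey, bytes] = field(default_factory=dict)
    # Metadata cache: the revision the content was fetched from, i.e. the
    # job id for artifacts or the last update time for repository files.
    revision: Dict[ContentKey, str] = field(default_factory=dict)

    def get(self, key: ContentKey, current_revision: str) -> Optional[bytes]:
        """Return cached content only if it matches the current revision."""
        if self.revision.get(key) != current_revision:
            return None
        return self.content.get(key)

    def put(self, key: ContentKey, current_revision: str, data: bytes) -> None:
        """Store content together with the revision it belongs to."""
        self.content[key] = data
        self.revision[key] = current_revision

    def invalidate_model(self, git_model_id: int) -> None:
        """Drop all entries of a git model, e.g. after its path was updated."""
        for key in [k for k in self.revision if k[0] == git_model_id]:
            self.revision.pop(key, None)
            self.content.pop(key, None)
```

A caller would first look up the current job id or last update time, call `get`, and only fall back to `get_file_from_artifacts` or `get_file_from_repository` followed by `put` on a miss. Calling `invalidate_model` when the repository path changes mirrors resetting the repository id to `None`.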