
Mirror functionality

Open amvanbaren opened this issue 2 years ago • 27 comments

Fixes #505

Testing steps

  • Change the ovsx.data.mirror.schedule.extensions and ovsx.data.mirror.schedule.metadata cron expressions. Current time + 3 minutes works well for immediate testing.
  • Let the server run. The mirror jobs contain log statements that show what is happening in the background.
  • View a fully mirrored extension in the webui. The data should be the same (download count, reviews, rating, verified extension versions, published date, latest update date) even though the layout is a little different.
  • View a fully mirrored extension in VS Code. Again, the data should be the same, or even enhanced because of the newer features.
  • Restart the server with a different schedule.

It's also a good idea to run mirror mode for a longer time on staging to see what pops up.
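
For the first testing step, the "current time + 3 minutes" schedule can be computed rather than typed by hand. The following is a minimal sketch, assuming the schedule properties accept a six-field cron expression (second, minute, hour, day of month, month, day of week); the class and method names are hypothetical and not part of Open VSX. If the scheduler in use expects a different cron dialect, the field layout needs adjusting.

```java
import java.time.LocalTime;

public class MirrorCronSketch {
    // Builds a six-field cron expression that fires once a day at the given time.
    // Assumption: fields are "second minute hour day-of-month month day-of-week".
    static String dailyCronAt(LocalTime time) {
        return String.format("0 %d %d * * *", time.getMinute(), time.getHour());
    }

    public static void main(String[] args) {
        // LocalTime.plusMinutes wraps around midnight, so 23:58 + 3 minutes gives 00:01.
        String cron = dailyCronAt(LocalTime.now().plusMinutes(3));
        System.out.println("ovsx.data.mirror.schedule.extensions: " + cron);
        System.out.println("ovsx.data.mirror.schedule.metadata: " + cron);
    }
}
```

The printed values can then be copied into the two schedule properties before starting the server.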

amvanbaren avatar Aug 24 '22 22:08 amvanbaren

While testing I found a small problem (I'm not sure whether it's related to the mirror): open-vsx would no longer connect to PostgreSQL after the sync task had been running for a while.

The log shows "Apparent connection leak detected". At that point nothing that touches SQL works, including all API calls and tasks. Restarting the pod resolves the problem.

iQQBot avatar Aug 29 '22 03:08 iQQBot

Also, the current way of mirroring is very similar to republishing: it downloads the full vsix file and republishes it, which makes it very slow and generates a lot of unnecessary traffic. Can we just sync the metadata through the API?

iQQBot avatar Aug 29 '22 03:08 iQQBot

@amvanbaren some extensions are broken: Screenshot 2022-08-29 at 10 33 06

It is important that sync happens in order of publishing so that the data ends up in the same state. In this case, for example, the extension was renamed and that broke all links.

Also, the current way of mirroring is very similar to republishing: it downloads the full vsix file and republishes it, which makes it very slow and generates a lot of unnecessary traffic. Can we just sync the metadata through the API?

If there is an easy way to implement it so that we don't need to download anything, 👍. If not, it is also alright for now.

akosyakov avatar Aug 29 '22 08:08 akosyakov

@amvanbaren When using the web UI, it does not seem that results from upstream are merged; only synced extensions are popping up.

I'm not sure that we need the web UI at all. The only reason we are keeping it is that VS Code has links to the marketplace, and if an extension is served from the local backup then those links point to our local installation. Could we change them to point to upstream as well in mirror mode? Then we don't need the web UI at all.

Screenshot 2022-08-29 at 10 38 13

Otherwise the web UI has broken elements, like the Log In button. I would prefer not to host it.

akosyakov avatar Aug 29 '22 08:08 akosyakov

Also, the current way of mirroring is very similar to republishing: it downloads the full vsix file and republishes it, which makes it very slow and generates a lot of unnecessary traffic. Can we just sync the metadata through the API?

If there is an easy way to implement it so that we don't need to download anything, 👍. If not, it is also alright for now.

I don't see an easy way to implement it right now. The main problem is that FileResources for an extension need to be mirrored too.

amvanbaren avatar Aug 29 '22 19:08 amvanbaren

@amvanbaren When using the web UI, it does not seem that results from upstream are merged; only synced extensions are popping up.

@akosyakov Can you give an example?

amvanbaren avatar Aug 29 '22 19:08 amvanbaren

@akosyakov Can you give an example?

@amvanbaren in https://github.com/eclipse/openvsx/pull/513#issuecomment-1229961278 the links highlighted on the screenshot take the user to the mirror, not upstream.

akosyakov avatar Aug 30 '22 08:08 akosyakov

@akosyakov Can you give an example?

@amvanbaren in #513 (comment) the links highlighted on the screenshot take the user to the mirror, not upstream.

You can configure ovsx.webui.url in src/dev/resources/application.yml to point to https://open-vsx.org

ovsx:
    webui:
        url: https://open-vsx.org

amvanbaren avatar Aug 30 '22 08:08 amvanbaren

I don't see an easy way to implement it right now. The main problem is that FileResources for an extension need to be mirrored too.

@amvanbaren It is alright. We need to ensure, though, that if there is a newer version upstream for an already mirrored extension, this new version is returned as well. Sync can take a while and we don't want our users to wait a day to see a new version.

akosyakov avatar Aug 30 '22 11:08 akosyakov

We need to ensure, though, that if there is a newer version upstream for an already mirrored extension, this new version is returned as well. Sync can take a while and we don't want our users to wait a day to see a new version.

At first I thought we could solve this by configuring ovsx.upstream.* properties, but the RegistryAPI just returns the first match. This strategy ensures good performance and minimizes the load on upstream. However, it doesn't merge responses or check which response is fresher.

https://github.com/eclipse/openvsx/blob/81829e2f5cc93e699c820da957f1ba60e9f9c551/server/src/main/java/org/eclipse/openvsx/mirror/MirrorSitemapJobRequestHandler.java#L66-L69

The MirrorSitemapJobRequestHandler creates extension mirror job ids based on namespace, extension and lastModified. If the job id already exists in the database, the job is skipped. So, if an extension hasn't been modified, JobRunr doesn't try to sync it again.

The initial sync will take a while, but after that you can probably run sync multiple times a day.
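
The job id check described above can be illustrated with a minimal sketch. It is not the actual MirrorSitemapJobRequestHandler code: the id format, the class name and the in-memory set (standing in for the job table in the database) are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

public class MirrorJobIdSketch {
    // Stands in for the job ids already stored in the database (assumption).
    private final Set<String> knownJobIds = new HashSet<>();

    // Hypothetical id format; the real handler may combine the three values differently.
    static String jobId(String namespace, String extension, String lastModified) {
        return namespace + "." + extension + "@" + lastModified;
    }

    // Returns true if a sync job would be enqueued, false if the id already exists.
    boolean scheduleIfNew(String namespace, String extension, String lastModified) {
        return knownJobIds.add(jobId(namespace, extension, lastModified));
    }

    public static void main(String[] args) {
        MirrorJobIdSketch jobs = new MirrorJobIdSketch();
        System.out.println(jobs.scheduleIfNew("example-ns", "example-ext", "2022-08-30")); // new job
        System.out.println(jobs.scheduleIfNew("example-ns", "example-ext", "2022-08-30")); // unchanged, skipped
        System.out.println(jobs.scheduleIfNew("example-ns", "example-ext", "2022-08-31")); // modified, new job
    }
}
```

Because lastModified is part of the id, an unchanged extension maps to an existing id and is skipped, while any upstream modification yields a fresh id.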

amvanbaren avatar Aug 30 '22 15:08 amvanbaren

At first I thought we could solve this by configuring ovsx.upstream.* properties, but the RegistryAPI just returns the first match. This strategy ensures good performance and minimizes the load on upstream. However, it doesn't merge responses or check which response is fresher.

Yeah, but it implies that sync has to be fast and reliable. If we have a bug, there is no room for error and it will affect usability. I hoped that we could drop our DB at any time if we notice that some extensions were synced incorrectly, and that they would still be served from upstream while syncing.

As far as I understand now, all extension versions should be synced in reverse order, and it can take a while before we see the proper latest version?

akosyakov avatar Aug 31 '22 06:08 akosyakov

Yeah, but it implies that sync has to be fast and reliable. If we have a bug, there is no room for error and it will affect usability. I hoped that we could drop our DB at any time if we notice that some extensions were synced incorrectly, and that they would still be served from upstream while syncing.

You can enable ovsx.upstream functionality alongside mirror mode to serve upstream extensions while the server is syncing. The only thing is that the registry first checks the local DB, before it sends a request to upstream. Just make sure to delete the bogus extension (version), so that it returns the right one from upstream.
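
The lookup order described here (local DB first, then upstream, first match wins) can be sketched as follows. This is an illustration under assumptions, not the RegistryAPI implementation: registries are modelled as plain maps from extension id to latest version, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class FirstMatchSketch {
    // Returns the answer of the first registry that knows the extension.
    // Later registries are not consulted, so responses are neither merged
    // nor compared for freshness.
    static Optional<String> latestVersion(String extensionId, List<Map<String, String>> registries) {
        for (Map<String, String> registry : registries) {
            String version = registry.get(extensionId);
            if (version != null) {
                return Optional.of(version);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> local = new HashMap<>();
        Map<String, String> upstream = new HashMap<>();
        local.put("example-ns.example-ext", "1.0.0");    // stale or bogus local copy
        upstream.put("example-ns.example-ext", "1.2.0"); // what upstream would return
        List<Map<String, String>> registries = Arrays.asList(local, upstream);

        System.out.println(latestVersion("example-ns.example-ext", registries)); // local copy shadows upstream
        local.remove("example-ns.example-ext");                                  // delete the bogus entry
        System.out.println(latestVersion("example-ns.example-ext", registries)); // now answered by upstream
    }
}
```

The sketch shows why deleting the bogus local entry matters: as long as it exists, the upstream answer is never reached.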

As far as I understand now, all extension versions should be synced in reverse order, and it can take a while before we see the proper latest version?

Yes, that can take a while. Maybe the extension can be de-activated while it's syncing, so that during that time it returns the upstream extension. I'll investigate this further.

amvanbaren avatar Aug 31 '22 12:08 amvanbaren

Yes, that can take a while. Maybe the extension can be de-activated while it's syncing, so that during that time it returns the upstream extension. I'll investigate this further.

I was just trying to test different scenarios and learned that I need a way to control syncing, i.e. to specify which extensions should always be served from upstream (no sync). I think the same mechanism could be used to filter out bogus extensions and to programmatically exclude extensions which are not completely synced yet?

Another thing, which probably should not be part of this PR: we need RED metrics for upstream as well, to know when we should escalate issues to the Eclipse Foundation. If we measure only from clients, or only requests to the self-hosted installation, everything could look alright because of the backup while upstream is experiencing issues.

akosyakov avatar Sep 05 '22 08:09 akosyakov

You can configure ovsx.webui.url in src/dev/resources/application.yml to point to https://open-vsx.org/

Screenshot 2022-09-05 at 12 58 56

We configured it, but VS Code still gets links to our installation

akosyakov avatar Sep 05 '22 11:09 akosyakov

Screenshot 2022-09-05 at 13 02 05

I'm trying to install the vim extension while it is syncing. It seems to be resolved from upstream, but it cannot be installed.

akosyakov avatar Sep 05 '22 11:09 akosyakov

I think this PR should have the following acceptance criteria. It should be tested from VS Code without the web UI. The following is expected:

  • The VS Code UI should almost never fail (99%), i.e. no HTTP errors in the extensions and preview views, and no exceptions while installing
  • if open-vsx.org is up, then the latest version of an extension can be found, previewed and installed even if the latest version is not synced yet
  • if http://open-vsx.org is down
    • if the extension is not synced at all, then it cannot be found, previewed or installed
    • otherwise the latest synced version of the extension can be found, previewed and installed
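
The availability part of these criteria can be read as a small decision rule. The sketch below is one possible encoding for use in a test, under the assumption that an empty result means "cannot be found, previewed or installed"; the class and method names are hypothetical.

```java
import java.util.Optional;

public class AcceptanceSketch {
    // Returns the version a VS Code user is expected to see for one extension.
    // upstreamLatest: latest version on open-vsx.org (only reachable when upstream is up).
    // latestSynced:   latest version already synced to the mirror, empty if never synced.
    static Optional<String> expectedVersion(boolean upstreamUp,
                                            Optional<String> upstreamLatest,
                                            Optional<String> latestSynced) {
        if (upstreamUp) {
            // The latest version must be available even if it is not synced yet.
            return upstreamLatest;
        }
        // Upstream is down: fall back to whatever has been synced, possibly nothing.
        return latestSynced;
    }

    public static void main(String[] args) {
        System.out.println(expectedVersion(true, Optional.of("1.2.0"), Optional.of("1.0.0")));
        System.out.println(expectedVersion(false, Optional.of("1.2.0"), Optional.of("1.0.0")));
        System.out.println(expectedVersion(false, Optional.of("1.2.0"), Optional.empty()));
    }
}
```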

akosyakov avatar Sep 05 '22 11:09 akosyakov

I was just trying to test different scenarios and learned that I need a way to control syncing, i.e. to specify which extensions should always be served from upstream (no sync).

Ok, how do you want to be able to configure this? Through the admin panel?

Another thing, which probably should not be part of this PR: we need RED metrics for upstream as well, to know when we should escalate issues to the Eclipse Foundation. If we measure only from clients, or only requests to the self-hosted installation, everything could look alright because of the backup while upstream is experiencing issues.

I've added it to #514

amvanbaren avatar Sep 05 '22 11:09 amvanbaren

Ok, how do you want to be able to configure this? Through the admin panel?

I'm not sure about it. Mirror mode does not have an admin UI? We were thinking of disabling the web UI altogether. Maybe via a dynamic configmap? cc @iQQBot wdyt?

akosyakov avatar Sep 05 '22 13:09 akosyakov

@akosyakov Should a namespace also be de-activated when an extension that is part of the namespace is syncing?

amvanbaren avatar Sep 09 '22 08:09 amvanbaren

@akosyakov Should a namespace also be de-activated when an extension that is part of the namespace is syncing?

I don't think so. If namespace has other extensions already synced, they should be available.

akosyakov avatar Sep 09 '22 08:09 akosyakov

@akosyakov I've added job chaining, so that jobs are executed in the right order. To accomplish this I switched to Quartz scheduler. Extensions are now de-activated before syncing starts and re-activated when they're done syncing.
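
The ordering described here (de-activate, sync versions in publishing order, re-activate) can be sketched without any scheduler. This is not the Quartz-based implementation from the PR: it is a plain sequential illustration, and the class, the Version type and the log strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderedSyncSketch {
    // Minimal stand-in for an upstream extension version (assumption).
    static final class Version {
        final String version;
        final long publishedAt; // publish timestamp, e.g. epoch millis
        Version(String version, long publishedAt) {
            this.version = version;
            this.publishedAt = publishedAt;
        }
    }

    // Returns the sequence of steps a chained sync would perform for one extension.
    static List<String> syncExtension(String extensionId, List<Version> upstreamVersions) {
        List<String> steps = new ArrayList<>();
        steps.add("deactivate " + extensionId); // requests fall through to upstream while syncing
        List<Version> ordered = new ArrayList<>(upstreamVersions);
        ordered.sort(Comparator.comparingLong((Version v) -> v.publishedAt)); // oldest first
        for (Version v : ordered) {
            steps.add("publish " + v.version);
        }
        steps.add("activate " + extensionId);   // only after every version is mirrored
        return steps;
    }

    public static void main(String[] args) {
        List<Version> versions = new ArrayList<>();
        versions.add(new Version("1.1.0", 200L));
        versions.add(new Version("1.0.0", 100L));
        syncExtension("example-ns.example-ext", versions).forEach(System.out::println);
    }
}
```

What should happen when one of the publish steps fails (for example, whether the extension stays de-activated) is left open in this sketch.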

amvanbaren avatar Sep 12 '22 16:09 amvanbaren

You can configure ovsx.webui.url in src/dev/resources/application.yml to point to https://open-vsx.org/

Screenshot 2022-09-05 at 12 58 56

We configured it, but VS Code still gets links to our installation

Yes, when you click the link it redirects to open-vsx.org

amvanbaren avatar Sep 13 '22 13:09 amvanbaren

Ok, how do you want to be able to configure this? Through the admin panel?

@amvanbaren There is no way to control it yet? Maybe it could be part of application.yaml, or of some file which is monitored by the server to allow hot deployment of such configuration.

akosyakov avatar Sep 14 '22 09:09 akosyakov

Ok, how do you want to be able to configure this? Through the admin panel?

@amvanbaren There is no way to control it yet? Maybe it could be part of application.yaml, or of some file which is monitored by the server to allow hot deployment of such configuration.

@akosyakov I'm looking into embedding Spring cloud config. It allows you to define properties in a remote location (Git repo). The config server picks up any changes and applies them to the Open VSX server.

amvanbaren avatar Sep 14 '22 09:09 amvanbaren

@akosyakov I'm looking into embedding Spring cloud config. It allows you to define properties in a remote location (Git repo). The config server picks up any changes and applies them to the Open VSX server.

@iQQBot What do you think about it in the context of our setup? Would we put it in a private repo?

@amvanbaren Can we configure it to pick up from the file on the same machine as well? Usually we are using k8s configmaps to deploy such configurations.

akosyakov avatar Sep 14 '22 12:09 akosyakov

@amvanbaren Can we configure it to pick up from the file on the same machine as well? Usually we are using k8s configmaps to deploy such configurations.

Yes, that's possible too and wouldn't require any changes to the server. https://developers.redhat.com/blog/2017/10/03/configuring-spring-boot-kubernetes-configmap#setup

amvanbaren avatar Sep 14 '22 13:09 amvanbaren

@iQQBot What do you think about it in the context of our setup? Would we put it in a private repo?

I would like to configure it to pick up from a file on the same machine as well.

iQQBot avatar Sep 14 '22 13:09 iQQBot

I will close it in favor of https://github.com/eclipse/openvsx/pull/586. We are doing final testing on our side, and then will squash again, remove unnecessary changes and update the PR description.

akosyakov avatar Nov 08 '22 09:11 akosyakov