open-vsx.org Procedure for the clean up of the database

Hi John. We need a procedure on how to clean up the database from time to time to prevent the size to become uncontrollable. Can you please provide us the steps on what we can clean etc. Thanks

Aug 14 '24 20:08 chills-eclipse

See also: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4157#note_1703626

Aug 14 '24 20:08 denisroy

@chills-eclipse Yep, something we need to get back to. @amvanbaren Thoughts?

Aug 15 '24 13:08 kineticsquid

@kineticsquid You want to move forward on https://github.com/eclipse/openvsx/pull/888?

Aug 21 '24 09:08 amvanbaren

@amvanbaren MS doesn't limit extension versions. I'd like to see what other options we have.

The biggest offender is the file table. And there appear to be two API calls that could reference the table,

/api/.../logo/...
/api/.../file/...

Are there other API calls that reference the table?

I'm wondering if then consider where these API calls are coming from, we might be able to limit the size of the able. Where are calls to this API originating?

open-vsx.org UI
IDEs like Theia
npx ovsx commands like get

In this Gitlab issue, https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4797, @denisroy provided a dump of the access logs and I took a look at the use of the API

Aug 21 '24 18:08 kineticsquid

@amvanbaren Now that we've completed https://github.com/eclipse/openvsx/pull/888, I'm thinking we need a script to scrub and reduce the size of the file resources table. And, for safety's sake, should probably take a backup first. And given that, I'm thinking we need a read only mode, or at least a mode that disables publishing while we work on the table. Thoughts?

Jan 31 '25 14:01 kineticsquid

Yes, now that https://github.com/eclipse/openvsx/pull/1045 has been completed we can remove resource files from the file_resource table. This can be done through a migration similar to https://github.com/eclipse/openvsx/blob/master/server/src/main/java/org/eclipse/openvsx/migration/SetPreReleaseJobRequestHandler.java

Feb 03 '25 11:02 amvanbaren

@amvanbaren I see the code, but I'm not sure I understand how the pieces fit together. Are you saying that we run this for each extension? If so, does that remove the need for R/O mode or to pause publication?

Feb 03 '25 16:02 kineticsquid

@tfroment FYI

Feb 03 '25 16:02 kineticsquid

@kineticsquid It's an example of a migration job that runs in the background. It is for when extra processing is needed that can't be done in a SQL migration script. We can run a similar migration job for each FileResource that is of type resource where it first deletes the file in storage and then deletes the FileResource entity in the database.

Yes, there's no need for a read-only mode or to pause publication.

Feb 03 '25 21:02 amvanbaren

@amvanbaren Understood. Do we need to first change how we process published extensions to not create entries for all of these files, or have we already made that improvement?

Feb 04 '25 15:02 kineticsquid

@kineticsquid In https://github.com/eclipse/openvsx/pull/1100 resources are no longer processed and existing resources are deleted.

Feb 06 '25 10:02 amvanbaren

@amvanbaren Got it. Do we have any more work to do on this one then after https://github.com/eclipse/openvsx/pull/1100 is deployed?

Feb 06 '25 15:02 kineticsquid

Hi @amvanbaren any update on this? should we close?

Mar 26 '25 10:03 tfroment

The file_resource table should now be way smaller (migration has completed). The migration_item table can be cleaned up, but it's better to do that through a db migration.

It looks like failed JobRunr jobs are not automatically deleted from the jobrunr_jobs table. There are configuration properties for org.jobrunr.background-job-server.delete-succeeded-jobs-after and org.jobrunr.background-job-server.permanently-delete-deleted-jobs-after, but nothing for failed jobs. @chills-eclipse you can check by running SELECT id, state, createdat, scheduledat FROM jobrunr_jobs ORDER by createdat LIMIT 100;

Mar 26 '25 12:03 amvanbaren

@amvanbaren do you want the output here in the comments?

Mar 27 '25 11:03 chills-eclipse

@chills-eclipse No, that query was for your convenience.

Mar 31 '25 07:03 amvanbaren