Procedure for the clean up of the database
Hi John. We need a procedure on how to clean up the database from time to time to prevent the size to become uncontrollable. Can you please provide us the steps on what we can clean etc. Thanks
See also: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4157#note_1703626
@chills-eclipse Yep, something we need to get back to. @amvanbaren Thoughts?
@kineticsquid You want to move forward on https://github.com/eclipse/openvsx/pull/888?
@amvanbaren MS doesn't limit extension versions. I'd like to see what other options we have.
The biggest offender is the file table. And there appear to be two API calls that could reference the table,
/api/.../logo/.../api/.../file/...
Are there other API calls that reference the table?
I'm wondering if then consider where these API calls are coming from, we might be able to limit the size of the able. Where are calls to this API originating?
- open-vsx.org UI
- IDEs like Theia
npx ovsxcommands likeget
In this Gitlab issue, https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4797, @denisroy provided a dump of the access logs and I took a look at the use of the API
@amvanbaren Now that we've completed https://github.com/eclipse/openvsx/pull/888, I'm thinking we need a script to scrub and reduce the size of the file resources table. And, for safety's sake, should probably take a backup first. And given that, I'm thinking we need a read only mode, or at least a mode that disables publishing while we work on the table. Thoughts?
Yes, now that https://github.com/eclipse/openvsx/pull/1045 has been completed we can remove resource files from the file_resource table. This can be done through a migration similar to https://github.com/eclipse/openvsx/blob/master/server/src/main/java/org/eclipse/openvsx/migration/SetPreReleaseJobRequestHandler.java
@amvanbaren I see the code, but I'm not sure I understand how the pieces fit together. Are you saying that we run this for each extension? If so, does that remove the need for R/O mode or to pause publication?
@tfroment FYI
@kineticsquid It's an example of a migration job that runs in the background. It is for when extra processing is needed that can't be done in a SQL migration script. We can run a similar migration job for each FileResource that is of type resource where it first deletes the file in storage and then deletes the FileResource entity in the database.
Yes, there's no need for a read-only mode or to pause publication.
@amvanbaren Understood. Do we need to first change how we process published extensions to not create entries for all of these files, or have we already made that improvement?
@kineticsquid In https://github.com/eclipse/openvsx/pull/1100 resources are no longer processed and existing resources are deleted.
@amvanbaren Got it. Do we have any more work to do on this one then after https://github.com/eclipse/openvsx/pull/1100 is deployed?
Hi @amvanbaren any update on this? should we close?
The file_resource table should now be way smaller (migration has completed). The migration_item table can be cleaned up, but it's better to do that through a db migration.
It looks like failed JobRunr jobs are not automatically deleted from the jobrunr_jobs table. There are configuration properties for org.jobrunr.background-job-server.delete-succeeded-jobs-after and org.jobrunr.background-job-server.permanently-delete-deleted-jobs-after, but nothing for failed jobs. @chills-eclipse you can check by running SELECT id, state, createdat, scheduledat FROM jobrunr_jobs ORDER by createdat LIMIT 100;
@amvanbaren do you want the output here in the comments?
@chills-eclipse No, that query was for your convenience.