
Add indexes to improve performance

AxeOfMen opened this issue 4 years ago • 8 comments

Add performance index

AxeOfMen avatar Apr 09 '22 00:04 AxeOfMen

Hey, can you validate whether adding `CREATE INDEX "FilesetEntryFilesetID" ON "FilesetEntry" ("FilesetID");` improves performance over the indexes already in Performance Indexes.sql?

This index: `CREATE INDEX "nn_FilesetentryFile" on FilesetEntry ("FilesetID","FileID");` should cover the same need as your FilesetEntryFilesetID index.
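As a quick sketch of why the composite index should suffice (using Python's sqlite3 and a simplified stand-in table, not Duplicati's actual schema), SQLite's `EXPLAIN QUERY PLAN` shows that a query filtering on `FilesetID` alone can use a composite index whose leading column is `FilesetID`:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified stand-in for Duplicati's FilesetEntry table.
con.executescript("""
CREATE TABLE FilesetEntry (FilesetID INTEGER, FileID INTEGER);
CREATE INDEX "nn_FilesetentryFile" ON FilesetEntry ("FilesetID", "FileID");
""")

# Filtering on FilesetID alone: the planner can still use the composite
# index, because FilesetID is its leading column.
plan = con.execute(
    'EXPLAIN QUERY PLAN SELECT * FROM FilesetEntry WHERE FilesetID = 1'
).fetchall()
print(plan[0][3])  # the plan detail should mention nn_FilesetentryFile
```

This index-prefix behavior is why a separate single-column index on `FilesetID` is normally redundant once the composite index exists.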

Also, any new database changes need to go in their own subsequent .sql file, so you should add your changes to a `13. Additional index.sql` or something, because any existing databases created with the current canary need a new version so they know to apply the new indexes. As long as a .sql file hasn't been added to a release it is OK to include changes in the same file, but that isn't the case here.

jedthe3rd avatar Apr 09 '22 04:04 jedthe3rd

Thanks for the feedback. I have moved the indexes to a separate file and incremented the db version to 13.

Yes, as mentioned in the comments for Issue 4645, these new queries are necessary. I figured your nn_FilesetentryFile index would cover the same ground as my FilesetEntryFilesetID index, but for whatever reason, mine is necessary as well. I'm curious whether you see this using the query I posted in 4645 against your 10G db.

AxeOfMen avatar Apr 09 '22 06:04 AxeOfMen

@jedthe3rd Running `ANALYZE;` after your indexes eliminates the need for my indexes. Should I change my pull request to just do an analyze? It seems a little silly, but otherwise the next analyze won't occur until the database connection is closed at the end of the next run. That could take hours; in my case, I waited for weeks and the query never finished. It would certainly be worse in your case with a 10G db.
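The reason `ANALYZE` matters here is that it populates the `sqlite_stat1` table with per-index statistics, which the query planner uses to decide whether an index is worth using. A minimal sketch (simplified stand-in table, not Duplicati's schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE FilesetEntry (FilesetID INTEGER, FileID INTEGER);
CREATE INDEX "FilesetEntryFilesetID" ON "FilesetEntry" ("FilesetID");
""")
con.executemany("INSERT INTO FilesetEntry VALUES (?, ?)",
                [(i % 10, i) for i in range(1000)])

# Before the first ANALYZE, sqlite_stat1 does not even exist, so the
# planner has no selectivity estimates for the new index.
before = con.execute(
    "SELECT count(*) FROM sqlite_master WHERE name='sqlite_stat1'"
).fetchone()[0]

con.execute("ANALYZE")

# ANALYZE records one stats row per index; the planner reads these when
# choosing a plan.
stats = con.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(before, stats)
```

Until `ANALYZE` runs, a freshly created index may sit unused, which matches the behavior described above.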

AxeOfMen avatar Apr 09 '22 15:04 AxeOfMen

> @jedthe3rd Running ANALYZE; after your indexes eliminates the need for my indexes. Should I change my pull request to just do an analyze? It seems a little silly, but otherwise the next analyze won't occur until the end of the next run when the database connection is closed. That could take hours. In my case, I waited for weeks and the query never finished. Certainly worse in your case with a 10G db.

This is what I was thinking was the case. Ya, I definitely think we should run analyze before each backup. I actually made this change in my code but haven't gotten around to creating the pull request, because I'm batching it with a few other changes.

Just change this request to add an analyze before the backup runs and that should be good; the overall time impact should be minimal. Can you also see if you can update the backup status to show an "optimizing the database" step? I think that would be a good addition to this change. If it turns out to be a bunch of work, we can leave it for a different time.

jedthe3rd avatar Apr 09 '22 15:04 jedthe3rd

I'm not in love with the idea of analyzing before each run. It's not best practice and could set a bad precedent for others looking to do something similar. While analyzing before each run may not be excessive overhead, ideally the analyze would occur immediately after adding the index. But since your PR has already been released to canary, I think that ship has sailed. As goofy as it looks, I think adding version 13 to do an analyze is the most viable option. Hmm.

AxeOfMen avatar Apr 09 '22 17:04 AxeOfMen

> I'm not in love with the idea of analyzing before each run. It's not best practice and could set a bad precedent for others looking to do something similar. While analyzing before each run may not be excessive overhead, ideally the analyze would occur immediately after adding the index. But since your PR has already been released to canary, I think that ship has sailed. As goofy as it looks, I think adding version 13 to do an analyze is the most viable option. Hmm.

Ya, I didn't think that thought all the way through. Actually, the best bet is probably to add an analyze after a database is upgraded here: https://github.com/duplicati/duplicati/blob/master/Duplicati/Library/SQLiteHelper/DatabaseUpgrader.cs Then, once we release the next version of changes, an analyze is part of the upgrade process. I have a number of SQL changes I need to finish testing, so we can have the next db version fairly soon.
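The idea of analyzing right after an upgrade could be sketched as follows. This is a hypothetical Python illustration of the flow, not Duplicati's actual C# DatabaseUpgrader; the `upgrade_database` function and its parameters are made up for this example:

```python
import sqlite3

def upgrade_database(con: sqlite3.Connection, scripts: list) -> None:
    """Hypothetical sketch: apply versioned upgrade scripts, then run
    ANALYZE once so any new indexes get planner statistics immediately,
    instead of waiting for the next connection close."""
    upgraded = False
    for script in scripts:
        con.executescript(script)  # e.g. the contents of "13. Additional index.sql"
        upgraded = True
    if upgraded:
        con.execute("ANALYZE")  # refresh statistics right after schema changes

con = sqlite3.connect(":memory:")
con.executescript("CREATE TABLE FilesetEntry (FilesetID INTEGER, FileID INTEGER);")
con.executemany("INSERT INTO FilesetEntry VALUES (?, ?)", [(1, 1), (1, 2)])
upgrade_database(
    con,
    ['CREATE INDEX "FilesetEntryFilesetID" ON FilesetEntry ("FilesetID");'],
)
```

Tying the analyze to the upgrade path, rather than to every backup run, avoids the repeated overhead the earlier comment objected to while still fixing the stale-statistics problem once.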

jedthe3rd avatar Apr 10 '22 19:04 jedthe3rd

@AxeOfMen & @jedthe3rd Do you want to include the changes here, or did you have a better fix in mind?

kenkendk avatar Jun 12 '22 20:06 kenkendk

> @AxeOfMen & @jedthe3rd Do you want to include the changes here, or did you have a better fix in mind?

These indexes are basically duplicates. The real solution to the issue he is facing is to add an analyze when the database is upgraded.

jedthe3rd avatar Jun 17 '22 04:06 jedthe3rd

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/first-backup-has-been-running-for-over-a-day-since-upgrade-from-2-0-6-1-to-2-0-7-1/16382/6

duplicatibot avatar Jun 27 '23 23:06 duplicatibot