Invoke-DbaDbDataMasking - Improve performance
Switch from column-based masking to row-based masking.
Type of Change
- [x] Bug fix (non-breaking change, fixes #7940 )
- [ ] New feature (non-breaking change, adds functionality, fixes # )
- [ ] Breaking change (affects multiple commands or functionality, fixes # )
- [x] Ran manual Pester test and has passed (
Invoke-ManualPester) - [ ] Adding code coverage to existing functionality
- [ ] Pester test is included
- [ ] If new file reference added for test, has is been added to github.com/dataplat/appveyor-lab ?
- [ ] Unit test is included
- [ ] Documentation
- [ ] Build system
Purpose
To improve the speed at which Invoke-DbaDbDataMasking masks data in large tables
Approach
This change switches from Column by column masking to row by row masking, allowing for a batch of rows to be done at a single time, instead of having to iterate over each column to be masked in the table for each row.
This is a big change and not easy to review. Best would be to let @sanderstad have a look at this and approve it. Sander, do you have the time?
As #9664 is merged, you should update your branch to resolve the conflict. Then we can continue to review this PR.
Sorry, I didn't see that you tagged me until now. I could've done the check, but it's already merged
@sanderstad this perf change isn't merged in - was another fix. Let us know what you think 😄
I looked at the code and I know there was a reason why I did it column by column for a single table. I was thinking about this approach from a column perspective, not a row perspective. The reason I went for this approach is because I wanted the entire column to be updated for a table at once, and not per row (RBAR).
@AdamJKoehler Your approach could work, but I'm very much interested in your results from testing. Do you have an example of the original way of doing it and the new way?
The code looks good for the rest of it, but I would like to see the results from a run with that amount of records (200.000 +).
@sanderstad, I do have some old ones. I'll get a fresh set in the next few days and send it over.
How're we looking, all?
@potatoqualitee I got pulled into some emergency issues that took me away from this. I should be able to get back to this within the next week.
Thank you @AdamJKoehler 👍🏼
Hello, closing for now but I encourage you to reopen once it's been tested, I'll be happy to merge 👍🏼
@sanderstad, @potatoqualitee Here's my recent test. My changes are the top result, and the bottom are the original implementation.
I was going to re-open the PR, but it appears I don't have the ability to do so.
Awesome, thank you @AdamJKoehler ! Reopened and will merge once tests pass 👍🏼
boom, thank you! Merrrrrrrrrged!