
Do Not Merge: VReplication Copy Phase: Parallelize bulk insert to check for performance improvements

Open · opened by rohit-nayak-ps

POC

Batches multiple bulk inserts in the copy phase to test whether running them concurrently improves copy-phase performance.

Description

During the copy phase, we stream rows and perform one bulk insert for every PacketSize worth of rows (default 250K bytes). This PR batches several of these bulk inserts together.
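The existing one-insert-per-packet behavior can be sketched roughly as follows. This is a simplified illustration, not Vitess code: `rowBatcher`, `Add`, and `Flush` are hypothetical names, rows are assumed to arrive already encoded as SQL value tuples, and the SQL is built by plain string concatenation purely for readability. Only the idea of flushing once the buffered bytes reach PacketSize comes from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// rowBatcher is a hypothetical helper: it buffers encoded row tuples and
// emits one multi-row INSERT each time the buffered bytes reach packetSize.
type rowBatcher struct {
	table      string
	packetSize int      // byte threshold, analogous to PacketSize
	rows       []string // buffered value tuples such as "(1,'a')"
	size       int      // bytes currently buffered
	flushed    []string // statements produced so far (stand-in for executing them)
}

// Add buffers one encoded row and flushes once the threshold is reached.
func (b *rowBatcher) Add(row string) {
	b.rows = append(b.rows, row)
	b.size += len(row)
	if b.size >= b.packetSize {
		b.Flush()
	}
}

// Flush turns all buffered rows into a single bulk INSERT and resets the buffer.
func (b *rowBatcher) Flush() {
	if len(b.rows) == 0 {
		return
	}
	stmt := fmt.Sprintf("insert into %s values %s", b.table, strings.Join(b.rows, ","))
	b.flushed = append(b.flushed, stmt)
	b.rows = b.rows[:0]
	b.size = 0
}

func main() {
	// A tiny threshold (14 bytes) so the example flushes after every two rows.
	b := &rowBatcher{table: "t1", packetSize: 14}
	for _, r := range []string{"(1,'aa')", "(2,'bb')", "(3,'cc')"} {
		b.Add(r)
	}
	b.Flush() // flush the remainder at the end of the stream
	for _, s := range b.flushed {
		fmt.Println(s)
	}
}
```

With this PR, several such statements would be collected before any of them is executed, instead of running each one as soon as it is produced.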

For benchmarking purposes, we add a VReplicationTableCopyTimings stat that records when each table copy starts and when it ends. The metric is tracked per VReplication stream.

This feature is gated behind an experimental vttablet flag. To enable it, start the target vttablet with -vreplication_experimental_flags 2.

You can set the number of parallel bulk inserts with -vreplication_parallel_bulk_inserts (for example, -vreplication_parallel_bulk_inserts 16). The default is 4.
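The two flags can be illustrated with Go's standard `flag` package. This is a hypothetical sketch rather than the vttablet code: it assumes the experimental flag is a bitmask in which the bit with value 2 selects parallel bulk inserts, and the constant name, the `copyConfig` type, and the experimental flag's default of 0 are placeholders. Only the flag names, the value 2, and the default of 4 parallel inserts come from the description.

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical bit assumed to select parallel bulk inserts; the PR only says
// that passing the value 2 enables the feature.
const experimentalParallelBulkInserts int64 = 2

type copyConfig struct {
	experimentalFlags   int64
	parallelBulkInserts int
}

// parseCopyFlags registers the two flags on a private FlagSet so the example
// can be run with arbitrary arguments.
func parseCopyFlags(args []string) (copyConfig, error) {
	var cfg copyConfig
	fs := flag.NewFlagSet("vttablet", flag.ContinueOnError)
	fs.Int64Var(&cfg.experimentalFlags, "vreplication_experimental_flags", 0,
		"experimental VReplication features (assumed bitmask)")
	fs.IntVar(&cfg.parallelBulkInserts, "vreplication_parallel_bulk_inserts", 4,
		"number of bulk inserts to run in parallel during the copy phase")
	err := fs.Parse(args)
	return cfg, err
}

// parallelEnabled reports whether the parallel path should be used; when it
// is false, the copy phase keeps the existing one-batch-at-a-time path.
func (c copyConfig) parallelEnabled() bool {
	return c.experimentalFlags&experimentalParallelBulkInserts != 0
}

func main() {
	cfg, err := parseCopyFlags([]string{
		"-vreplication_experimental_flags", "2",
		"-vreplication_parallel_bulk_inserts", "16",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.parallelEnabled(), cfg.parallelBulkInserts) // true 16
}
```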

Approach

Instead of inserting one batch at a time, we collect N batches and insert them in parallel. Commits are still done in order.

Checklist

  • [ ] Should this PR be backported?
  • [ ] Tests were added or are not required
  • [ ] Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • [ ] Query Serving
  • [X] VReplication
  • [ ] Cluster Management
  • [ ] Build/CI
  • [ ] VTAdmin

rohit-nayak-ps · Jul 25 '22 20:07

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • [x] Ensure that the Pull Request has a descriptive title.
  • [x] If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • [x] If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive. Additionally, flag names should use dashes (-) as word separators rather than underscores (_).
  • [x] If a workflow is added or modified, each item in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • [x] There should be at least one unit or end-to-end test.
  • [x] The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • [x] There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • [x] Should be documented, either by modifying the existing documentation or creating new documentation.
  • [x] New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • [x] Protobuf changes should be wire-compatible.
  • [x] Changes to _vt tables and RPCs need to be backward compatible.
  • [x] vtctl command output order should be stable and awk-able.

vitess-bot[bot] · Jul 25 '22 20:07

With @mattlord's help, I have been benchmarking the performance of this PR alongside #10788. Both PRs improve performance by 10-200%, depending on factors such as CPU load on the data source.

Because this PR came first, has already undergone some code review, and is simpler than #10788, I'm going to continue the work in this PR.

  • [x] Completely avoid async path when experimental flag not present
  • [x] Get E2E tests to pass without experimental flag
  • [x] Get unit tests to pass without experimental flag
  • [x] Get E2E tests to pass with experimental flag
  • [x] Get unit tests to pass with experimental flag
  • [x] Add more unit tests with experimental flag
  • [x] Add more E2E tests with experimental flag

maxenglander · Aug 02 '22 17:08

I have squashed all the commits, because fixing the one commit that was missing a DCO sign-off via rebase was too cumbersome. The previous commit history is preserved over here, just in case.

maxenglander · Oct 20 '22 20:10