gh-ost icon indicating copy to clipboard operation
gh-ost copied to clipboard

Proposal for New Ghost Feature Integration

Open BarShauli555 opened this issue 9 months ago • 1 comments

Hi Team,

How are you?

I'm Bar from the Wix DBA team. A year ago, I added a new ghost feature to our private GitHub repository, and we would like to contribute it to this entire project.

Our main goal is to safely delete data and rebuild the table using ghost. We can implement this in two ways:

  1. When changing the DDL of this table
  2. When we want to delete old data using the ALTER command: ENGINE=InnoDB

I have added a new PR for this feature, and the code is currently running in our production environment and testing.

The PR includes: A new WHERE clause statement, defaulting to 1=1. Here’s an example of filter operations:

Comparison operators: “<, >, =, !=”

  1. data_created > year('2020')
  2. id >= 100
  3. status IN ('done') # Please use a maximum of 5 values for IN; otherwise, use a temp table.

Sub-select from a temp table:

  1. id IN (SELECT id FROM temp_id) - Preferred for primary keys; otherwise, it’s a heavy process.
  2. varchar_column IN (SELECT varchar_column FROM varchar_temp_table)

Adding the receiving condition to the SQL builder mechanism that filters in the data according to our additional statement in the WHERE clause.

Adding tests.

Please note: This deletion only occurs from the chunk size; if any DML is inserted during the ghost operation and is written to the binlog, it will stop the entire run.

Thank you for your attention. I look forward to your feedback!

Best regards, Bar

Let me know what you are thinking :-)

BarShauli555 avatar Mar 18 '25 08:03 BarShauli555

Please note: This deletion only occurs from the chunk size; if any DML is inserted during the ghost operation and is written to the binlog, it will stop the entire run.

@BarShauli555 why did you decide to stop the entire run when DML in happening towards the table? Doesn't that defeat the purpose of gh-ost?

I would rather go with one of the following two options:

  • only filter the chunks and just let the DML operations to non-existing entries run
  • filter the DML from the binlog applier as well

The first option seems to be the less complex solution and will likely result in less "cleanup" needed afterwards.

I couldn't find in your implementation where you stop when DML is happening, can you link to the change that does that please?

derN3rd avatar Aug 29 '25 07:08 derN3rd