gh-ost
gh-ost copied to clipboard
Timestamp old table with a later time value
We are using --timestamp-old-table
switch for our migrations. In addition, we have background jobs that drop old tables after a certain period of time. This is done in order to let table "cool down" to avoid InnoDB locks. We are consulting timestamp in table name to decide if it is already safe to finally drop it.
The obvious problem for us is that timestamp is of the migration start. In case of a long (multiple days) migration, this timestamp provides no information of when data from the original table was used last.
There are multiple possible solutions to this:
- Use "safe" grace period before dropping table. Not really suitable for us, though, as we have quite a lot of long-running migrations and some big tables, and we want to cleanup in a couple of days after migration, not weeks or months.
- Use other means to decide when table is safe to be dropped instead of consulting timestamp in the table name. Likely a very sensitive option, but we like current solution based on timestamp too much :grin:, and we apply it in other places, not only in
gh-ost
migrations. - Probably many other ones ...
Another option is to add capability (probably via additional modifying switch next to --timestamp-old-table
) to use a different timestamp, likely one of those. This way old table name would reflect not the migration start moment but a moment closer to the cut-over and the factual "deprecation" of the table.
The obvious downside would be that these timestamps are set rather late in the process, so old table name would not be defined up to that moment. This could, for example, affect hooks that make use of old table name env variable earlier in the process, as this env variable would not be set.
Before continuing with this idea, I wanted to check with you if this capability is worth the effort of further investigation and attempts to raise PR, having in mind its deficiencies and limitations for its usage. It would be surely useful to us, so we are rather motivated to implement it.
Thanks in advance for your input.
We are using
--timestamp-old-table
switch for our migrations. In addition, we have background jobs that drop old tables after a certain period of time. This is done in order to let table "cool down" to avoid InnoDB locks
@vbalys 👋 right, it seems .GetOldTableName()
uses the "start time" instead of "now" time. It might be possible to add a feature to do that but I'm not sure how it would affect users of hooks, as you mentioned
To me the path of least resistance might be the gh-ost
changelog (_ghc
) table. I'm curious if your tool could be modified to SELECT
from the changelog table to find the timestamp you need? I believe all changelog rows are timestamped
As an example, but just BEFORE the atomic rename an AllEventsUpToLockProcessed
event is written to the changelog table. It would be pretty trivial to fire a similar, new changelog event (like CutoverComplete
) in a function like .handleCutOverResult()
. Writing that event is harmless, so it could easily become default behaviour 🤔
Writing that event is harmless, so it could easily become default behaviour 🤔
Correction here, for this to be harmless .onChangelogStateEvent()
would need to be aware of the new event type and no-op on it. If not it could spam the logs
@vbalys I figured a new changelog state is useful in general/regardless, so I've made this PR to add that: https://github.com/github/gh-ost/pull/1134
@timvaillancourt Thanks for the suggestion. I think this could actually work as we keep _ghc
table around after migration, so could in theory query it to get the needed data.
On a second thought, this will not work for us as easily as I thought.
Changelog table is deleted in the final cleanup phase. Our background job that looks for the timestamped old tables will not find the _ghc
one, so it will not be able to get anything from it.
We could use hooks, and then store the needed info somewhere else for the job that will come later. But that complicates a lot, especially if compared to the current one "look for table names by pattern, extract timestamp, check age, drop if old enough".
@vbalys
Would using a gh-ost-on-success
hook to run rename table ${GH_OST_OLD_TABLE_NAME} to ${GH_OST_OLD_TABLE_NAME}_${SOME_TIMESTAMP}
do what you're after?
The onSuccess()
hook is the last thing to run before gh-ost exits after a successful migration (excluding some logging, and if the hook doesn't return an error), so doing a rename here won't affect the migration.
@dm-2 Thanks, sounds good. I will need to check if that works for us, but it could be quite a simple solution.
I'm closing this as it looks like we may have a solution, feel free to re-open if needed :+1: