[BUG] DROP INDEX cancels FlintREPL job
What is the bug? When a user submits a DROP INDEX query, the plugin cancels the FlintREPL job. The impact is that any query the user then submits in the same session stays in the waiting state forever.
How can one reproduce the bug?
- T1: submit CREATE SKIPPING INDEX query with auto_refresh=false
- T2: submit DROP INDEX query, making sure (T2-T1) < 3 mins
Unexpected result:
- FlintREPL EMR-S job is cancelled, indexState is REFRESHING
What is the expected behavior? The FlintREPL EMR-S job should not be cancelled.
Do you have any additional context? In SQL Plugin 2.13, when a user submits a DROP INDEX query, the plugin cancels the EMR-S job even if the index state is EMPTY / ACTIVE.
public boolean validate(FlintIndexState state) {
  return state == FlintIndexState.REFRESHING
      || state == FlintIndexState.EMPTY
      || state == FlintIndexState.ACTIVE
      || state == FlintIndexState.CREATING;
}
Suggested solutions
- Alt-1. For CREATE INDEX, submit a FlintJob instead of a FlintREPL job.
- Alt-2. Prevalidate indexState, and cancel the job only when the index is in the REFRESHING state.
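A minimal sketch of what Alt-2 could look like, keeping the "may this index be dropped" check separate from the "should the job be cancelled" check. The `FlintIndexState` enum here is a stand-in that lists only the states named in this issue, and `DropIndexPrecheck`, `canDrop` and `shouldCancelJob` are hypothetical names, not the plugin's API.

```java
import java.util.EnumSet;
import java.util.Set;

// Stand-in for the plugin's FlintIndexState (assumption: only the states
// mentioned in this issue are listed here).
enum FlintIndexState { CREATING, EMPTY, ACTIVE, REFRESHING, DELETING, DELETED }

// Hypothetical helper illustrating Alt-2; not part of the SQL plugin.
final class DropIndexPrecheck {
  // States from which DROP INDEX is accepted (same set as validate() above).
  private static final Set<FlintIndexState> DROPPABLE =
      EnumSet.of(
          FlintIndexState.REFRESHING,
          FlintIndexState.EMPTY,
          FlintIndexState.ACTIVE,
          FlintIndexState.CREATING);

  // True if DROP INDEX may proceed for an index in this state.
  static boolean canDrop(FlintIndexState state) {
    return DROPPABLE.contains(state);
  }

  // True only when a refresh job is expected to be running for the index.
  static boolean shouldCancelJob(FlintIndexState state) {
    return state == FlintIndexState.REFRESHING;
  }
}
```

With this split, an index in EMPTY or ACTIVE state can still be dropped, but the cancel call is skipped for it.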
I think it's intended that DROP can delete indexes in all of those states. What's not intended is that it assumes the job is either a streaming job or a batch refresh job. DROP does the following:
1. validate initial state is refreshing/empty/active/creating
2. transition to deleting
3. cancel streaming job
4. transition to deleted
Step 1 is correct, since we need to be able to drop indexes in those states. The incorrect part is step 3: the jobId in FlintIndexStateModel is not for a streaming job but for an interactive job, and we still cancel it.
Before the DML handler and FlintIndexOp were refactored, DROP index involved 2 steps, a Cancel op and a Delete op, and each had its own validation. (cancel, delete) After the refactoring, the two steps were merged and now share a single validation logic, which is incorrect.
Yet another alternative, Alt-3, is to have IndexDMLHandler first try to cancel, and then drop.
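The "cancel first, then drop" flow, with each step carrying its own validation as before the refactoring, might be sketched as follows. Everything here is illustrative: `TwoStepDrop`, its nested `State` enum, and the exact states each step accepts are assumptions made for the example, not the plugin's actual FlintIndexOp classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Alt-3: try cancel, then delete, each with its own check.
final class TwoStepDrop {
  // Stand-in states (assumption: mirrors the states named in this issue).
  enum State { CREATING, EMPTY, ACTIVE, REFRESHING, DELETING, DELETED }

  // Cancel step: assumed to apply only while the index is refreshing.
  static boolean validateCancel(State state) {
    return state == State.REFRESHING;
  }

  // Delete step: accepts every state the DROP validation in this issue accepts.
  static boolean validateDelete(State state) {
    return state == State.REFRESHING
        || state == State.EMPTY
        || state == State.ACTIVE
        || state == State.CREATING;
  }

  // Runs the two steps in order and returns the actions that were taken.
  static List<String> drop(State state) {
    List<String> actions = new ArrayList<>();
    if (validateCancel(state)) {
      actions.add("cancel job"); // skipped when the cancel validation fails
    }
    if (!validateDelete(state)) {
      throw new IllegalStateException("cannot drop index in state " + state);
    }
    actions.add("delete index");
    return actions;
  }
}
```

Note that a check on state alone cannot tell a streaming job from an interactive one, which is the distinction raised above; a real fix would likely also need to know which kind of job the stored jobId refers to.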
Opting for Alt-1.