sql
sql copied to clipboard
[FEATURE] Track query state in plugin with calling EMR-S jobRun API
- Track all the query state in plugin. currently, we only track interactive queries. we should track batch and streaming queries.
- one requirement of this feature is to reduce getJobRun call on EMR-S service.
Queries Journey
Based on job types, we categorize queries into three main types:
- Interactive jobs. e.g. create table
- Batch jobs. e.g. refresh table
- Streaming jobs. e.g. create table with (auto_refresh=true)
Additionally, we have queries that run on the plugin side. e.g. drop index In following table, we explain how the queries is processed in the system
POST
POST async_query create 2 docs in flint_execution_request index
- statement. docId: statementId
type: statement queryId: // asssigned by Transport statementId: // same as queryId state: "waiting"
- session. docId: sessionId
- based on the query, the sessionType will be different.
type: session sessionType: interactive / batch / streaming / local state: "NOT_STARTED"
FlintREPLJob
- get sessionId from spark.flint.job.sessionId conf.
- get sessionType of sessionId.
- take action based on sessionType
- interactive
- while (true)
- read statement with sessionId and process.
- write result to resultIndex
- update statementState = success
- quit if process done
- while (true)
- batch
- read statement with sessionId and process.
- write result to resultIndex
- update statementState = success
- quit if process done
- streaming
- read statement with sessionId and process.
- write result to resultIndex
- update statementState = success
- interactive
- Update session lastUpdateTime regularly
- Update sessionState to DEAD before quit
GET
GET async_query fetch state from statement doc. If the state is success, fetch result from query_result_indices.
CANCEL
CANCEL async_query.
- opt-1
- Plugin (1) set statement state to cancelling state.
- FlintJob (1) read statement state, if it cancelling. mark as cancelled state. force quite the job.
- opt-2
- Plugin (1) set statement state to cancelling. (2) cancel the job associate with session. (3) set statement to cancelled state.
Limitation: DROP INDEX query can not be cancelled.