[SIP-93] Proposal for non-blocking SQL Lab persistence
[SIP-93] Proposal for async SQL Lab persistence
Motivation
At Airbnb we have a performance (lag) issue using SQLLAB_BACKEND_PERSISTENCE feature.
Since SQLLAB_BACKEND_PERSISTENCE designed in a blocking way (as shown in the following), user can be stuck while tab_state table in busy state. (even unable to run a query while the tabstateview endpoint timed out)
https://github.com/apache/superset/blob/8c2719b11cebe451c5ba68193cf8dd51e4dce8e3/superset-frontend/src/SqlLab/actions/sqlLab.js#L614-L625
In order to improve this lag issue, we'd like to have a non-blocking mechanism for this db syncing workflow. i.e. like g-doc auto saving. moreover we'd like to design a hybrid solution (i.e. keep using redux-localstorage and save same snaphot in remote) to keep the advantage (offline saving) of using local storage.
Proposed Change
The proposed solution is simple. We would like to keep using redux-localstorage as well as the current localStorage mechanism (as SQLLAB_BACKEND_PERSISTENCE turned OFF). Like QueryAutoSync does, we would like to build a component that observes the changes of queryEditor and then periodically sync the updated queryEditor snapshot from redux state using existing /tabstateview/ api.
For the payload wise, we'd like to use extra_json in tab_state table to store the snapshot(by json format) of queryEditor as stored in localStorage (rather than mapping each value in a proper table column like sql, label, query_limit).
As existing persistence feature does, we will design similar hydration flow in frontend to initialize the sqlLab state from the backend. With this json blob it will simply convert to json and then merge with the existing local state.
New or Changed Public Interfaces
We will introduce new SQLLAB_BACKEND_PERSISTENCE_ASYNC FLAG and need to pass the tab_state_ids value including extra_json in this case.
New dependencies
I do not for-see any new dependencies.
Migration Plan and Compatibility
I do not for-see any migration required.
Rejected Alternatives
None.
to: @etr2460 @ktmud @john-bodley cc: @betodealmeida
I like this idea. Can you elaborate more on which states will be pulled to local storage and which states will be kept exclusively on remote? I'd imagine we probably shouldn't keep even a temporary local copy of big payloads such as query results.
For states that can be loaded fresh upon page load--such as tables and databases, they should also be skipped for localStorage in this async approach.
There are also some duplicates in saved states, too, that we may want to clean up:
I also noticed we are saving previous queries of current active tab even though they may never be used again. Should be able to save a lot of space if we just clean them up.
To summarize, I think it'd be useful to start cleaning up current saved states when SQLLAB_BACKEND_PERSISTENCE is off, before migrating to this hybrid approach.
I like this idea. Can you elaborate more on which states will be pulled to local storage and which states will be kept exclusively on remote?
Good point. There's a helper function for this and planning to reuse this function to trim the unnecessary items. And it will only push the draft (editor) states (the gap before saved in server) to the local storage. Table (schema) metadata are exclusively stored on remote.
https://github.com/apache/superset/blob/5f9f657805a5559de6f06c647a33f278aadc32db/superset-frontend/src/SqlLab/utils/reduxStateToLocalStorageHelper.js#L56-L60
I'd imagine we probably shouldn't keep even a temporary local copy of big payloads such as query results.
agree. I want to remove the current local storage copy of query results (The query results will retain while tab is opened)
but only stores the resultsKey that requests the cached result from remote.
To summarize, I think it'd be useful to start cleaning up current saved states
yes. It will trim the unnecessary saved states during this job.
Interesting idea @justinpark . I wonder if we should provide both sync and async persistence options or just provide the async version as the evolution of the feature. If it's the latter, we could reuse the same feature flag given that we're planning to clean up the current state, which will modify the existing feature anyway.
Nice proposal @justinpark
Took a look today, this is comprehensive, seems reasonable and I don't think there's anything controversial here.
+1
Though I'd love to get @betodealmeida 's blessing since I believe he implemented the original SQLLAB_BACKEND_PERSISTENCE
If it's the latter, we could reuse the same feature flag given that we're planning to clean up the current state, which will modify the existing feature anyway.
@michael-s-molina I agree with the idea if no concerns found with this movement. I can make the improvement under same feature flag (SQLLAB_BACKEND_PERSISTENCE)
Closing the SIP as it was approved.