superset icon indicating copy to clipboard operation
superset copied to clipboard

[SIP-93] Proposal for non-blocking SQL Lab persistence

Open justinpark opened this issue 3 years ago • 5 comments

[SIP-93] Proposal for async SQL Lab persistence

Motivation

At Airbnb we have a performance (lag) issue using SQLLAB_BACKEND_PERSISTENCE feature. Since SQLLAB_BACKEND_PERSISTENCE designed in a blocking way (as shown in the following), user can be stuck while tab_state table in busy state. (even unable to run a query while the tabstateview endpoint timed out)

https://github.com/apache/superset/blob/8c2719b11cebe451c5ba68193cf8dd51e4dce8e3/superset-frontend/src/SqlLab/actions/sqlLab.js#L614-L625

In order to improve this lag issue, we'd like to have a non-blocking mechanism for this db syncing workflow. i.e. like g-doc auto saving. moreover we'd like to design a hybrid solution (i.e. keep using redux-localstorage and save same snaphot in remote) to keep the advantage (offline saving) of using local storage.

Proposed Change

The proposed solution is simple. We would like to keep using redux-localstorage as well as the current localStorage mechanism (as SQLLAB_BACKEND_PERSISTENCE turned OFF). Like QueryAutoSync does, we would like to build a component that observes the changes of queryEditor and then periodically sync the updated queryEditor snapshot from redux state using existing /tabstateview/ api. For the payload wise, we'd like to use extra_json in tab_state table to store the snapshot(by json format) of queryEditor as stored in localStorage (rather than mapping each value in a proper table column like sql, label, query_limit). As existing persistence feature does, we will design similar hydration flow in frontend to initialize the sqlLab state from the backend. With this json blob it will simply convert to json and then merge with the existing local state.

New or Changed Public Interfaces

We will introduce new SQLLAB_BACKEND_PERSISTENCE_ASYNC FLAG and need to pass the tab_state_ids value including extra_json in this case.

New dependencies

I do not for-see any new dependencies.

Migration Plan and Compatibility

I do not for-see any migration required.

Rejected Alternatives

None.

to: @etr2460 @ktmud @john-bodley cc: @betodealmeida

justinpark avatar Sep 08 '22 18:09 justinpark

I like this idea. Can you elaborate more on which states will be pulled to local storage and which states will be kept exclusively on remote? I'd imagine we probably shouldn't keep even a temporary local copy of big payloads such as query results.

image

For states that can be loaded fresh upon page load--such as tables and databases, they should also be skipped for localStorage in this async approach.

image

There are also some duplicates in saved states, too, that we may want to clean up:

image

I also noticed we are saving previous queries of current active tab even though they may never be used again. Should be able to save a lot of space if we just clean them up.


To summarize, I think it'd be useful to start cleaning up current saved states when SQLLAB_BACKEND_PERSISTENCE is off, before migrating to this hybrid approach.

ktmud avatar Sep 08 '22 19:09 ktmud

I like this idea. Can you elaborate more on which states will be pulled to local storage and which states will be kept exclusively on remote?

Good point. There's a helper function for this and planning to reuse this function to trim the unnecessary items. And it will only push the draft (editor) states (the gap before saved in server) to the local storage. Table (schema) metadata are exclusively stored on remote.

https://github.com/apache/superset/blob/5f9f657805a5559de6f06c647a33f278aadc32db/superset-frontend/src/SqlLab/utils/reduxStateToLocalStorageHelper.js#L56-L60

I'd imagine we probably shouldn't keep even a temporary local copy of big payloads such as query results.

agree. I want to remove the current local storage copy of query results (The query results will retain while tab is opened) but only stores the resultsKey that requests the cached result from remote.

To summarize, I think it'd be useful to start cleaning up current saved states

yes. It will trim the unnecessary saved states during this job.

justinpark avatar Sep 08 '22 19:09 justinpark

Interesting idea @justinpark . I wonder if we should provide both sync and async persistence options or just provide the async version as the evolution of the feature. If it's the latter, we could reuse the same feature flag given that we're planning to clean up the current state, which will modify the existing feature anyway.

michael-s-molina avatar Nov 17 '22 02:11 michael-s-molina

Nice proposal @justinpark

geido avatar Nov 22 '22 14:11 geido

Took a look today, this is comprehensive, seems reasonable and I don't think there's anything controversial here.

+1

Though I'd love to get @betodealmeida 's blessing since I believe he implemented the original SQLLAB_BACKEND_PERSISTENCE

mistercrunch avatar Nov 23 '22 18:11 mistercrunch

If it's the latter, we could reuse the same feature flag given that we're planning to clean up the current state, which will modify the existing feature anyway.

@michael-s-molina I agree with the idea if no concerns found with this movement. I can make the improvement under same feature flag (SQLLAB_BACKEND_PERSISTENCE)

justinpark avatar Nov 30 '22 21:11 justinpark

Closing the SIP as it was approved.

michael-s-molina avatar Nov 13 '23 17:11 michael-s-molina