superset
superset copied to clipboard
perf(dashboard): Virtualization POC
SUMMARY
This PR is a proof of concept for dashboard virtualization. Due to a performance issues on large dashboards (60+ charts in a single tab/view), this PR introduces virtualization in an attempt to reduce the number of elements (DOM nodes) rendered at once. Only the charts visible in the current viewport (with 100% margin top and bottom) are rendered - the rest of them are removed from DOM tree and rendered only when user scrolls them into view. Rerendering charts do NOT trigger new queries - we reuse data fetched on dashboard mount and only redraw the charts.
As of now, this feature is in proof-of-concept faze and should be considered as experimental. It's hidden behind a feature flag DASHBOARD_VIRTUALIZATION
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
https://user-images.githubusercontent.com/15073128/189683884-96f5d02f-4b2d-4eaa-8f05-5bb6b6de899e.mov
TESTING INSTRUCTIONS
- Enable FF
DASHBOARD_VIRTUALIZATION
- Open a dashboard with a lot of charts
- Scroll up and down - verify that only the charts in current viewport (or close to it - 100% margin) are rendered. The rest of them displays a spinner and loads only when user scroll them into view
- Verify that charts render when user scrolls, but do not send new queries to the backend
ADDITIONAL INFORMATION
- [ ] Has associated issue:
- [x] Required feature flags:
DASHBOARD_VIRTUALIZATION
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in SIP-59)
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
Codecov Report
Merging #21438 (d169d72) into master (f58227a) will decrease coverage by
0.01%
. The diff coverage is22.72%
.
:exclamation: Current head d169d72 differs from pull request most recent head 67052ff. Consider uploading reports for the commit 67052ff to get more accurate results
@@ Coverage Diff @@
## master #21438 +/- ##
==========================================
- Coverage 66.84% 66.83% -0.02%
==========================================
Files 1800 1802 +2
Lines 68951 68972 +21
Branches 7335 7342 +7
==========================================
+ Hits 46092 46096 +4
- Misses 20965 20980 +15
- Partials 1894 1896 +2
Flag | Coverage Δ | |
---|---|---|
hive | 52.92% <ø> (ø) |
|
javascript | 53.14% <22.72%> (-0.03%) |
:arrow_down: |
mysql | 78.24% <ø> (ø) |
|
postgres | 78.31% <ø> (ø) |
|
presto | 52.82% <ø> (ø) |
|
python | 81.45% <ø> (ø) |
|
sqlite | 76.79% <ø> (ø) |
|
unit | 51.04% <ø> (ø) |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
...ackages/superset-ui-core/src/utils/featureFlags.ts | 100.00% <ø> (ø) |
|
.../src/dashboard/components/gridComponents/Chart.jsx | 55.04% <ø> (ø) |
|
...ashboard/components/gridComponents/ChartHolder.tsx | 88.88% <ø> (ø) |
|
...tend/src/utils/isDashboardVirtualizationEnabled.ts | 0.00% <0.00%> (ø) |
|
superset/config.py | 91.63% <ø> (ø) |
|
superset/views/base.py | 75.70% <ø> (ø) |
|
...nd/src/dashboard/components/gridComponents/Row.jsx | 51.21% <17.64%> (-23.79%) |
:arrow_down: |
superset-frontend/src/components/Chart/Chart.jsx | 51.66% <50.00%> (-0.88%) |
:arrow_down: |
superset-frontend/src/utils/isBot.ts | 100.00% <100.00%> (ø) |
|
...nts/controls/DateFilterControl/DateFilterLabel.tsx | 41.41% <0.00%> (ø) |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
/testenv up FEATURE_DASHBOARD_VIRTUALIZATION
@kgabryje Ephemeral environment spinning up at http://34.216.203.189:8080. Credentials are admin
/admin
. Please allow several minutes for bootstrapping and startup.
Love this feature. One suggestion: from my experience, it would be better if it didn't re-render when scrolling back.
We currently have react-virtualized
and react-window
in our dependencies that provide virtualization. react-window
is the lighter-weight alternative. Can we use one of them for this problem?
Thank you for the feedback! @hushaoqing @villebro Removing charts after scrolling them out of view was a conscious decision in order to reduce the number of nodes in DOM tree. However, I do agree that seeing the loading spinner all the time might be annoying, especially for the charts that take longer to render. It's very easy to change that behaviour, I'll do it later today. If we get some opposite feedback (charts should be removed), I can introduce another feature flag to make it configurable.
@michael-s-molina I considered using react-window
, but I don't think it's applicable in this case. react-window
expects a clear list of elements to render, with their heights and widths. The logic of rendering the dashboard is more complex - here we have DashboardComponent
s recursively rendering other DashboardComponent
s.
Moreover, even if we managed to restructure dashboard rendering logic to use react-window
, that would mean that we not only delay drawing the chart, but also running chart queries - as it's the src/components/Chart.jsx
component that handles running queries and we'd not render it until it's in view.
I think in this case the "manual" implementation of virtualization with IntersectionObserver
is a better choice
@hushaoqing @villebro Removing charts after scrolling them out of view was a conscious decision in order to reduce the number of nodes in DOM tree. However, I do agree that seeing the loading spinner all the time might be annoying, especially for the charts that take longer to render. It's very easy to change that behaviour, I'll do it later today. If we get some opposite feedback (charts should be removed), I can introduce another feature flag to make it configurable.
One reason to render only visible items is memory footprint. Many virtualization libraries use that approach. That being said, may I suggest using pre-defined values for the feature flag instead of creating a new one? Like this:
DASHBOARD_VIRTUALIZATION = NONE | VIEWPORT | PAGINATED NONE = No virtualization is applied VIEWPORT = Only elements in the viewport are kept in the DOM PAGINATED = Elements are added to the DOM when they enter the viewport and they are not removed
One reason to render only visible items is memory footprint. Many virtualization libraries use that approach. That being said, may I suggest using pre-defined values for the feature flag instead of creating a new one? Like this:
DASHBOARD_VIRTUALIZATION = NONE | VIEWPORT | PAGINATED NONE = No virtualization is applied VIEWPORT = Only elements in the viewport are kept in the DOM PAGINATED = Elements are added to the DOM when they enter the viewport and they are not removed
Do we have an established pattern for 3-state feature flags?
Do we have an established pattern for 3-state feature flags?
I'm not aware of one, and it might be worth mentioning that a lot of companies use feature flag manager platforms that prefer to work (or only work) with booleans. Booleans feel safer.
Loving this feature, btw. I'm getting weird performance issues (i.e. the browser freezing) but i'll try it again in a bit and see if it's just a fluke.
I'm getting weird performance issues (i.e. the browser freezing) but i'll try it again in a bit and see if it's just a fluke.
If you're getting perf issues with virtualization, but not without it, that might mean that the intersection observer is too expensive to be put on every chart. If that's the case, we might want to put it on Row
component instead and propagate the visibility prop down to charts...
Do we have an established pattern for 3-state feature flags?
I'm not aware of one, and it might be worth mentioning that a lot of companies use feature flag manager platforms that prefer to work (or only work) with booleans. Booleans feel safer.
I refactored it like this: DASHBOARD_VIRTUALIZATION: Literal["NONE", "VIEWPORT", "PAGINATED"] = "NONE"
. It's available in redux state as common.conf.DASHBOARD_VIRTUALIZATION
. We use a similar pattern with another config option: DASHBOARD_AUTO_REFRESH_MODE: Literal["fetch", "force"] = "force"
. However, I'm also worried about potential problems with feature flag managers...
I'm not aware of one, and it might be worth mentioning that a lot of companies use feature flag manager platforms that prefer to work (or only work) with booleans. Booleans feel safer.
If that is a concern, a possible approach would be to treat it like a configuration instead of a feature flag, similar to CACHE_TYPE
which defaults to null
but can be overridden with different cache backends. This is not a blocker for me, I'm just trying to avoid two feature flags for the same feature.
@rusackas @michael-s-molina Updated the PR:
- Make
DASHBOARD_VIRTUALIZATION
an enum config instead of a feature flag -
VIEWPORT
mode - remove charts that are not in view;PAGINATED
- persist the already rendered charts;NONE
- virtualization disabled - Move
IntersectionObserver
toRow
. Now we'll have much fewer observers - 1 per row instead of 1 per chart
Unfortunately I can't start a testenv with virtualization since now it's a config instead of a feature flag, so if you want to test manually, you need to pull my branch to your local setup
Nice feature. Another case to consider is about color consistency. But this scenario is similar to the hidden tab, where the charts are lazy renderer, and this case will be addressed in another PR. cc @geido
Awesome feature, tested locally.
@kgabryje @rusackas Thinking more carefully about the problem, I think we could dynamically choose the strategy (viewport or paginated) based on the number of charts that needs to be rendered. This way we would have the best of both strategies and avoid adding another feature flag or configuration.
Nice feature. Another case to consider is about color consistency. But this scenario is similar to the hidden tab, where the charts are lazy renderer, and this case will be addressed in another PR. cc @geido
Good catch @stephenLYZ. I think we can use the same approach we would use for the tabs. We would be catching series as they appear and either apply an existing color if that was seen before or assign a new color from the scheme and persist it.
Love it! Just one preference from me, I think the options VIEWPORT and PAGINATED are not super clear at first glance. If we set the charts to load on scroll I think LAZY might be a more appropriate keyword.
The problem is that both are lazy 😜. Anyway, I vote for removing the flag/configuration completely and choosing the strategy based on the number of charts that need to be rendered.
@kgabryje Is this safe to support the screen capture from "Reports" worker?
I personally recommend using react-intersection-observer
which can cover the fallback case of unsupported IntersectionObserver
(which possibly issue on screen capture from the celery worker)
btw I recently introduced the similar feature (which only limits the runQuery) in #21488
Thank you all for feedback and suggestions! @justinpark I'm not sure if delaying queries is a better solution. Queries are initialized in the order in which the charts are placed on the dashboard, so while we usually are exceeding the browser's simultaneous query limit, charts at the top of the page don't really need to wait for the charts at the bottom. Moreover, I've chosen to only delay drawing charts instead of drawing + running queries in order to reduce a possible "slow scroll" experience.
Great catch about the reports! Do you have any idea how to disable virtualization for celery workers?
@michael-s-molina Agreed, but we need to figure out the threshold between the 2 strategies. I was thinking about this: if the chart hasn't been in view yet, draw it when it's 1 view height away from the viewport (like we do now). If the chart has been drawn and viewed, remove it when it's 5 (arbitrarily chosen number) view heights away from the viewport. That way we won't remove charts in regular dashboards, but we will in big dashboards. WDYT?
@villebro Good point about the Pascal case! However, if we can figure out the dynamic approach proposed by Michael we won't need that config, so I'm not changing that for now 🙂
@michael-s-molina Agreed, but we need to figure out the threshold between the 2 strategies. I was thinking about this: if the chart hasn't been in view yet, draw it when it's 1 view height away from the viewport (like we do now). If the chart has been drawn and viewed, remove it when it's 5 (arbitrarily chosen number) view heights away from the viewport. That way we won't remove charts in regular dashboards, but we will in big dashboards. WDYT?
Sounds good. We can determine the arbitrary number by displaying a heavy viewport of charts, measuring the consumed memory, and multiplying it by what we consider acceptable.
IIRC, react-virtualized
has been discontinued by the author in favor of continued support for react-window
. So that's another reason to not use react-virtualized
.
@michael-s-molina @justinpark @geido @villebro @hushaoqing @rusackas @ktmud Thank you all for feedback and suggestions... and your patience. I updated the PR - now the charts render when they are 1vh away from the viewport, but they're removed from DOM only if they are more than 4vh away from viewport. See the updated summary for more details
/testenv up FEATURE_DASHBOARD_VIRTUALIZATION
@kgabryje Ephemeral environment spinning up at http://52.42.75.81:8080. Credentials are admin
/admin
. Please allow several minutes for bootstrapping and startup.
Ephemeral environment shutdown and build artifacts deleted.