superset icon indicating copy to clipboard operation
superset copied to clipboard

perf(dashboard): Virtualization POC

Open kgabryje opened this issue 2 years ago • 23 comments

SUMMARY

This PR is a proof of concept for dashboard virtualization. Due to a performance issues on large dashboards (60+ charts in a single tab/view), this PR introduces virtualization in an attempt to reduce the number of elements (DOM nodes) rendered at once. Only the charts visible in the current viewport (with 100% margin top and bottom) are rendered - the rest of them are removed from DOM tree and rendered only when user scrolls them into view. Rerendering charts do NOT trigger new queries - we reuse data fetched on dashboard mount and only redraw the charts.

As of now, this feature is in proof-of-concept faze and should be considered as experimental. It's hidden behind a feature flag DASHBOARD_VIRTUALIZATION

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

https://user-images.githubusercontent.com/15073128/189683884-96f5d02f-4b2d-4eaa-8f05-5bb6b6de899e.mov

TESTING INSTRUCTIONS

  1. Enable FF DASHBOARD_VIRTUALIZATION
  2. Open a dashboard with a lot of charts
  3. Scroll up and down - verify that only the charts in current viewport (or close to it - 100% margin) are rendered. The rest of them displays a spinner and loads only when user scroll them into view
  4. Verify that charts render when user scrolls, but do not send new queries to the backend

ADDITIONAL INFORMATION

  • [ ] Has associated issue:
  • [x] Required feature flags: DASHBOARD_VIRTUALIZATION
  • [ ] Changes UI
  • [ ] Includes DB Migration (follow approval process in SIP-59)
    • [ ] Migration is atomic, supports rollback & is backwards-compatible
    • [ ] Confirm DB migration upgrade and downgrade tested
    • [ ] Runtime estimates and downtime expectations provided
  • [ ] Introduces new feature or API
  • [ ] Removes existing feature or API

kgabryje avatar Sep 12 '22 14:09 kgabryje

Codecov Report

Merging #21438 (d169d72) into master (f58227a) will decrease coverage by 0.01%. The diff coverage is 22.72%.

:exclamation: Current head d169d72 differs from pull request most recent head 67052ff. Consider uploading reports for the commit 67052ff to get more accurate results

@@            Coverage Diff             @@
##           master   #21438      +/-   ##
==========================================
- Coverage   66.84%   66.83%   -0.02%     
==========================================
  Files        1800     1802       +2     
  Lines       68951    68972      +21     
  Branches     7335     7342       +7     
==========================================
+ Hits        46092    46096       +4     
- Misses      20965    20980      +15     
- Partials     1894     1896       +2     
Flag Coverage Δ
hive 52.92% <ø> (ø)
javascript 53.14% <22.72%> (-0.03%) :arrow_down:
mysql 78.24% <ø> (ø)
postgres 78.31% <ø> (ø)
presto 52.82% <ø> (ø)
python 81.45% <ø> (ø)
sqlite 76.79% <ø> (ø)
unit 51.04% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ackages/superset-ui-core/src/utils/featureFlags.ts 100.00% <ø> (ø)
.../src/dashboard/components/gridComponents/Chart.jsx 55.04% <ø> (ø)
...ashboard/components/gridComponents/ChartHolder.tsx 88.88% <ø> (ø)
...tend/src/utils/isDashboardVirtualizationEnabled.ts 0.00% <0.00%> (ø)
superset/config.py 91.63% <ø> (ø)
superset/views/base.py 75.70% <ø> (ø)
...nd/src/dashboard/components/gridComponents/Row.jsx 51.21% <17.64%> (-23.79%) :arrow_down:
superset-frontend/src/components/Chart/Chart.jsx 51.66% <50.00%> (-0.88%) :arrow_down:
superset-frontend/src/utils/isBot.ts 100.00% <100.00%> (ø)
...nts/controls/DateFilterControl/DateFilterLabel.tsx 41.41% <0.00%> (ø)

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov[bot] avatar Sep 12 '22 14:09 codecov[bot]

/testenv up FEATURE_DASHBOARD_VIRTUALIZATION

kgabryje avatar Sep 12 '22 15:09 kgabryje

@kgabryje Ephemeral environment spinning up at http://34.216.203.189:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

github-actions[bot] avatar Sep 12 '22 16:09 github-actions[bot]

Love this feature. One suggestion: from my experience, it would be better if it didn't re-render when scrolling back.

hushaoqing avatar Sep 13 '22 07:09 hushaoqing

We currently have react-virtualized and react-window in our dependencies that provide virtualization. react-window is the lighter-weight alternative. Can we use one of them for this problem?

michael-s-molina avatar Sep 13 '22 11:09 michael-s-molina

Thank you for the feedback! @hushaoqing @villebro Removing charts after scrolling them out of view was a conscious decision in order to reduce the number of nodes in DOM tree. However, I do agree that seeing the loading spinner all the time might be annoying, especially for the charts that take longer to render. It's very easy to change that behaviour, I'll do it later today. If we get some opposite feedback (charts should be removed), I can introduce another feature flag to make it configurable.

@michael-s-molina I considered using react-window, but I don't think it's applicable in this case. react-window expects a clear list of elements to render, with their heights and widths. The logic of rendering the dashboard is more complex - here we have DashboardComponents recursively rendering other DashboardComponents. Moreover, even if we managed to restructure dashboard rendering logic to use react-window, that would mean that we not only delay drawing the chart, but also running chart queries - as it's the src/components/Chart.jsx component that handles running queries and we'd not render it until it's in view. I think in this case the "manual" implementation of virtualization with IntersectionObserver is a better choice

kgabryje avatar Sep 13 '22 13:09 kgabryje

@hushaoqing @villebro Removing charts after scrolling them out of view was a conscious decision in order to reduce the number of nodes in DOM tree. However, I do agree that seeing the loading spinner all the time might be annoying, especially for the charts that take longer to render. It's very easy to change that behaviour, I'll do it later today. If we get some opposite feedback (charts should be removed), I can introduce another feature flag to make it configurable.

One reason to render only visible items is memory footprint. Many virtualization libraries use that approach. That being said, may I suggest using pre-defined values for the feature flag instead of creating a new one? Like this:

DASHBOARD_VIRTUALIZATION = NONE | VIEWPORT | PAGINATED NONE = No virtualization is applied VIEWPORT = Only elements in the viewport are kept in the DOM PAGINATED = Elements are added to the DOM when they enter the viewport and they are not removed

michael-s-molina avatar Sep 13 '22 13:09 michael-s-molina

One reason to render only visible items is memory footprint. Many virtualization libraries use that approach. That being said, may I suggest using pre-defined values for the feature flag instead of creating a new one? Like this:

DASHBOARD_VIRTUALIZATION = NONE | VIEWPORT | PAGINATED NONE = No virtualization is applied VIEWPORT = Only elements in the viewport are kept in the DOM PAGINATED = Elements are added to the DOM when they enter the viewport and they are not removed

Do we have an established pattern for 3-state feature flags?

kgabryje avatar Sep 13 '22 13:09 kgabryje

Do we have an established pattern for 3-state feature flags?

I'm not aware of one, and it might be worth mentioning that a lot of companies use feature flag manager platforms that prefer to work (or only work) with booleans. Booleans feel safer.

Loving this feature, btw. I'm getting weird performance issues (i.e. the browser freezing) but i'll try it again in a bit and see if it's just a fluke.

rusackas avatar Sep 13 '22 14:09 rusackas

I'm getting weird performance issues (i.e. the browser freezing) but i'll try it again in a bit and see if it's just a fluke.

If you're getting perf issues with virtualization, but not without it, that might mean that the intersection observer is too expensive to be put on every chart. If that's the case, we might want to put it on Row component instead and propagate the visibility prop down to charts...

kgabryje avatar Sep 13 '22 14:09 kgabryje

Do we have an established pattern for 3-state feature flags?

I'm not aware of one, and it might be worth mentioning that a lot of companies use feature flag manager platforms that prefer to work (or only work) with booleans. Booleans feel safer.

I refactored it like this: DASHBOARD_VIRTUALIZATION: Literal["NONE", "VIEWPORT", "PAGINATED"] = "NONE". It's available in redux state as common.conf.DASHBOARD_VIRTUALIZATION. We use a similar pattern with another config option: DASHBOARD_AUTO_REFRESH_MODE: Literal["fetch", "force"] = "force". However, I'm also worried about potential problems with feature flag managers...

kgabryje avatar Sep 13 '22 14:09 kgabryje

I'm not aware of one, and it might be worth mentioning that a lot of companies use feature flag manager platforms that prefer to work (or only work) with booleans. Booleans feel safer.

If that is a concern, a possible approach would be to treat it like a configuration instead of a feature flag, similar to CACHE_TYPE which defaults to null but can be overridden with different cache backends. This is not a blocker for me, I'm just trying to avoid two feature flags for the same feature.

michael-s-molina avatar Sep 13 '22 15:09 michael-s-molina

@rusackas @michael-s-molina Updated the PR:

  1. Make DASHBOARD_VIRTUALIZATION an enum config instead of a feature flag
  2. VIEWPORT mode - remove charts that are not in view; PAGINATED - persist the already rendered charts; NONE - virtualization disabled
  3. Move IntersectionObserver to Row. Now we'll have much fewer observers - 1 per row instead of 1 per chart

kgabryje avatar Sep 13 '22 15:09 kgabryje

Unfortunately I can't start a testenv with virtualization since now it's a config instead of a feature flag, so if you want to test manually, you need to pull my branch to your local setup

kgabryje avatar Sep 13 '22 15:09 kgabryje

Nice feature. Another case to consider is about color consistency. But this scenario is similar to the hidden tab, where the charts are lazy renderer, and this case will be addressed in another PR. cc @geido

stephenLYZ avatar Sep 14 '22 04:09 stephenLYZ

Awesome feature, tested locally.

mayurnewase avatar Sep 14 '22 06:09 mayurnewase

@kgabryje @rusackas Thinking more carefully about the problem, I think we could dynamically choose the strategy (viewport or paginated) based on the number of charts that needs to be rendered. This way we would have the best of both strategies and avoid adding another feature flag or configuration.

michael-s-molina avatar Sep 14 '22 11:09 michael-s-molina

Nice feature. Another case to consider is about color consistency. But this scenario is similar to the hidden tab, where the charts are lazy renderer, and this case will be addressed in another PR. cc @geido

Good catch @stephenLYZ. I think we can use the same approach we would use for the tabs. We would be catching series as they appear and either apply an existing color if that was seen before or assign a new color from the scheme and persist it.

geido avatar Sep 16 '22 11:09 geido

Love it! Just one preference from me, I think the options VIEWPORT and PAGINATED are not super clear at first glance. If we set the charts to load on scroll I think LAZY might be a more appropriate keyword.

The problem is that both are lazy 😜. Anyway, I vote for removing the flag/configuration completely and choosing the strategy based on the number of charts that need to be rendered.

michael-s-molina avatar Sep 16 '22 11:09 michael-s-molina

@kgabryje Is this safe to support the screen capture from "Reports" worker? I personally recommend using react-intersection-observer which can cover the fallback case of unsupported IntersectionObserver (which possibly issue on screen capture from the celery worker)

btw I recently introduced the similar feature (which only limits the runQuery) in #21488

justinpark avatar Sep 16 '22 18:09 justinpark

Thank you all for feedback and suggestions! @justinpark I'm not sure if delaying queries is a better solution. Queries are initialized in the order in which the charts are placed on the dashboard, so while we usually are exceeding the browser's simultaneous query limit, charts at the top of the page don't really need to wait for the charts at the bottom. Moreover, I've chosen to only delay drawing charts instead of drawing + running queries in order to reduce a possible "slow scroll" experience.

Great catch about the reports! Do you have any idea how to disable virtualization for celery workers?

@michael-s-molina Agreed, but we need to figure out the threshold between the 2 strategies. I was thinking about this: if the chart hasn't been in view yet, draw it when it's 1 view height away from the viewport (like we do now). If the chart has been drawn and viewed, remove it when it's 5 (arbitrarily chosen number) view heights away from the viewport. That way we won't remove charts in regular dashboards, but we will in big dashboards. WDYT?

@villebro Good point about the Pascal case! However, if we can figure out the dynamic approach proposed by Michael we won't need that config, so I'm not changing that for now 🙂

kgabryje avatar Sep 19 '22 16:09 kgabryje

@michael-s-molina Agreed, but we need to figure out the threshold between the 2 strategies. I was thinking about this: if the chart hasn't been in view yet, draw it when it's 1 view height away from the viewport (like we do now). If the chart has been drawn and viewed, remove it when it's 5 (arbitrarily chosen number) view heights away from the viewport. That way we won't remove charts in regular dashboards, but we will in big dashboards. WDYT?

Sounds good. We can determine the arbitrary number by displaying a heavy viewport of charts, measuring the consumed memory, and multiplying it by what we consider acceptable.

michael-s-molina avatar Sep 19 '22 16:09 michael-s-molina

IIRC, react-virtualized has been discontinued by the author in favor of continued support for react-window. So that's another reason to not use react-virtualized.

ktmud avatar Sep 20 '22 01:09 ktmud

@michael-s-molina @justinpark @geido @villebro @hushaoqing @rusackas @ktmud Thank you all for feedback and suggestions... and your patience. I updated the PR - now the charts render when they are 1vh away from the viewport, but they're removed from DOM only if they are more than 4vh away from viewport. See the updated summary for more details

kgabryje avatar Oct 09 '22 16:10 kgabryje

/testenv up FEATURE_DASHBOARD_VIRTUALIZATION

kgabryje avatar Oct 11 '22 06:10 kgabryje

@kgabryje Ephemeral environment spinning up at http://52.42.75.81:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

github-actions[bot] avatar Oct 11 '22 07:10 github-actions[bot]

Ephemeral environment shutdown and build artifacts deleted.

github-actions[bot] avatar Oct 11 '22 14:10 github-actions[bot]