
Rolling mean on resampled data produces incorrect graph

Status: Open · matthew-at-qamcom opened this issue 2 years ago · 5 comments

I cannot correctly graph a rolling average on resampled data.

How to reproduce the bug

  1. Add this CSV file as a dataset: demo.csv
  2. Create a "Time-series Line Chart" based on the dataset provided
  3. Set the metric to be "AVG(value)"
  4. At this stage, if you click "Update chart", you'll see a straight line (y=5). Note, for example, that there is no value for 2000-01-03, as expected.
  5. Open "Advanced Analytics"
  6. From the resampling rules, select "1 calendar day frequency"
  7. From the fill methods, select "Zero imputation" (or "Sum values"; both give the same outcome)
  8. If you update the chart now, you will see many days with zero values. The line is no longer the simple y=5. This is as expected.
  9. Select "mean" as the rolling window function.
  10. Set both the period and min periods to, say, 5.
  11. Update the chart
  12. Note that the graph is not a smooth curve; it only takes values at y=5 and y=0.

Expected results

I expected to see a smooth curve with values between 0 and 5, similar to: [screenshot]
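With period and min periods set to 5 and zero imputation enabled, each plotted point should be the trailing mean of five consecutive daily values, where $y_d = 5$ on days that have data (the AVG(value) from step 4) and $y_d = 0$ on the zero-filled gap days:

$$\bar{y}_t = \frac{1}{5}\sum_{k=0}^{4} y_{t-k}, \qquad y_d \in \{0, 5\} \;\Rightarrow\; 0 \le \bar{y}_t \le 5$$

For example, a window containing one zero-filled day gives $\bar{y}_t = (4 \cdot 5 + 0)/5 = 4$, and a window containing two gap days gives $\bar{y}_t = 3$.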

Actual results

We see values only at y=5 and y=0, not the intermediate values that a rolling mean on resampled data should produce: [screenshot]
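One ordering that would produce exactly this 0/5 pattern is computing the rolling mean *before* resampling instead of after. The sketch below only checks that hypothesis; it is not Superset's code. It assumes hypothetical sample data (demo.csv is not reproduced here): a value of 5 on some days and no rows on others. The helpers `resample_daily_zero_fill` and `rolling_mean` are illustrative plain-Python stand-ins, roughly mirroring pandas' `resample("D").asfreq(fill_value=0)` and `rolling(window=5, min_periods=5).mean()`.

```python
from datetime import date, timedelta

# Hypothetical sample data: AVG(value) is 5 on days that have rows;
# 2000-01-03, 2000-01-06 and 2000-01-09 have no rows at all.
RAW = {date(2000, 1, d): 5.0 for d in (1, 2, 4, 5, 7, 8, 10)}


def resample_daily_zero_fill(points):
    """Emit one row per calendar day between the first and last date.
    Days with no row get 0.0; existing rows (even None) are kept as-is."""
    start, end = min(points), max(points)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return {day: points.get(day, 0.0) for day in days}


def rolling_mean(points, window=5, min_periods=5):
    """Trailing mean over the last `window` rows (rows, not calendar days).
    Rows whose window holds fewer than `min_periods` non-None values get None."""
    days = sorted(points)
    out = {}
    for i, day in enumerate(days):
        values = [points[d] for d in days[max(0, i - window + 1): i + 1]
                  if points[d] is not None]
        out[day] = sum(values) / len(values) if len(values) >= min_periods else None
    return out


# Expected order: fill the gaps with zeros first, then smooth.
expected = rolling_mean(resample_daily_zero_fill(RAW))
# Suspected order: smooth the sparse rows first, then fill the gaps.
suspected = resample_daily_zero_fill(rolling_mean(RAW))

for day in sorted(expected):
    print(day, expected[day], suspected[day])
```

With this sample data, once the window is full the expected series only takes the values 3.0 and 4.0, while the suspected ordering yields only 0.0 and 5.0, which matches the chart described above.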

Environment

  • browser type and version: Firefox 109.0.1
  • superset version: 0.0.0-dev (I've also tried this on Superset 2.3)
  • python version: 3.8.13

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • [x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • [x] I have reproduced the issue with at least the latest released version of superset.
  • [x] I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

I'm using the apache/superset Docker images.

matthew-at-qamcom · Feb 14 '23 01:02