lux Redundant data in timeseries analysis

Open AdityaR-Bits opened this issue 1 year ago • 0 comments

When running the analysis on the NYC Taxi dataset, I found that more than 90% of the data in the JSON spec created by the Altair backend belonged to a single temporal (timeseries) plot, in which each data point covered a very short time interval across a very wide overall range. Timed separately, this plot took a long time to render. The other recommended plots had been binned (by month, year, or day of the week) and so were very fast. In cases like this, where a single plot accounts for the majority of the rendering time, could we give the user an option to render or skip that chart?

To Reproduce

import pandas as pd
import lux

# Disable sampling so every row reaches the charts
lux.config.sampling = False
lux.config.default_display = "lux"
df = pd.read_csv("./data/nyc_taxi.csv")
# Parse the pickup and dropoff columns as datetimes
df['tpep_pickup_datetime'] = pd.to_datetime(df.tpep_pickup_datetime, format="%Y-%m-%d")
df['tpep_dropoff_datetime'] = pd.to_datetime(df.tpep_dropoff_datetime, format="%Y-%m-%d")
df
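
To check which chart dominates the payload, something like the rough sketch below might help. It assumes each chart's spec is already available as a plain Vega-Lite-style dict with inline `data.values`; how to get those dicts out of Lux is not shown here. `spec_size_report` and `skip_fraction` are hypothetical names for illustration, not part of Lux or Altair, and the two specs are toy data rather than the NYC Taxi output.

```python
import json


def spec_size_report(specs, skip_fraction=0.5):
    """Return a list of (title, size_in_chars, share_of_total, skip_flag).

    `specs` maps a chart title to a Vega-Lite-style dict.
    `skip_fraction` is a hypothetical threshold: a chart whose serialized
    spec exceeds this share of the total is flagged as a skip candidate.
    """
    sizes = {title: len(json.dumps(spec)) for title, spec in specs.items()}
    total = sum(sizes.values()) or 1  # avoid division by zero on empty input
    report = []
    for title, size in sizes.items():
        share = size / total
        report.append((title, size, share, share > skip_fraction))
    return report


# Toy specs: one unbinned timeseries with many points, one binned by month.
raw = {
    "mark": "line",
    "data": {"values": [{"t": i, "n": 1} for i in range(5000)]},
}
binned = {
    "mark": "bar",
    "data": {"values": [{"month": m, "n": 400} for m in range(12)]},
}

report = spec_size_report({"raw timeseries": raw, "monthly": binned})
for title, size, share, skip in report:
    print(f"{title}: {size} chars, {share:.1%} of total, skip candidate={skip}")
```

A flag like this could drive the "render or skip" choice suggested above. The 0.5 threshold is arbitrary and would need tuning.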

This is the plot in question.

Jul 16 '22 01:07 AdityaR-Bits
