SandDance

Optimize performance for large datasets

Open · drewkerwin opened this issue 5 years ago · 2 comments

I am trying to analyze a 500 MB CSV file, and rendering takes minutes on a powerful Windows machine. Sometimes the plot renders the first time after a few minutes, but never again (for example, if I change the x-axis). Is it possible to launch this tool without analyzing/rendering by default, so that I can choose my options first and then render once? Also, is anyone working on improving the performance of this tool?

drewkerwin · Oct 26 '20 18:10

Hi @drewkerwin, thanks for the feedback. When creating a chart, we build a dependency graph of all the variables used to specify the layout. We keep this graph in memory so that interaction stays responsive: when a user changes a slider, for example, we don't recompute the entire layout, just the parts that depend on the slider value. [image] As you've noticed, this optimization becomes a liability for large datasets, because keeping the graph in memory consumes more resources. In these cases, we would need to give up some interactivity and recompute the entire layout instead.

danmarshall · Oct 30 '20 19:10
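To illustrate the trade-off described in the reply above, here is a minimal Python sketch of a cached dependency graph. It is a toy under stated assumptions, not SandDance's implementation: the `DependencyGraph` class, its methods, and the `cache` flag are all hypothetical. With `cache=True`, changing a parameter (the "slider") marks only its downstream nodes as stale, so unrelated intermediate results are reused. With `cache=False`, nothing is kept in memory and every read recomputes the whole chain.

```python
from collections import Counter
from typing import Callable, Dict, List, Set


class Node:
    """One derived variable in a (hypothetical) layout dependency graph."""

    def __init__(self, name: str, deps: List[str], compute: Callable[..., object]) -> None:
        self.name = name
        self.deps = deps            # names of the upstream params/nodes this node reads
        self.compute = compute      # called with the upstream values, in order
        self.value: object = None   # cached result (only used when caching is on)
        self.dirty = True           # True means the cached value is stale


class DependencyGraph:
    """Toy incremental-recomputation graph; hypothetical, not SandDance's API."""

    def __init__(self, cache: bool = True) -> None:
        self.cache = cache          # False trades interactivity for lower memory use
        self.params: Dict[str, object] = {}
        self.nodes: Dict[str, Node] = {}
        self.evaluations = 0        # number of compute() calls, to show the savings

    def param(self, name: str, value: object) -> None:
        self.params[name] = value

    def node(self, name: str, deps: List[str], compute: Callable[..., object]) -> None:
        self.nodes[name] = Node(name, deps, compute)

    def _downstream(self, name: str) -> Set[str]:
        # Every node that depends on `name`, directly or transitively.
        found: Set[str] = set()
        frontier = [name]
        while frontier:
            current = frontier.pop()
            for node in self.nodes.values():
                if current in node.deps and node.name not in found:
                    found.add(node.name)
                    frontier.append(node.name)
        return found

    def set_param(self, name: str, value: object) -> None:
        # e.g. a slider moved: invalidate only what depends on it.
        self.params[name] = value
        for dependent in self._downstream(name):
            self.nodes[dependent].dirty = True

    def get(self, name: str) -> object:
        if name in self.params:
            return self.params[name]
        node = self.nodes[name]
        if self.cache and not node.dirty:
            return node.value       # reuse the in-memory result
        value = node.compute(*(self.get(dep) for dep in node.deps))
        self.evaluations += 1
        if self.cache:
            node.value = value
            node.dirty = False
        return value


# Usage: an expensive step ("sorted") that the slider ("bin_width") does not affect.
graph = DependencyGraph(cache=True)
graph.param("data", list(range(10)))
graph.param("bin_width", 2)
graph.node("sorted", ["data"], sorted)
graph.node("binned", ["sorted", "bin_width"], lambda s, w: Counter(v // w for v in s))

graph.get("binned")             # first render computes both nodes (evaluations == 2)
graph.set_param("bin_width", 5)
graph.get("binned")             # only "binned" is recomputed (evaluations == 3)
```

Keeping the `sorted` result cached is what makes the second update cheap. On a 500 MB dataset, those cached intermediate results are also what consume memory, which is the liability the reply describes; `cache=False` models the fallback of recomputing everything.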

Thank you @danmarshall. Also note that a little pre-processing in Python to reduce the size of the CSV helps a great deal, e.g. extracting only the relevant columns into a new, smaller CSV.

drewkerwin · Oct 30 '20 21:10
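The column-extraction pre-processing mentioned in the last comment can be done with Python's standard `csv` module, streaming row by row so the large file never has to be loaded into memory at once. This is a minimal sketch, assuming a UTF-8 file with a header row; the function name `extract_columns` is hypothetical.

```python
import csv
from typing import Iterable


def extract_columns(src_path: str, dst_path: str, columns: Iterable[str]) -> int:
    """Copy only `columns` (in the given order) from src_path to dst_path.

    Rows are streamed one at a time. Returns the number of data rows written.
    """
    wanted = list(columns)
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        header = reader.fieldnames or []
        missing = [c for c in wanted if c not in header]
        if missing:
            raise KeyError(f"columns not found in header: {missing}")
        # extrasaction="ignore" drops every column not listed in `wanted`.
        writer = csv.DictWriter(dst, fieldnames=wanted, extrasaction="ignore")
        writer.writeheader()
        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
    return count
```

For example, `extract_columns("data.csv", "data_small.csv", ["date", "region", "sales"])` (hypothetical file and column names) writes a three-column copy that can then be loaded into the tool instead of the full file.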