Philipp Rudiger comments

Results: 647 comments by Philipp Rudiger

Large amount of time spent on determining datashape

> Now I'm hopelessly confused; dshape_from_dask isn't even involved in that if statement; it's called regardless of any branches in that.

Yes, but that ``if`` branch creates a new dask dataframe...

Large amount of time spent on determining datashape

Here are my results (in ms):

```
                 if True:      if False:     if categorical_in_dtypes:
census-non-cat:  831 840 840   404 413 450   852 833 872
census-cat:      947 968 966   504 471 485...
```

Large amount of time spent on determining datashape

Perhaps we should compare dask, pandas, numba and datashape versions:

```
import pandas as pd
import dask
import numba
import datashape

> print(numba.__version__)
0.37.0dev1+123.gd3f0f8a
> print(dask.__version__)
0.18.0
> print(pd.__version__)
0.23.1...
```
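One way to gather these versions in a single step on each machine is sketched below. It reads the installed distribution metadata through the standard library's `importlib.metadata` instead of importing each package; the helper name `collect_versions` is hypothetical, and a package that is not installed is reported as `None`.

```python
from importlib import metadata


def collect_versions(packages=("dask", "pandas", "numba", "datashape")):
    """Return {distribution name: installed version, or None if absent}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages explicitly rather than failing.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into the issue gives everyone the same four values in the same order, which makes differences between environments easier to spot.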

Large amount of time spent on determining datashape

And here's the (truncated) output of ``%%prun``:

```
if True:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.055    0.055    0.636    0.636 utils.py:368(dshape_from_dask)

if False:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)...
```
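For anyone who wants to reproduce this kind of table outside a notebook, here is a minimal sketch using the standard library's `cProfile` and `pstats`, which produce the same columns. The helper `profile_call` and the stand-in workload `slow_sum` are hypothetical; substitute the aggregation call under investigation.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep only the `top` most expensive entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def slow_sum(n):
    # Stand-in workload (hypothetical); replace with the call being profiled.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_call(slow_sum, 100_000)
    print(report)
```

Sorting by cumulative time puts the callers that dominate the run at the top, the same view in which ``dshape_from_dask`` stands out above.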

Large amount of time spent on determining datashape

> Taking half a second to figure that out seems scandalous.

Remember that dask defers execution; I'm almost certain it's the dropping of columns that's causing this when ``df.head()`` is called.

Large amount of time spent on determining datashape

I just tested the example in #396 and my conclusion is that the optimization is consistently slower in every case when using dask, but does speed things up when using...

Improving Datashader's API

I do agree that the API can be streamlined but I'm not wholly convinced by this proposal. I think separating the canvas or scene from the glyph was a solid...

Improving Datashader's API

To me that is still more conceptually confusing since what you're calling now ``scene`` is really just the glyph with some default parameters tacked on. To put it another way,...

Improving Datashader's API

Sorry, I somehow completely missed your new proposal. That does sound reasonable, although a concrete example of what you're thinking of would help.

Implement 1D aggregations

We could also consider offering a 1D KDE operation, but we should probably coordinate that with the "scumba" (SciPy/Numba) efforts. SciPy's KDE is horrendously slow (I realize it's an expensive...
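To make the cost concrete, here is a naive 1D Gaussian KDE in pure Python. It is only an illustration of the operation being discussed, not SciPy's implementation and not a proposed Datashader API; the function name `gaussian_kde_1d` and the fixed `bandwidth` argument are assumptions. Every grid point visits every sample, so the work grows as `len(samples) * len(grid)`.

```python
import math


def gaussian_kde_1d(samples, grid, bandwidth):
    """Naive Gaussian KDE: density estimate at each grid point."""
    if not samples:
        raise ValueError("samples must not be empty")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Normalization so that the estimate integrates to 1.
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    densities = []
    for x in grid:
        # Sum one Gaussian kernel per sample, centered on that sample.
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        densities.append(norm * total)
    return densities
```

The nested loop is the part a compiled or binned approach would need to replace.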