
Summarise and explore Pandas DataFrames

16 lens issues

PR raised in response to #44 > Prior to version 4, this library could operate in either an "online" or "offline" mode. The documentation tended to emphasize the online mode,...

The DataFrame method `get_values` no longer exists, so I downgraded pandas to '0.25.0' to make it work. The current [setup.py](https://github.com/facultyai/lens/blob/master/setup.py) requires pandas but doesn't specify a version.
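`DataFrame.get_values()` was deprecated in pandas 0.25.0 and removed in pandas 1.0.0. Pinning `pandas<1.0` in setup.py would be the conservative fix; the other option is to switch to `to_numpy()`, available since pandas 0.24. A minimal sketch of a compatibility helper, where the name `frame_values` is hypothetical and not part of lens:

```python
import numpy as np
import pandas as pd


def frame_values(obj):
    """Return the data of a DataFrame or Series as a NumPy array.

    Hypothetical helper: a drop-in replacement for the removed get_values().
    """
    # to_numpy() was added in pandas 0.24 and is the recommended accessor.
    if hasattr(obj, "to_numpy"):
        return obj.to_numpy()
    # Fall back to .values on pandas releases older than 0.24.
    return np.asarray(obj.values)


df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
print(frame_values(df))
```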

It gives me the error:

> error: Microsoft Visual C++ 14.0 is required. Get it with "Build Tools for Visual Studio": https://visualstudio.microsoft.com/downloads/
> ----------------------------------------
> ERROR: Command errored out...

Hi guys, I'm getting the following error with the explorer module:

```
AttributeError                            Traceback (most recent call last)
in
----> 1 explorer.correlation_plot()

~/machine_learning/.env/lib/python3.7/site-packages/lens/explorer.py in correlation_plot(self, include, exclude)
    311         """
    312         fig...
```

This is work in progress: DO NOT MERGE. This PR adapts the summarise functionality so that it can take a [dask dataframe](https://dask.pydata.org/en/latest/dataframe.html), which will allow it to handle larger-than-memory datasets...
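Summarising data that does not fit in memory relies on statistics that can be computed per partition and then merged, similar in spirit to how dask reduces over partitions. The sketch below illustrates that idea with plain pandas chunks (for example, `chunks` could come from `pd.read_csv(path, chunksize=100_000)`); `PartialSummary`, `summarise_partition` and `summarise_chunks` are hypothetical names, not this PR's implementation.

```python
import functools
import math
from dataclasses import dataclass
from typing import Iterable

import pandas as pd


@dataclass(frozen=True)
class PartialSummary:
    """Mergeable statistics for one column of one partition (hypothetical)."""

    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def merge(self, other: "PartialSummary") -> "PartialSummary":
        # Count, sum, min and max all combine associatively and commutatively.
        return PartialSummary(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan


def summarise_partition(series: pd.Series) -> PartialSummary:
    values = series.dropna()
    if values.empty:
        return PartialSummary()  # identity element for merge
    return PartialSummary(
        count=len(values),
        total=float(values.sum()),
        minimum=float(values.min()),
        maximum=float(values.max()),
    )


def summarise_chunks(chunks: Iterable[pd.DataFrame], column: str) -> PartialSummary:
    # Only one chunk needs to be held in memory at a time.
    partials = (summarise_partition(chunk[column]) for chunk in chunks)
    return functools.reduce(PartialSummary.merge, partials, PartialSummary())
```

Count, sum, min and max merge exactly; quantiles do not, which is where a mergeable sketch such as a t-digest comes in.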

Plotly has the advantage of producing interactive plots in a Jupyter notebook, but it does not produce easily portable plots. We should consider ways of making the...

enhancement
good first issue

Right now the [t-digest](https://github.com/tdunning/t-digest) computation (done using a [Python t-digest implementation](https://github.com/CamDavidsonPilon/tdigest)) accounts for most of the time spent generating a summary. The initial motivation to include it was for it to...

enhancement
discussion
good first issue
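A t-digest estimates quantiles approximately in a single streaming pass with bounded memory. When a column already fits in memory, one cheaper alternative worth benchmarking is exact quantiles from a single vectorised pandas call. A minimal sketch, where `exact_quantiles` is a hypothetical helper rather than part of lens:

```python
import pandas as pd


def exact_quantiles(series: pd.Series, probs=(0.25, 0.5, 0.75)) -> dict:
    """Exact quantiles of a numeric column (hypothetical helper).

    Unlike a t-digest, this needs the whole column in memory, but it runs
    as one vectorised call rather than feeding values to a digest in Python.
    """
    # Series.quantile ignores NaN and uses linear interpolation by default.
    return series.quantile(list(probs)).to_dict()


print(exact_quantiles(pd.Series(range(1, 101))))
```

The trade-off is memory and mergeability: exact quantiles cannot be combined across partitions the way t-digests can.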

The dask distributed scheduler is generally an improvement over the multiprocessing scheduler, even on a single multicore machine, because of its improved awareness of data locality, so we should consider adding...

feature
good first issue

For large datasets where computing the summary may be expensive, it would be useful to compute only part of it, be able to explore it, and then compute other parts...

feature
discussion
good first issue
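One way to support partial computation is to make each section of the summary lazy: computed the first time it is requested and cached afterwards. A minimal sketch under that assumption; `LazySummary`, `get`, `computed` and the section names are hypothetical and do not reflect lens's existing API.

```python
import pandas as pd


class LazySummary:
    """Compute summary sections on demand and cache them (hypothetical API)."""

    def __init__(self, df: pd.DataFrame):
        self._df = df
        self._cache = {}
        # Map each section name to the function that computes it.
        self._sections = {
            "row_count": lambda df: len(df),
            "null_counts": lambda df: df.isna().sum().to_dict(),
            "numeric_means": lambda df: df.select_dtypes("number").mean().to_dict(),
            "correlation": lambda df: df.select_dtypes("number").corr(),
        }

    def get(self, name: str):
        # Raises KeyError for an unknown section; computes at most once per name.
        if name not in self._cache:
            self._cache[name] = self._sections[name](self._df)
        return self._cache[name]

    @property
    def computed(self) -> list:
        return sorted(self._cache)


summary = LazySummary(pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", "z"]}))
print(summary.get("row_count"), summary.computed)
```

Expensive sections such as the correlation matrix are only paid for when explored, and a summary could later be persisted with just the sections computed so far.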