big_data_benchmarks

Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for analysis and machine learning.

Results 3 big_data_benchmarks issues

Wouldn't the code below at spark-koalas.ipynb cause self-recursion?

```
def filter_data(df):
    filtered = df[expr_filter]
    p = filter_data(data).to_pandas()
    del p
    return filtered
```
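To illustrate the concern raised in this issue, here is a minimal sketch that uses plain Python lists in place of the Koalas DataFrame. The names `filter_data_recursive`, `recurses_forever`, and the `r > 0` filter are hypothetical stand-ins, not code from the notebook. It suggests that a function which calls itself unconditionally before returning never terminates normally, and that moving the materialization step outside the function would avoid the problem.

```python
def filter_data_recursive(rows):
    # Same shape as the quoted snippet: the function calls itself
    # unconditionally, so it can never reach its return statement.
    filtered = [r for r in rows if r > 0]  # stand-in for df[expr_filter]
    p = filter_data_recursive(rows)        # no base case
    del p
    return filtered


def filter_data(rows):
    # Hypothetical fix: the function only filters; any materialization
    # (to_pandas() in the notebook) would happen at the call site.
    return [r for r in rows if r > 0]


def recurses_forever(func, arg):
    # Returns True if calling func(arg) exhausts the recursion limit.
    try:
        func(arg)
    except RecursionError:
        return True
    return False


data = [3, -1, 4, -5]
print(recurses_forever(filter_data_recursive, data))
print(filter_data(data))
```

If the intent in the notebook was to time the materialization, one option under these assumptions is to call the filter once and materialize its result outside the function body.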

Example notebook to demonstrate writing a parquet file with small partitions. This will require some minor changes to work in an AWS environment.

This is a somewhat better comparison, and it lets dask run. Instead of materializing a column, we now aggregate (take the mean). And we don't ask dask to materialize...