big_data_benchmarks

Published 20 hours ago

Metadata

Comparisons of big data technologies for cleaning, manipulating, and generally wrangling data for the purpose of analysis and machine learning.

Readme
Issues

Results 3 big_data_benchmarks issues

Issue with spark-koalas.ipynb

Wouldn't the code below in spark-koalas.ipynb cause self-recursion?

```
def filter_data(df):
    filtered = df[expr_filter]
    p = filter_data(data).to_pandas()
    del p
    return filtered
```

xinrong-meng
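
The concern raised in this issue can be reproduced without Spark or Koalas. The sketch below is only an illustration: it swaps the DataFrame for a plain Python list, and the names `filter_data_recursive`, `filter_data_fixed`, and `hits_recursion_limit` are hypothetical, not taken from the notebook. The first function mirrors the quoted snippet, which calls itself with no base case. The second shows one plausible intent (an assumption, not the author's confirmed fix): force evaluation of the already filtered result and then discard it.

```python
def filter_data_recursive(rows, predicate):
    """Mirrors the quoted snippet: the function calls itself unconditionally."""
    filtered = [r for r in rows if predicate(r)]
    # No base case: this call repeats until Python raises RecursionError.
    p = filter_data_recursive(rows, predicate)
    del p
    return filtered


def filter_data_fixed(rows, predicate):
    """Assumed intent: materialize the filtered result once, then drop the copy."""
    filtered = [r for r in rows if predicate(r)]
    # list(...) stands in for filtered.to_pandas() in the notebook.
    p = list(filtered)
    del p
    return filtered


def hits_recursion_limit(func, *args):
    """Return True if calling func(*args) raises RecursionError."""
    try:
        func(*args)
    except RecursionError:
        return True
    return False


rows = list(range(10))


def is_even(r):
    return r % 2 == 0


recursive_fails = hits_recursion_limit(filter_data_recursive, rows, is_even)
fixed_result = filter_data_fixed(rows, is_even)
print(recursive_fails, fixed_result)
```

Note that the quoted snippet also passes `data` rather than its own `df` parameter to the inner call, so even with a base case it would depend on a variable defined outside the function.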

Add notebooks/build_parquet_dask.ipynb

Example notebook to demonstrate writing a parquet file with small partitions. This will require some minor changes to work in an AWS environment.

gwvr

Only aggregations

This is a somewhat better comparison, and it lets Dask run. Instead of materializing a column, we now aggregate (take the mean). And we don't ask Dask to materialize...

maartenbreddels

About

Stars: 65

Forks: 21

Owner: xdssio
