db-benchmark Add Pyspark.pandas to benchmark

Open TjommeVergauwen opened this issue 3 years ago • 1 comment

Are there any plans to add Pyspark.pandas to the benchmark?

Apr 02 '22 11:04 TjommeVergauwen

Do you expect pyspark.pandas to perform differently from pyspark.sql? Do you think it will be faster or slower? I think it makes sense to keep only one of them rather than maintain both. Running each solution costs a couple of hours on a high-spec machine, so I would avoid benchmarking the Spark interfaces (SQL/pandas) separately and focus on the engine. I am sure they share the same Spark engine.

Apr 02 '22 12:04 jangorecki
