
Add Pyspark.pandas to benchmark

Open TjommeVergauwen opened this issue 3 years ago • 1 comments

Are there any plans to add Pyspark.pandas to the benchmark?

TjommeVergauwen avatar Apr 02 '22 11:04 TjommeVergauwen

Do you expect it to perform differently from pyspark.sql? Do you think it would be faster or slower? I think it makes sense to keep only one of them rather than maintain both. Running each solution costs a couple of hours on a high-spec machine, so I would avoid benchmarking the Spark interfaces (SQL/pandas) separately and focus on the engine. I am sure they share the same Spark engine.

jangorecki avatar Apr 02 '22 12:04 jangorecki