db-benchmark
Add Pyspark.pandas to benchmark
Are there any plans to add Pyspark.pandas to the benchmark?
Do you expect it to perform differently from pyspark.sql? Do you think it will be faster or slower? I think it makes sense to keep only one of them rather than maintain both. Running each solution costs a couple of hours on a high-spec machine, so I would avoid benchmarking multiple Spark interfaces (SQL/pandas) and focus on the engine. I am sure they share the same Spark engine.