Prashant Singh comments

Results: 56 comments of Prashant Singh

[SPARK-39678][SQL] Improve stats estimation for v2 tables

cc @huaxingao @cloud-fan @wangyum

[SPARK-39678][SQL] Improve stats estimation for v2 tables

> Could you enable spark.sql.cbo.enabled to estimate row count?

Thanks @wangyum, I am aware of the alternate visitor we use with CBO. I raised this PR considering: 1. cbo...

[SPARK-39678][SQL] Improve stats estimation for v2 tables

Rebased and regenerated the golden files via:
* `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *PlanStability*Suite"`
* `SPARK_GENERATE_GOLDEN_FILES=1 SPARK_ANSI_SQL_MODE=true build/sbt "sql/testOnly *PlanStability*Suite"`

[SPARK-39678][SQL] Improve stats estimation for v2 tables

Thanks @wangyum!

> So enabling spark.sql.cbo.enabled is what you want?

I believe setting `spark.sql.cbo.enabled` to true by default could then help (what I wanted was to take this stat...

[SPARK-39678][SQL] Improve stats estimation for v2 tables

> After this PR, what's the difference between SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor

BasicStatsPlanVisitor additionally takes column stats (NDV, null count, min, max, etc.) into account during estimation, which generally...

[SPARK-39678][SQL] Improve stats estimation for v2 tables

> BTW, with CBO off, where do we use row count?

We use it in places like https://github.com/apache/spark/blob/161c596cafea9c235b5c918d8999c085401d73a9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L93-L100 where we just multiply the row count by the row size. We also use...
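
To illustrate the "row count times row size" idea mentioned in the comment above, here is a minimal, self-contained sketch in Python. It is not the Spark implementation: the function name `estimate_size_in_bytes`, the fixed per-row overhead of 8 bytes, and the floor of 1 byte are all assumptions made for this example, and the per-column sizes are hypothetical default widths.

```python
from typing import Optional, Sequence


def estimate_size_in_bytes(
    row_count: Optional[int],
    column_sizes: Sequence[int],
    row_overhead: int = 8,  # assumed generic per-row overhead (illustrative)
) -> Optional[int]:
    """Estimate output size as row_count * size_per_row.

    Returns None when no row count is available, since the estimate
    cannot be derived from the row count in that case.
    """
    if row_count is None:
        return None
    size_per_row = row_overhead + sum(column_sizes)
    # Assumption of this sketch: never report a size below 1 byte,
    # so that downstream size comparisons do not see a zero.
    return max(1, row_count * size_per_row)


# Hypothetical table: 1000 rows, one 4-byte int column and one 8-byte long column.
print(estimate_size_in_bytes(1000, [4, 8]))
```

The point of the sketch is only that a reported row count, even without column-level statistics, is enough to derive a size estimate from the output schema.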

Support structured streaming read for Iceberg

Update Iceberg metadata in case of DR
