Prashant Singh

Results: 56 comments by Prashant Singh

> Could you enable spark.sql.cbo.enabled to estimate row count?

Thanks @wangyum, I am aware of the alternate visitor we use with CBO. I raised this PR considering: 1. cbo...

Rebased and regenerated the golden files via:

* `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *PlanStability*Suite"`
* `SPARK_GENERATE_GOLDEN_FILES=1 SPARK_ANSI_SQL_MODE=true build/sbt "sql/testOnly *PlanStability*Suite"`

Thanks @wangyum!

> So enabling spark.sql.cbo.enabled is what you want?

I believe setting `spark.sql.cbo.enabled` to true by default could then help (what I wanted was to take this stat...

> After this PR, what's the difference between SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor?

BasicStatsPlanVisitor additionally takes columnStats (such as NDV, NullCount, min, max, etc.) into account during estimation, which generally...

> BTW, with CBO off, where do we use row count?

We use it in places like https://github.com/apache/spark/blob/161c596cafea9c235b5c918d8999c085401d73a9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala#L93-L100, where we just multiply the row count by the row size. We also use...
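The "row count times row size" estimate described in this comment can be sketched roughly as below. This is a simplified Python illustration, not Spark's Scala code: the function names, the fixed per-row overhead of 8 bytes, and the column widths are all assumptions made for the example.

```python
from typing import List

# Assumed fixed per-row overhead in bytes; chosen for illustration only.
ROW_OVERHEAD_BYTES = 8


def size_per_row(column_widths: List[int]) -> int:
    """Approximate bytes per row: a fixed overhead plus each column's width."""
    return ROW_OVERHEAD_BYTES + sum(column_widths)


def estimate_size_in_bytes(row_count: int, column_widths: List[int]) -> int:
    """Estimate output size as row count multiplied by the per-row size.

    The result is clamped to at least 1 so that later size comparisons
    never see a zero-sized relation.
    """
    return max(1, row_count * size_per_row(column_widths))


# Hypothetical relation: 1000 rows with an int (4 bytes) and a long (8 bytes).
estimated = estimate_size_in_bytes(1000, [4, 8])
print(estimated)
```

With a known row count the size estimate scales linearly with the number of rows, which is why having a row count available matters even when the full cost-based optimizer is off.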

> is this something that people are still open to working on?

+1, I have a PR out for supporting rate limiting in Spark 3:

* https://github.com/apache/iceberg/issues/2789
* https://github.com/apache/iceberg/pull/4479...

@asheeshgarg, will [S3 access points for Iceberg](https://iceberg.apache.org/docs/latest/aws/#s3-access-points) work for your use case?

@asheeshgarg The metadata files will still point to my-bucket1 (the actual S3 path), but when making S3 requests via Iceberg (GET + PUT) the [my-bucket1 path will be replaced by...

Yes, if you map both buckets (present in different regions) to a multi-region access point. You can also refer to this Slack thread, where this idea originated: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1645066803099319 >...
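
The bucket substitution discussed in this thread can be illustrated with a small sketch. It is a Python approximation of the idea, not Iceberg's implementation: the helper `resolve_request_target` is hypothetical, the ARN (account ID and alias) is a placeholder, and the `s3.access-points.<bucket>` property naming is assumed from the S3 access points documentation linked earlier in the thread.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse

# Assumed catalog property prefix that maps a bucket name to an access point.
ACCESS_POINT_PREFIX = "s3.access-points."


def resolve_request_target(uri: str, properties: Dict[str, str]) -> Tuple[str, str]:
    """Return (bucket_or_access_point, key) to use for an S3 request.

    The URI stored in table metadata is left untouched; only the target
    of the request changes when a mapping exists for its bucket.
    """
    parsed = urlparse(uri)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    # Fall back to the original bucket when no access point is configured.
    target = properties.get(ACCESS_POINT_PREFIX + bucket, bucket)
    return target, key


# Placeholder ARN for a multi-region access point (not a real resource).
properties = {
    "s3.access-points.my-bucket1": "arn:aws:s3::123456789012:accesspoint:example.mrap",
}

target, key = resolve_request_target(
    "s3://my-bucket1/db/tbl/metadata/v1.metadata.json", properties
)
print(target, key)
```

Mapping a second bucket in another region to the same access point would add one more entry to `properties`, while paths recorded in metadata files keep their original bucket names.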