Felipe Pessoto
@zsxwing, you assigned @vkorukanti. Does that mean you plan to implement it?
@vkorukanti do you have any example of code where the query plan is replaced by an optimized version? I think it would be a good starting point.
Hi @vkorukanti, I'm doing some experiments and I have two different approaches (this is very high level only, and I'm not sure if they are feasible). I'd like to hear your...
I started working on option #1 and have a PoC working.
@scottsand-db do you have any updates on this? Is it expected for the next release? Thanks
I need to test it. In my experiments with Parquet and Delta, running ANALYZE TABLE made the queries ~40% faster than both Parquet (without ANALYZE TABLE) and Delta.
BTW, do you mean Delta 1.2? I don't see these changes in the 1.1 changelog.
@scottsand-db in my test with 1.2 it didn't improve performance. Looking at the query plans, they are the same as in 1.1, except for PreparedDeltaFileIndex instead of TahoeLogFileIndex. Stats are expected to...
**UPDATE**: I found the stats (min, max, null count) in the delta log, but I'm not sure why they are not being used during the query.
Yes, I regenerated it. Do you know...
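For reference, a minimal sketch of how those per-file stats can be read out of a delta log commit file. It assumes the layout described in the Delta protocol, where each `add` action carries a `stats` field that is itself a JSON-encoded string with `numRecords`, `minValues`, `maxValues` and `nullCount`. The function name, file name, column name and values below are made up for illustration only; they are not taken from the table discussed here.

```python
import json


def file_stats_from_log_line(line):
    """Return the parsed per-file stats of an `add` action, or None.

    `line` is one line of a _delta_log/*.json commit file.
    """
    action = json.loads(line)
    add = action.get("add")
    if add is None or not add.get("stats"):
        # Not an `add` action, or the writer did not collect stats.
        return None
    # `stats` is a JSON string nested inside the JSON action.
    stats = json.loads(add["stats"])
    return {
        "path": add["path"],
        "numRecords": stats.get("numRecords"),
        "minValues": stats.get("minValues", {}),
        "maxValues": stats.get("maxValues", {}),
        "nullCount": stats.get("nullCount", {}),
    }


# Hypothetical commit line, built here only so the sketch is self-contained.
sample_stats = json.dumps({
    "numRecords": 3,
    "minValues": {"id": 1},
    "maxValues": {"id": 3},
    "nullCount": {"id": 0},
})
sample_line = json.dumps({
    "add": {
        "path": "part-00000.snappy.parquet",
        "size": 1024,
        "modificationTime": 0,
        "dataChange": True,
        "stats": sample_stats,
    }
})

parsed = file_stats_from_log_line(sample_line)
print(parsed["path"], parsed["minValues"], parsed["maxValues"])
```

Each entry describes a single data file, so a reader can skip a file whose min/max range cannot match a filter. That is separate from whatever statistics the query optimizer consults when it builds the plan.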
Some more questions, please: 1. Would it be correct to say that Delta stats are file-wise while ANALYZE stats are table-wise? 2. ANALYZE is a Spark feature, while Delta stats is part...