Andrew Lamb
Nice @zhuqi-lucas -- BTW I am not sure how easy it will be to use the parquet APIs to do this (specifically write arbitrary bytes to the inner writer) so...
> Hi [@zhuqi-lucas](https://github.com/zhuqi-lucas),
>
> While proofreading the blog, I had one major general question: **What are the limitations of such an embedded index?**
>
> * Is it limited...
> Thank you [@alamb](https://github.com/alamb) [@JigaoLuo](https://github.com/JigaoLuo) [@adriangb](https://github.com/adriangb) , i agree current example is the start, we can further add more advanced examples! I also made a PR to clarify the comments...
> User-Defined Index.

I think this is a really good term -- I will update the blog post in https://github.com/apache/datafusion-site/pull/79 to use that
🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing default_utf8_for_unkown_type (7586ca2b84e3312f08281d27d4aa25a5c6eba339) to 5b08b843cc52fc843ef54063bebd4c30b9b0f3a0 [diff](https://github.com/apache/datafusion/compare/5b08b843cc52fc843ef54063bebd4c30b9b0f3a0..7586ca2b84e3312f08281d27d4aa25a5c6eba339)
Benchmarks: h2o_small_window
Results will...
🤖: Benchmark completed

Details

```
Comparing HEAD and default_utf8_for_unkown_type
--------------------
Benchmark h2o_window.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Query    ┃ HEAD      ┃ default_utf8_for_unkown_type ┃ Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ QQuery 1 │ 1907.12ms │...
```
> It looks like no performance improvement for h2o_window benchmark result...

Now that I think about it, the h2o benchmark may not have any string columns 🤔 Do the TPCH...
Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please...
Here is an alternate implementation proposed by @jonahgao : https://github.com/apache/datafusion/pull/12341
I plan to check this out shortly -- thanks @notfilippo