Andrew Lamb
TLDR: while it is possible to implement async UDFs using existing DataFusion APIs (no changes to the core), I think it is a sufficiently asked-for, useful and...
> My use of async in udf's currently is to query either an external system or datafusion itself.

That is interesting, it almost sounds like you are using async udfs...
Are there any remaining outstanding issues to merging this PR? If not, perhaps we can merge it and file an epic / ticket for filling out the remaining features. A...
Unless I hear anything else, I plan to merge this tomorrow and will file a follow-on epic for other tasks (docs / blogs / support in other types of...
🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing epic/async-udf (2462dd0f9439c9c7d7104e8e168d2b64b597c473) to deeff88601772165615d04bbe5f0ea31ce1e8112 [diff](https://github.com/apache/datafusion/compare/deeff88601772165615d04bbe5f0ea31ce1e8112..2462dd0f9439c9c7d7104e8e168d2b64b597c473)
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench...
🤖: Benchmark completed

Details

```
group                           epic_async-udf                 main
-----                           --------------                 ----
logical_aggregate_with_join     1.00   717.5±3.09µs ? ?/sec    1.02   732.1±3.07µs ? ?/sec
logical_select_all_from_1000    1.03   127.2±0.25ms ? ?/sec    1.00   123.9±0.26ms ? ?/sec
logical_select_one_from_700...
```
Added to list on https://github.com/apache/datafusion/issues/16235
- See also https://github.com/apache/parquet-format/issues/489#issuecomment-2833438755
BTW it is possible today to add user-defined indexes for this use case. You can put such indices in Parquet files, as described here: https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/ Or you can store...
I plan to review this tomorrow