duckdb-rs
Quite long compilation time
Thanks for the excellent crate! We're using it in https://github.com/prql/prql for integration tests.
Something I wanted to highlight was the compilation time of the crate. At the moment, it's responsible for about 60% of our whole project's compilation time.
For a concrete case, check out the build timings here: https://github.com/prql/prql/suites/7516963360/artifacts/309340433 (just added in https://github.com/prql/prql/pull/856 !).
Here's a screenshot:

Thanks!
Thanks for the report. I did a quick check, and one of the main reasons is that the arrow crate takes a long time to build. This may not be easy to optimize. One option is to switch to https://github.com/jorgecarleitao/arrow2, but I'm not sure whether that implementation is any faster.
I also noticed that you use the bundled feature of this crate. It compiles the C++ source code, which can also take a long time. I suggest downloading the prebuilt binary instead, following what I did here: https://github.com/wangfenjin/duckdb-rs/blob/main/.github/workflows/rust.yaml#L37-L51
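For anyone trying the prebuilt-binary route, here is a minimal sketch of a pre-flight check that could run before `cargo build`. It assumes the build script picks up the library and headers from the `DUCKDB_LIB_DIR` and `DUCKDB_INCLUDE_DIR` environment variables. That is my understanding of libduckdb-sys and is not confirmed in this thread, so please check the linked workflow. The `PrebuiltDuckdb` struct and the `prebuilt_from` helper are hypothetical names made up for illustration.

```rust
use std::path::{Path, PathBuf};

/// Locations of a prebuilt DuckDB library and its headers (hypothetical type).
#[derive(Debug, PartialEq)]
struct PrebuiltDuckdb {
    lib_dir: PathBuf,
    include_dir: PathBuf,
}

/// Hypothetical helper: takes the values of DUCKDB_LIB_DIR and
/// DUCKDB_INCLUDE_DIR (assumed variable names) and returns the two
/// directories if both are set and both exist on disk.
fn prebuilt_from(lib: Option<&str>, include: Option<&str>) -> Result<PrebuiltDuckdb, String> {
    let lib = lib.ok_or("DUCKDB_LIB_DIR is not set")?;
    let include = include.ok_or("DUCKDB_INCLUDE_DIR is not set")?;

    // Reject paths that do not point at an existing directory.
    for dir in [lib, include] {
        if !Path::new(dir).is_dir() {
            return Err(format!("{dir} is not a directory"));
        }
    }

    Ok(PrebuiltDuckdb {
        lib_dir: PathBuf::from(lib),
        include_dir: PathBuf::from(include),
    })
}
```

In a CI helper you could call it as `prebuilt_from(std::env::var("DUCKDB_LIB_DIR").ok().as_deref(), std::env::var("DUCKDB_INCLUDE_DIR").ok().as_deref())` and fall back to the bundled feature when it returns an error.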
Asked in https://github.com/apache/arrow-rs/issues/2170
Thanks a lot for looking into this @wangfenjin !
To update on this — I tried to understand why our cached build was taking so much time, given that duckdb-rs would have already been compiled. That's the most important metric for us, and that's the one that determines how long our CI takes.
I don't have a great answer, though this workflow shows that much of the time comes from compiling the prql-compiler integration test, which is the one that uses duckdb-sys and is otherwise very small. We would need to dig deeper to confirm the cause (I'm not yet familiar with how to do that).
With the bundled flag, duckdb 0.6.1 now takes over 4 minutes to compile on my M1 Pro MacBook. I wonder whether the recent upstream updates on the main branch have brought any significant compile-time improvements?