prql-query icon indicating copy to clipboard operation
prql-query copied to clipboard

prql-query 0.0.14 doesn't build with duckdb 0.6.0

Open richb-hanover opened this issue 1 year ago • 9 comments

duckdb just released version 0.6.0 that purports to fix #18, the "... got something else..." error message.

To reproduce, update the Cargo.toml to use duckdb version=0.6.0, then cargo build. Here are the build messages:

?1 prql-query % cargo build
    Updating crates.io index
  Downloaded arrow-schema v26.0.0
  Downloaded arrow-array v26.0.0
  Downloaded arrow-select v26.0.0
  Downloaded arrow-buffer v26.0.0
  Downloaded arrow-data v26.0.0
  Downloaded bindgen v0.61.0
  Downloaded duckdb v0.6.0
  Downloaded arrow v26.0.0
  Downloaded libduckdb-sys v0.6.0
  Downloaded 9 crates (3.3 MB) in 1.23s (largest was `libduckdb-sys` at 2.5 MB)
   Compiling bindgen v0.61.0
   Compiling arrow-schema v26.0.0
   Compiling arrow-buffer v26.0.0
   Compiling arrow v23.0.0
   Compiling arrow-data v26.0.0
   Compiling arrow-array v26.0.0
   Compiling arrow-select v26.0.0
   Compiling arrow v26.0.0
   Compiling libduckdb-sys v0.6.0
   Compiling parquet v23.0.0
   Compiling duckdb v0.6.0
   Compiling prql-query v0.0.14 (/Users/richb/github/prql-query)
error[E0277]: a value of type `Vec<arrow::record_batch::RecordBatch>` cannot be built from an iterator over elements of type `duckdb::arrow::record_batch::RecordBatch`
    --> src/backends/duckdb.rs:133:37
     |
133  |     let rbs = stmt.query_arrow([])?.collect::<Vec<RecordBatch>>();
     |                                     ^^^^^^^ value of type `Vec<arrow::record_batch::RecordBatch>` cannot be built from `std::iter::Iterator<Item=duckdb::arrow::record_batch::RecordBatch>`
     |
     = help: the trait `FromIterator<duckdb::arrow::record_batch::RecordBatch>` is not implemented for `Vec<arrow::record_batch::RecordBatch>`
     = help: the trait `FromIterator<T>` is implemented for `Vec<T>`
note: required by a bound in `std::iter::Iterator::collect`
    --> /Users/richb/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:1832:19
     |
1832 |     fn collect<B: FromIterator<Self::Item>>(self) -> B
     |                   ^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `std::iter::Iterator::collect`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `prql-query` due to previous error
?101 prql-query %

richb-hanover avatar Nov 15 '22 21:11 richb-hanover

Hi, thanks for the report.

Haven't got to do the version bump myself yet. You'll usually have to bump the arrow and parquet versions to the same number that duckdb-rs uses. Looking at the docs.rs/duckdb site it looks like it references v26.

So in your Cargo.toml, bump the arrow and parquet versions from 23 to 26 and see if that fixes it.

I'll look to push out another release soon.

snth avatar Nov 16 '22 12:11 snth

I can confirm that changing arrow and parquet to "26.0.0" makes both cargo build and the Dockerfile work fine. (And I get the improved error message from duckdb / sqlite_scanner... See https://github.com/duckdblabs/sqlite_scanner/issues/15#issuecomment-1316982990)

Question: Is setting those versions to "26.0.0" overspecifying? Should I simply set them to "26"? Thanks.

richb-hanover avatar Nov 16 '22 13:11 richb-hanover

I think you're right. Just 26 should be fine.

duckdb-rs actually made a change which means that we can probably get rid of the arrow dependency once I do some refactoring. Unfortunately we're stuck with the parquet one for now.

snth avatar Nov 16 '22 13:11 snth

Thanks - I have done it in my own repo. I don't think I need to send a PR.

  • The current 0.0.14 works just fine (except for the suboptimal error message.)
  • I can keep working with my updated copy;
  • You'll update those versions when you next work on prql-query.

richb-hanover avatar Nov 16 '22 13:11 richb-hanover

Hi. First of all, thanks so much for the excellent project! Huge fan of PRQL! Greatly appreciate all the work and effort that goes into the project.

Recently, I noticed that prql-query is using duckdb "0.5.1" (libduckdb-sys v0.5.1) as a dependency. Will the next version of prql-query include an updated version of duckdb?

Thanks again!

earlev4 avatar Mar 07 '23 16:03 earlev4

Gosh, I can't believe how time has flown and how long this issue has been open. My apologies for not having addressed this yet. The next version of prql-query should certainly include an updated version of DuckDB, the latest being 0.7.1 as of this writing.

In the meantime, if you can use docker rather than a native binary, you could try the following project which I threw together this weekend on the back of the announcement of the duckdb-prql extension.

https://github.com/duckerlabs/ducker

Please let me know if that works for you.

I will try to update prql-query soon as there is an exciting development in the duckdb-rs repo for creating a duckdb binary with the extensions statically compiled in which I would like to incorporate into prql-query as well.

snth avatar Mar 07 '23 22:03 snth

Thanks so much, @snth! I will check out the Ducker! Looks like an interesting project. Also, looking forward to the prql-query update! Thanks for the prompt response and all your efforts on PRQL! It is an awesome project and I appreciate it! 😄

earlev4 avatar Mar 07 '23 23:03 earlev4

The Ducker container looks terrific, and I'll check it out. Unless there's something else in this issue, I think you can close it. Thanks!

richb-hanover avatar Mar 08 '23 12:03 richb-hanover

Wondering if it is possible to remove duckdb-rs from prql-query in favor of the DuckDB's PRQL extension. DuckDB is great, but is it an impediment to building prql-query because of the machine power and time required to build it compared to datafusion here?

eitsupi avatar Mar 14 '23 15:03 eitsupi