pyprql
`duckdb.sql(prql.compile())` vs. `df.prql.query()`?
The two work almost identically for pandas.DataFrame, and the former also works for polars.DataFrame and pyarrow.Table:
```python
import duckdb
import polars as pl
import prql_python as prql

df = pl.DataFrame({'a': 42})
opts = prql.CompileOptions(target="sql.duckdb")
duckdb.sql(prql.compile("from df", options=opts)).pl()
```
Probably needs to be mentioned somewhere... (Related to #151)
Great. To confirm: do you mean we should mention this as an option in the docs, or that we should use duckdb.sql to do our pandas querying?
I intended to update only the documentation for now.
But I think it is worth creating a new function based on duckdb.sql and replacing df.prql.query, since they provide almost the same functionality.
(I don't know what name that function should have... pyprql.duckdb_query?)
IIUC, the current df.prql.query is based on duckdb (https://github.com/PRQL/pyprql/blob/393bc65690fc4e31d863708e1564a68225c7624d/pyprql/pandas_accessor/prql.py#L30).
The accessor offers a method on a DataFrame, which is often more convenient than running duckdb.sql(prql.compile..., even if the functionality is similar.
Does this make sense or am I misunderstanding?
I agree that df.prql.query is more convenient and that we should keep it.
Do we need the .query part? Could we shorten this to just df.prql(...)?
Are there any other members of df.prql?
Yes, I agree that methods are sometimes more convenient. So I intended to only update the documentation at this time.
> Could we shorten this to just df.prql(...)?
That does not seem to be allowed. https://pandas.pydata.org/docs/reference/api/pandas.api.extensions.register_dataframe_accessor.html
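For context, here is a minimal sketch of the pattern the linked pandas API supports: a class is registered under a name, pandas instantiates it with the DataFrame, and its methods are reached as df.<name>.<method>(...). The accessor name `demo`, the class `DemoAccessor`, and the method `head_n` are hypothetical stand-ins for illustration only; this is not pyprql's actual implementation.

```python
import pandas as pd


# Hypothetical accessor name "demo"; pyprql registers its own under "prql".
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame) -> None:
        # pandas passes the DataFrame being accessed to the constructor.
        self._df = pandas_obj

    def head_n(self, n: int) -> pd.DataFrame:
        # Stand-in for a real query method such as df.prql.query(...).
        return self._df.head(n)


df = pd.DataFrame({"a": [1, 2, 3]})
result = df.demo.head_n(2)  # the method is reached through the registered namespace
```

The registered name acts as a namespace on the DataFrame, which is why the call is spelled df.prql.query(...) rather than as a method directly on df.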