lance icon indicating copy to clipboard operation
lance copied to clipboard

filtering on timestamp column not working when tried with duckdb

Open rkunnamp opened this issue 1 year ago • 2 comments

Tried filtering on a timestamp column using duckdb against lance, and it did not work. The same query worked against a parquet file.


import pyarrow as pa
import lancedb
import duckdb
from datetime import datetime, timedelta

data = pa.table({
                  "c1": ["A", "B"],
                  "c2": ["a", "b"],
                  "c3": [1, 2],
                  "c4": [datetime(2021,6,16),datetime(2021,6,17)]
               })



db = lancedb.connect("lancedb_files")
db.create_table("lance_table", data)
table = db.open_table("lance_table")
arrow_dataset = table.to_lance()

duckdb.query("select * from arrow_dataset where c4='2021-06-16 00:00:00.000000'").df()
index c1 c2 c3 c4
pa.dataset.write_dataset(data, "parquet_files", format='parquet')
duckdb.query("select * from 'parquet_files/*.parquet' where c4='2021-06-16 00:00:00.000000'").df()
index c1 c2 c3 c4
0 A a 1 2021-06-16 00:00:00
arrow_dataset.schema

c1: string c2: string c3: int64 c4: timestamp[us]


Not sure if this is a bug that is to be reported here, or if it should be filed with duckdb .

rkunnamp avatar Jun 23 '24 00:06 rkunnamp

I can confirm I can repro when querying lance using duckdb. I have also confirmed that lance timestamp filter is working correctly

In [11]: arrow_dataset.scanner(filter="c4=timestamp '2021-06-16 00:00:00.000000'").to_table().to_pandas()
Out[11]:
  c1 c2  c3         c4
0  A  a   1 2021-06-16

In [12]: duckdb.query("select * from arrow_dataset where c4=timestamp '2021-06-16 00:00:00.000000'").df()
Out[12]:
Empty DataFrame
Columns: [c1, c2, c3, c4]
Index: []

the issue is most likely somewhere in how we translate filters from duckdb into lance filters.

chebbyChefNEQ avatar Jun 25 '24 16:06 chebbyChefNEQ

Yeah seems like an issue with how the substrait translation is happening @westonpace

wjones127 avatar Jun 25 '24 16:06 wjones127