Chen Chongchen

Results 27 comments of Chen Chongchen

@mikemccand I agree that separating the refactoring into another pull request is better. Actually, I tried that at the beginning, but I found that I have to pay much more...

> is SynonymGraphFilter also broken if a synonym contains a stopword? I think (not sure) the old SynonymFilter could handle this case?

@mikemccand I tested some cases, the new code...

@janhoy I added more tests, and I can add more if you think it's necessary. I found that FlattenGraphFilter also removes holes. For example, given the sentence 'the the usa', we filter...
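To make the term "hole" concrete, here is a minimal sketch in plain Python. It is not Lucene code: `Token`, `remove_stopwords`, and `drop_holes` are hypothetical names, and the model only assumes the usual convention that a removed stopword leaves a gap recorded in the next token's position increment.

```python
from typing import Iterable, List, NamedTuple, Set


class Token(NamedTuple):
    term: str
    position_increment: int  # distance from the previously emitted token


def remove_stopwords(terms: Iterable[str], stopwords: Set[str]) -> List[Token]:
    """Drop stopwords but record each gap ("hole") in the next token's increment."""
    out: List[Token] = []
    skipped = 0
    for term in terms:
        if term in stopwords:
            skipped += 1  # remember the hole instead of emitting a token
            continue
        out.append(Token(term, skipped + 1))
        skipped = 0
    return out


def drop_holes(tokens: Iterable[Token]) -> List[Token]:
    """Illustration only: forget the gaps so every token is adjacent to the last."""
    return [Token(t.term, 1) for t in tokens]


tokens = remove_stopwords("the the usa".split(), {"the"})
print(tokens)              # 'usa' carries an increment of 3: two holes precede it
print(drop_holes(tokens))  # the increment collapses to 1: the holes are gone
```

In this toy model, a filter that "removes holes" behaves like `drop_holes`: the positions of the surviving tokens no longer reflect the stopwords that were filtered out.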

> [@wjones127](https://github.com/wjones127) Yes, I basically got a bunch of GIL locks when those two threads try to access the lance dataset (not at the same time, just at all). lance...

> [@chenkovsky](https://github.com/chenkovsky) the "dataset" we're referring to is the `lance.dataset` object in memory, there's no pickling or unpickling that, it includes all the native Rust bindings.

@oceanusxiv I think lance.dataset.LanceDataset...

I have a question: how can _rowid and _rowaddr be exposed? It seems that the DataFusion API and DuckDB don't support these pseudo columns.

> You can't filter on the column

Can't filter on rowid, or on any column? I tested the following unit test. ```python def test_duckdb_rowid(tmp_path): duckdb = pytest.importorskip("duckdb") tbl = create_table_for_duckdb() ds...

> impl LanceTableProvider {

Yes, the `with_row_id` and `with_row_addr` flags will always work, but I think Spark's SupportsMetadataColumns interface is much better.
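As a rough illustration of the metadata-column idea, the sketch below models a table whose pseudo columns are hidden from the default schema but can still be selected by name. This is a toy in plain Python, not the Lance, DataFusion, or Spark API; the `Table` class and its methods are hypothetical, and `_rowid` is assumed here to be just the row's ordinal.

```python
from typing import Dict, List, Optional


class Table:
    """Toy table: regular data columns plus hidden metadata (pseudo) columns."""

    def __init__(self, columns: Dict[str, list]):
        self.columns = columns
        num_rows = len(next(iter(columns.values()), []))
        # Assumption for this sketch: the row id is simply the row's ordinal.
        self.metadata: Dict[str, list] = {"_rowid": list(range(num_rows))}

    def schema(self) -> List[str]:
        # Metadata columns are deliberately absent from the default schema.
        return list(self.columns)

    def select(self, names: Optional[List[str]] = None) -> Dict[str, list]:
        # No explicit projection behaves like SELECT *: data columns only.
        if names is None:
            names = self.schema()
        out: Dict[str, list] = {}
        for name in names:
            if name in self.columns:
                out[name] = self.columns[name]
            elif name in self.metadata:
                out[name] = self.metadata[name]  # resolved only on request
            else:
                raise KeyError(f"unknown column: {name}")
        return out


table = Table({"a": [10, 20]})
print(table.select())                # data columns only
print(table.select(["a", "_rowid"]))  # pseudo column appears when named
```

The appeal of this design over per-scan flags is that the caller asks for `_rowid` like any other column, so no extra option has to be threaded through the scan.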

I created a PR for DataFusion to illustrate my idea for _rowid support: https://github.com/apache/datafusion/pull/14057