Daft supports the distributed merge_columns/create_index of Lance

Open Jay-ju opened this issue 5 months ago • 2 comments

Is your feature request related to a problem?

Currently, Daft only supports distributed reads and writes of Lance datasets. Lance's own merge_columns feature is very useful when columns need to be added to an existing dataset. Note that the add_column discussed here is not Daft's with_columns: with_columns is an in-memory operation, while add_column persists the new column in the Lance dataset.

I would like to propose adding this feature. As a starting point, here are a couple of ideas:

  1. Implement merge_column as a mode of write_lance, i.e. write_lance(operation = [append/create/merge]). For merge, compare the schema of the dataframe with the schema of the Lance dataset, and write the newly added columns in a distributed fashion via merge_column. This feels a bit odd semantically, but Lance itself can commit append/create/merge/update operations in a uniform way.

  2. Add a task framework that supports fixed workflows and encapsulates a few execution paradigms. The prerequisite is support for operators like map/map_batches.
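For idea 1, the schema comparison could look roughly like the sketch below. It is only an illustration under stated assumptions: schemas are modeled as plain dicts of column name to type name rather than Arrow schemas, and `plan_merge` is a hypothetical helper, not part of Daft or Lance.

```python
from typing import Dict, List, Tuple

# Simplified stand-in for an Arrow schema: column name -> type name.
Schema = Dict[str, str]


def plan_merge(dataset_schema: Schema, dataframe_schema: Schema) -> List[Tuple[str, str]]:
    """Return the (name, type) pairs that a merge-mode write would need to add."""
    new_columns = []
    for name, dtype in dataframe_schema.items():
        if name not in dataset_schema:
            new_columns.append((name, dtype))
        elif dataset_schema[name] != dtype:
            # A merge adds columns; it should not silently retype existing ones.
            raise ValueError(
                f"column {name!r} has type {dtype} in the dataframe "
                f"but {dataset_schema[name]} in the dataset"
            )
    return new_columns


existing = {"id": "int64", "text": "string"}
incoming = {"id": "int64", "text": "string", "embedding": "list<float32>"}
new_cols = plan_merge(existing, incoming)
print(new_cols)
```

Only the columns returned by `plan_merge` would need to be written; columns already present in the dataset are left untouched.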

merge_column template

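# fragment_ids: the ids of the Lance dataset's fragments, one row per fragment
ds = daft.from_pylist([{"fragment_id": fid} for fid in fragment_ids])
# map is the proposed map-style operator; merge_column is the per-fragment worker
ds = ds.map(lambda row: merge_column(row["fragment_id"]))
ds.collect()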

create_index template

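# fragment_ids: the ids of the Lance dataset's fragments, one row per fragment
ds = daft.from_pylist([{"fragment_id": fid} for fid in fragment_ids])
# map is the proposed map-style operator; create_index is the per-fragment worker
ds = ds.map(lambda row: create_index(row["fragment_id"]))
ds.collect()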
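Both templates share one shape: fan out over fragment ids, run a task per fragment, then gather the results on the driver. A minimal sketch of that shape follows, using `concurrent.futures` from the standard library as a stand-in for the proposed map operator. The names `run_per_fragment`, `merge_column_task`, and `commit` are hypothetical, and the single driver-side commit is an assumption about how the results would be finalized, not a description of Lance's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_per_fragment(fragment_ids: List[int], task: Callable[[int], Dict]) -> List[Dict]:
    """Apply `task` to every fragment id in parallel, keeping input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(task, fragment_ids))


def merge_column_task(fragment_id: int) -> Dict:
    # Hypothetical worker: a real one would compute the new column for this
    # fragment, write it out, and return metadata describing the result.
    return {"fragment_id": fragment_id, "added": ["embedding"]}


def commit(results: List[Dict]) -> Dict:
    # Hypothetical driver-side step: gather all per-fragment results into one
    # commit so the dataset version changes once, not once per fragment.
    return {"operation": "merge", "fragments": sorted(r["fragment_id"] for r in results)}


results = run_per_fragment([0, 1, 2], merge_column_task)
manifest = commit(results)
print(manifest)
```

Swapping `merge_column_task` for an index-building task would give the create_index variant without changing the driver code.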

Describe the solution you'd like

As above.

Describe alternatives you've considered

No response

Additional Context

No response

Would you like to implement a fix?

No

Jay-ju, Jun 30 '25 13:06

Sounds interesting, this would definitely be something we'd be looking for help with!

srilman, Jun 30 '25 17:06

> Sounds interesting, this would definitely be something we'd be looking for help with!

@srilman I'd really like to hear your suggestions here. This is also what I've been working on recently.

Jay-ju, Jul 01 '25 01:07