
Tracking issues for Lance Integration

Open Xuanwo opened this issue 4 months ago • 5 comments

Is your feature request related to a problem?

Ensure the Lance integration in Daft is complete and mature.

This issue tracks the overall progress of Lance support in Daft, so that our community can engage more easily.

Current Status

Basic support is available on the Python side:

  • daft.read_lance()
  • daft.write_lance()
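
For readers new to the integration, here is a minimal round-trip sketch of the two entry points listed above. It assumes Daft and the Lance Python package are installed. In the Daft releases I am aware of, writing is exposed as the DataFrame method `write_lance` rather than as a top-level function. The helper name `lance_round_trip` and the sample columns are made up for illustration.

```python
import tempfile
from pathlib import Path


def lance_round_trip(data: dict) -> dict:
    """Write `data` to a temporary Lance dataset with Daft and read it back.

    Hypothetical helper for illustration only; not part of Daft.
    """
    # Imported here so the sketch can be loaded even where Daft is not installed.
    import daft

    with tempfile.TemporaryDirectory() as tmp:
        uri = str(Path(tmp) / "example.lance")

        # Write path: build a DataFrame and persist it as a Lance dataset.
        daft.from_pydict(data).write_lance(uri)

        # Read path: load the dataset lazily, sort for a stable order,
        # and materialize it before the temporary directory is removed.
        return daft.read_lance(uri).sort("id").to_pydict()


if __name__ == "__main__":
    print(lance_round_trip({"id": [2, 1, 3], "text": ["b", "a", "c"]}))
```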

Current Ongoing work

  • #4841
    • #4842
  • #4710
  • #4490

Current blockers

  • #4644
  • #4643

Features in the Future

  • Lance index support
  • vector search and full-text search for Lance
  • lance-namespace integration
  • distributed merge_columns & update

Xuanwo avatar Jul 28 '25 08:07 Xuanwo

Thanks for putting this together!

universalmind303 avatar Jul 28 '25 15:07 universalmind303

Thanks @Xuanwo! I believe @plotor is putting together some initial benchmarks for us to work off of for prioritization of these features

jaychia avatar Jul 28 '25 16:07 jaychia

> Thanks @Xuanwo! I believe @plotor is putting together some initial benchmarks for us to work off of for prioritization of these features

@Xuanwo @jaychia I'm still working on the benchmarks. I've been a bit busy recently, and I'll post them when I find some time.

Jay-ju avatar Jul 29 '25 01:07 Jay-ju

Update: I've added some benchmarks in https://github.com/Eventual-Inc/Daft/pull/4874. For now they are all very simple test cases, and more will be added later. Also, the pushdown logic hasn't been merged yet, so we'll need to take another look once it is. @jaychia

cc @Xuanwo @universalmind303 @plotor

Jay-ju avatar Jul 30 '25 13:07 Jay-ju

More Roadmap Tracking

| Status | Issue ID | Title | Technical Scope |
| --- | --- | --- | --- |
| :hammer: In Progress | #4842 | Refactor Lance Integration: Migrate from Python SDK to Rust SDK | Core Engine Optimization |
| :hammer: In Progress | #4899 | Pushdown Optimization: Implement COUNT Aggregation to Lance Data Source | Query Performance Enhancement |
| :eyes: Under Review | #4710 | Pushdown Optimization: Filter Predicate Execution at Storage Layer | Query Performance Enhancement |
| :bar_chart: Testing | #4874 | Performance Benchmark: DataFusion vs. DuckDB vs. Daft (Single-node & Distributed) | System Benchmarking |
| :calendar: Planned | #4904 | Query Optimization: Leverage Ordered Row IDs for Efficient LIMIT/OFFSET (Skip Sorting) | Query Acceleration |
| :calendar: Planned | #4905 | Multi-modal Join: Implement LookupJoin via Row ID Point Queries for Cross-Modal Data | Multi-modal Processing |
| :calendar: Planned | FEAT-07 | Custom Task Framework: MERGE COLUMN, COMPACTION, DELETE, UPDATE, INSERT Operations | Data Operations Engine |
| :calendar: Planned | FEAT-08 | Video Processing: Frame Extraction API for Multi-modal Pipelines | Multi-modal Extension |
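
To make the two pushdown items (#4899 and #4710) concrete, here is a toy pure-Python sketch of the general idea. It is not Daft or Lance code: `Fragment`, `num_rows`, and `scan` are hypothetical stand-ins for a storage layer that keeps row-count metadata and can evaluate a predicate itself. The actual PRs may work quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class Fragment:
    """Hypothetical storage fragment (not a Lance API)."""

    rows: list

    @property
    def num_rows(self) -> int:
        # Stands in for a metadata lookup that decodes no row data.
        return len(self.rows)

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        # When a predicate is given, it is applied inside the "storage
        # layer", so only matching rows are handed to the engine.
        for row in self.rows:
            if predicate is None or predicate(row):
                yield row


def count_without_pushdown(fragments: list) -> int:
    # The engine pulls every row out of storage just to count it.
    return sum(1 for frag in fragments for _ in frag.scan())


def count_with_pushdown(fragments: list) -> int:
    # COUNT pushdown: answer from per-fragment metadata only.
    return sum(frag.num_rows for frag in fragments)


def filter_without_pushdown(fragments: list, predicate: Predicate) -> list:
    # The engine reads all rows, then filters them itself.
    all_rows = [row for frag in fragments for row in frag.scan()]
    return [row for row in all_rows if predicate(row)]


def filter_with_pushdown(fragments: list, predicate: Predicate) -> list:
    # Filter pushdown: the predicate travels to the storage layer.
    return [row for frag in fragments for row in frag.scan(predicate)]
```

Both variants return the same result; the pushdown versions only move fewer rows out of the storage layer.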

Jay-ju avatar Aug 05 '25 02:08 Jay-ju