Daft
Tracking issues for Lance Integration
Is your feature request related to a problem?
Ensure that the Lance integration in Daft is polished and mature.
This issue tracks the overall progress of Lance support in Daft, so that the community can follow along and get involved more easily.
Current Status
Basic support is available on the Python side:
- `daft.read_lance()`
- `daft.write_lance()`
Ongoing Work
- #4841
- #4842
- #4710
- #4490
Current Blockers
- #4644
- #4643
Features in the Future
- Lance index support
- Vector search and full-text search for Lance
- lance-namespace integration
- Distributed `merge_columns` and `update`
Thanks for putting this together!
Thanks @Xuanwo! I believe @plotor is putting together some initial benchmarks that we can use to prioritize these features.
@Xuanwo @jaychia I'm still working on the benchmarks. I've been a bit busy recently, and I'll post them when I find some time.
Update: I've added some benchmarks in https://github.com/Eventual-Inc/Daft/pull/4874. For now they are all very simple test cases, and more will be added later. Also, the pushdown logic hasn't been merged yet, so we will need to take another look at the results once it lands. @jaychia
cc @Xuanwo @universalmind303 @plotor
More Roadmap Tracking
| Status | Issue ID | Title | Technical Scope |
|---|---|---|---|
| :hammer: In Progress | #4842 | Refactor Lance Integration: Migrate from Python SDK to Rust SDK | Core Engine Optimization |
| :hammer: In Progress | #4899 | Pushdown Optimization: Implement COUNT Aggregation to Lance Data Source | Query Performance Enhancement |
| :eyes: Under Review | #4710 | Pushdown Optimization: Filter Predicate Execution at Storage Layer | Query Performance Enhancement |
| :bar_chart: Testing | #4874 | Performance Benchmark: DataFusion vs. DuckDB vs. Daft (Single-node & Distributed) | System Benchmarking |
| :calendar: Planned | #4904 | Query Optimization: Leverage Ordered Row IDs for Efficient LIMIT/OFFSET (Skip Sorting) | Query Acceleration |
| :calendar: Planned | #4905 | Multi-modal Join: Implement LookupJoin via Row ID Point Queries for Cross-Modal Data | Multi-modal Processing |
| :calendar: Planned | FEAT-07 | Custom Task Framework: MERGE COLUMN, COMPACTION, DELETE, UPDATE, INSERT Operations | Data Operations Engine |
| :calendar: Planned | FEAT-08 | Video Processing: Frame Extraction API for Multi-modal Pipelines | Multi-modal Extension |
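Two of the planned optimizations above, COUNT pushdown (#4899) and LIMIT/OFFSET over ordered row IDs (#4904), can in principle be served from per-fragment row counts without scanning any column data. The plain-Python sketch below illustrates that idea only. The `Fragment` class, `count_rows`, and `plan_limit_offset` are hypothetical names and do not reflect the actual Daft or Lance APIs. The sketch assumes each fragment records its physical row count and number of deleted rows, and that fragments are already in row-ID order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical per-fragment metadata (not the real Lance schema)."""
    fragment_id: int
    physical_rows: int
    deleted_rows: int = 0

    @property
    def live_rows(self) -> int:
        # Rows still visible after applying this fragment's deletions.
        return self.physical_rows - self.deleted_rows


def count_rows(fragments: list[Fragment]) -> int:
    """Answer an unfiltered COUNT(*) from metadata alone, reading no data."""
    return sum(f.live_rows for f in fragments)


def plan_limit_offset(
    fragments: list[Fragment], limit: int, offset: int = 0
) -> list[tuple[int, int, int]]:
    """Map LIMIT/OFFSET onto (fragment_id, start, end) ranges.

    Assumes fragments are already in row-ID order, so no sort is needed.
    start/end are positions among the fragment's live rows (end exclusive);
    fragments entirely before the offset are skipped without being read.
    """
    if limit < 0 or offset < 0:
        raise ValueError("limit and offset must be non-negative")
    plan: list[tuple[int, int, int]] = []
    remaining = limit
    skip = offset
    for frag in fragments:
        if remaining == 0:
            break
        n = frag.live_rows
        if skip >= n:
            # The whole fragment lies before the offset.
            skip -= n
            continue
        start = skip
        end = min(n, start + remaining)
        plan.append((frag.fragment_id, start, end))
        remaining -= end - start
        skip = 0
    return plan


if __name__ == "__main__":
    frags = [Fragment(0, 100), Fragment(1, 100, deleted_rows=20), Fragment(2, 50)]
    print(count_rows(frags))  # 230
    print(plan_limit_offset(frags, limit=30, offset=90))  # [(0, 90, 100), (1, 0, 20)]
```

In this sketch both shortcuts apply only to unfiltered queries. Once a predicate is involved, the number of matching rows per fragment is no longer known from these counts alone.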