WIP: feat: implement Lance filter+count pushdown optimization

Open huleilei opened this issue 3 months ago • 2 comments

This commit implements a joint filter+count pushdown optimization for Lance tables. Count queries with filter conditions can now be handled natively by the Lance scan, which improves their performance.

Key changes:

  • Enhanced push_down_aggregation.rs to support filter+count joint pushdown
  • Improved lance_scan.py to handle filter+count operations natively
  • Updated _lancedb_count_result_function to process filters correctly
  • Enhanced test coverage for filter+count pushdown scenarios

The optimization maintains backward compatibility and includes graceful fallback mechanisms for unsupported filter expressions.
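
The pushdown-with-fallback behavior described above can be sketched in plain Python. This is only an illustration of the idea, not the code in this PR: `ToySource`, `Filter`, `UnsupportedFilterError`, and `filtered_count` are hypothetical names, and the set of "natively supported" operators is an assumption made for the example.

```python
import operator
from dataclasses import dataclass


class UnsupportedFilterError(Exception):
    """Raised by the hypothetical source when it cannot evaluate a filter itself."""


@dataclass(frozen=True)
class Filter:
    column: str
    op: str  # one of ">", "<", "=="
    value: int


_OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


class ToySource:
    """Stand-in for a storage layer that can count rows on its own (hypothetical)."""

    def __init__(self, rows, supported_ops=(">", "<")):
        self.rows = list(rows)
        self.supported_ops = set(supported_ops)  # assumption: "==" is not pushed down
        self.rows_scanned = 0  # rows handed back to the engine

    def count_rows(self, flt=None):
        """Pushed-down path: the source filters and counts without returning rows."""
        if flt is None:
            return len(self.rows)
        if flt.op not in self.supported_ops:
            raise UnsupportedFilterError(flt.op)
        return sum(1 for r in self.rows if _OPS[flt.op](r[flt.column], flt.value))

    def scan(self):
        """Fallback path: rows are materialized for the engine one by one."""
        for row in self.rows:
            self.rows_scanned += 1
            yield row


def filtered_count(source, flt):
    """Try filter+count pushdown first; fall back to scan, filter, count."""
    try:
        return source.count_rows(flt)
    except UnsupportedFilterError:
        return sum(1 for r in source.scan() if _OPS[flt.op](r[flt.column], flt.value))
```

With a source of ten rows `{"x": 0}` through `{"x": 9}`, `filtered_count(src, Filter("x", ">", 6))` is answered by the source without scanning any rows, while a filter using `"=="` takes the fallback path and scans all ten.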

Changes Made

Related Issues

Checklist

  • [ ] Documented in API Docs (if applicable)
  • [ ] Documented in User Guide (if applicable)
  • [ ] If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • [ ] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

huleilei avatar Sep 07 '25 13:09 huleilei

Codecov Report

❌ Patch coverage is 97.18310% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.59%. Comparing base (1245eb5) to head (0e61b08).
⚠️ Report is 359 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...an/src/optimization/rules/push_down_aggregation.rs | 97.56% | 3 Missing ⚠️ |
| daft/io/lance/lance_scan.py | 92.30% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5152      +/-   ##
==========================================
+ Coverage   73.69%   74.59%   +0.90%     
==========================================
  Files         969      969              
  Lines      126099   124370    -1729     
==========================================
- Hits        92926    92777     -149     
+ Misses      33173    31593    -1580     

| Files with missing lines | Coverage Δ |
|---|---|
| ...rc/daft-logical-plan/src/optimization/optimizer.rs | 94.00% <100.00%> (+0.82%) ⬆️ |
| src/daft-micropartition/src/python.rs | 66.96% <100.00%> (+0.06%) ⬆️ |
| daft/io/lance/lance_scan.py | 90.90% <92.30%> (+24.53%) ⬆️ |
| ...an/src/optimization/rules/push_down_aggregation.rs | 96.35% <97.56%> (+0.80%) ⬆️ |

... and 31 files with indirect coverage changes

codecov[bot] avatar Sep 07 '25 13:09 codecov[bot]

@huleilei instead of resubmitting the PR, I think if you ping @greptileai, Greptile will review your PR again

kevinzwang avatar Sep 07 '25 20:09 kevinzwang

@rchowell @kevinzwang @Jay-ju please help review this when convenient. Thanks.

huleilei avatar Sep 13 '25 07:09 huleilei

@kevinzwang Tests are passing; please help merge this when convenient. Thanks.

huleilei avatar Sep 18 '25 05:09 huleilei

Merged. Thank you!

kevinzwang avatar Sep 18 '25 18:09 kevinzwang