feat: add forward and backward strategies for fill_null
Changes Made
Add forward and backward strategies for fill_null. Kept backwards compatibility for the existing fill_null by either passing no strategy or strategy == 'value'.
Related Issues
Closes #4683
Checklist
- [x] Documented in API Docs (if applicable)
- [x] Documented in User Guide (if applicable)
- [x] If adding a new documentation page, doc is added to
docs/mkdocs.ymlnavigation - [x] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)
Two failing checks are failing on main prior to this PR.
Codecov Report
:x: Patch coverage is 78.02198% with 60 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 78.76%. Comparing base (87cab23) to head (c145ad2).
:warning: Report is 512 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #4776 +/- ##
==========================================
- Coverage 78.90% 78.76% -0.15%
==========================================
Files 886 886
Lines 123324 124296 +972
==========================================
+ Hits 97313 97901 +588
- Misses 26011 26395 +384
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/daft-core/src/lib.rs | 100.00% <100.00%> (ø) |
|
| src/daft-core/src/python/series.rs | 90.47% <100.00%> (+0.37%) |
:arrow_up: |
| src/daft-dsl/src/python.rs | 79.38% <100.00%> (+0.45%) |
:arrow_up: |
| daft/expressions/expressions.py | 96.32% <94.11%> (-0.06%) |
:arrow_down: |
| src/daft-recordbatch/src/lib.rs | 85.76% <88.88%> (-0.02%) |
:arrow_down: |
| daft/series.py | 91.86% <85.71%> (-0.26%) |
:arrow_down: |
| src/daft-dsl/src/expr/mod.rs | 79.53% <92.50%> (+0.58%) |
:arrow_up: |
| src/daft-dsl/src/visitor.rs | 56.22% <0.00%> (-0.49%) |
:arrow_down: |
| src/daft-core/src/series/ops/null.rs | 92.37% <92.52%> (-7.63%) |
:arrow_down: |
| src/daft-logical-plan/src/ops/project.rs | 58.17% <0.00%> (-0.11%) |
:arrow_down: |
| ... and 3 more |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Hi @nsalerni, thank you for contributing this! I took a quick look at this PR, the main issue is that this implementation is not fully correct, as it only fills nulls within a RecordBatch, and does not carry them over.
For example, if we set the RecordBatch sizes to 1 using
default_morsel_size, you will see this behavior:>>> import daft >>> daft.context.set_execution_config(default_morsel_size=1) DaftContext(_ctx=<builtins.PyDaftContext object at 0x102b7a5f0>) >>> df = daft.from_pydict({"a": [1, None]}) >>> df.select(df["a"].fill_null(strategy="forward")).show() ╭───────╮ │ a │ │ --- │ │ Int64 │ ╞═══════╡ │ 1 │ ├╌╌╌╌╌╌╌┤ │ None │ ╰───────╯ (Showing first 2 of 2 rows)Fixing this is not trivial and requires building up mechanisms in our physical pipelines. This is actually something we also want to build for our non-partitioned window functions as well, and would be a significant enhancement to Daft! So if you are up to it, let's talk more. I am happy to help guide you
I have some other comments too but they are minor and I will leave them for a more thorough review.
@kevinzwang happy to take a first go, let me ping you in the daft slack ! 😄
Hi @nsalerni 👋 Thanks for this contribution! Just checking in here -- how is it going addressing the review feedback?
Hi @nsalerni - Checking to see if this is something you are still able to push through.