fix: Account for unschedulable udf actors
Changes Made
Raises error in situations where actors for actor UDFs (i.e. concurrency udfs) cannot be scheduled due to resource constraints.
This is implemented via timeout (default 60s) on the readiness check for the actors. Additionally, each swordfish task that needs to run actor UDFs will receive the full list of actor handles, and will check for which actors are ready + local. This is to account for the fact that the UDF actors can be spawned incrementally, and when killed, can spawn on different nodes.
Related Issues
Closes https://github.com/Eventual-Inc/Daft/issues/3934
Checklist
- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
docs/mkdocs.ymlnavigation - [ ] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)
Codecov Report
:x: Patch coverage is 12.03704% with 95 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 73.77%. Comparing base (33884be) to head (ec0412d).
:warning: Report is 4 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #4987 +/- ##
==========================================
- Coverage 73.81% 73.77% -0.05%
==========================================
Files 957 957
Lines 124278 123974 -304
==========================================
- Hits 91740 91465 -275
+ Misses 32538 32509 -29
| Files with missing lines | Coverage Δ | |
|---|---|---|
| daft/context.py | 81.57% <ø> (ø) |
|
| src/common/daft-config/src/lib.rs | 72.89% <100.00%> (+0.25%) |
:arrow_up: |
| ...ft-local-execution/src/intermediate_ops/explode.rs | 87.50% <100.00%> (ø) |
|
| ...aft-local-execution/src/intermediate_ops/filter.rs | 95.16% <100.00%> (ø) |
|
| ...tion/src/intermediate_ops/inner_hash_join_probe.rs | 98.65% <100.00%> (ø) |
|
| ...-execution/src/intermediate_ops/intermediate_op.rs | 91.19% <ø> (ø) |
|
| ...cal-execution/src/intermediate_ops/into_batches.rs | 95.55% <100.00%> (ø) |
|
| ...ft-local-execution/src/intermediate_ops/project.rs | 100.00% <100.00%> (ø) |
|
| ...aft-local-execution/src/intermediate_ops/sample.rs | 6.35% <100.00%> (-59.28%) |
:arrow_down: |
| ...c/daft-local-execution/src/intermediate_ops/udf.rs | 93.72% <100.00%> (+1.16%) |
:arrow_up: |
| ... and 6 more |
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
- :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
Overall this makes sense, but I'm not sure the tests indicate the behavior we want.
What i was going for was:
- if some of the udf actors are schedulable (within a timeout), make progress
- if none of the udf actors are schedulable (within a timeout), error
lemme refactor a tests a bit to reflect this better