Daft icon indicating copy to clipboard operation
Daft copied to clipboard

feat: Implement KVConfig multi-backend architecture

Open Jay-ju opened this issue 3 months ago • 3 comments

  • Add LMDBConfig class with comprehensive configuration options
  • Update kv_get, kv_batch_get, kv_exists functions to accept KVConfig parameter
  • Add backend validation and error handling for multiple backends
  • Update test cases to use new KVConfig architecture
  • Maintain backward compatibility with legacy URI parameters
  • Provide factory methods for easy configuration creation

This implements the multi-KV backend architecture discussed in PR feedback, enabling support for different storage backends (Lance for AI/ML workloads, LMDB for high-performance caching) through a unified configuration interface.

Changes Made

Related Issues

Checklist

  • [ ] Documented in API Docs (if applicable)
  • [ ] Documented in User Guide (if applicable)
  • [ ] If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • [ ] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

Jay-ju avatar Sep 09 '25 13:09 Jay-ju

Codecov Report

:x: Patch coverage is 34.90832% with 923 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 71.80%. Comparing base (b9ab62b) to head (efedba4). :warning: Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-dsl/src/functions/kv/memory.rs 0.00% 223 Missing :warning:
daft/functions/kv.py 15.67% 199 Missing :warning:
src/daft-session/src/kv.rs 31.88% 94 Missing :warning:
src/daft-dsl/src/functions/kv/kv_functions.rs 58.68% 88 Missing :warning:
daft/kv/lmdb.py 0.00% 61 Missing :warning:
daft/kv/lance.py 0.00% 54 Missing :warning:
src/daft-dsl/src/functions/kv/mod.rs 0.00% 53 Missing :warning:
src/daft-session/src/python.rs 65.94% 47 Missing :warning:
daft/kv/__init__.py 62.06% 33 Missing :warning:
src/daft-session/src/session.rs 57.14% 27 Missing :warning:
... and 4 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5170      +/-   ##
==========================================
+ Coverage   71.57%   71.80%   +0.22%     
==========================================
  Files         960      971      +11     
  Lines      124203   125179     +976     
==========================================
+ Hits        88897    89881     +984     
+ Misses      35306    35298       -8     
Files with missing lines Coverage Δ
daft/__init__.py 86.95% <100.00%> (+0.28%) :arrow_up:
daft/functions/__init__.py 100.00% <100.00%> (ø)
src/daft-dsl/src/functions/kv/registry.rs 100.00% <100.00%> (ø)
src/daft-dsl/src/functions/mod.rs 89.01% <ø> (ø)
src/daft-dsl/src/lib.rs 100.00% <100.00%> (ø)
...ft-local-execution/src/intermediate_ops/project.rs 97.36% <ø> (+58.25%) :arrow_up:
src/daft-parquet/src/stream_reader.rs 88.99% <100.00%> (+0.12%) :arrow_up:
src/daft-session/src/options.rs 66.66% <ø> (ø)
src/lib.rs 97.34% <100.00%> (+0.02%) :arrow_up:
src/daft-writers/src/parquet_writer.rs 91.50% <90.00%> (-0.16%) :arrow_down:
... and 13 more

... and 72 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Oct 14 '25 14:10 codecov[bot]

@rchowell I fixed your comment. Please take a look when you have time.

Jay-ju avatar Oct 17 '25 11:10 Jay-ju

@Jay-ju, It looks like @rchowell approved 🎉. Unfortunately it appears there are now some merge conflicts. If you could get those resolved, we could get this merged and included into the next release.

universalmind303 avatar Oct 29 '25 21:10 universalmind303

@Jay-ju - Thanks for the work on this. Checking to see if you are able to push this through.

madvart avatar Nov 25 '25 21:11 madvart

@universalmind303 @madvart

  • Thank you very much for your patience. I have fixed the conflicts.
  • However, some CI processes are currently failing. After checking, I think it's not an issue with my PR. I see that all of them are reporting errors like this, and I feel that it should be a PR that affects the stability of the ci which has been merged back into the main branch.
  • Please review the content of the PR again.

Jay-ju avatar Dec 06 '25 06:12 Jay-ju

@Jay-ju I'm seeing quite a few errors in the unit tests that seem to be from some unwrapping of pyresult's

 E           daft.exceptions.DaftCoreException: DaftError::External task 690926 panicked with message "called `Result::unwrap()` on an `Err` value: PyErr { type: <class 'daft.exceptions.DaftCoreException'>, value: DaftCoreException('DaftError::CatalogError KV Store with name embeddings_1765267090922382 not found!'), traceback: None }"

universalmind303 avatar Dec 10 '25 00:12 universalmind303

@universalmind303 Please help review the changes here when you have time?

Jay-ju avatar Dec 16 '25 09:12 Jay-ju