milvus
enhance: support manifest-based index building with Loon FFI reader
This PR adds support for reading data from StorageV2 using manifest files and the Loon FFI interface during index building, providing an alternative to the traditional segment insert files approach.
Key changes:
Core C++ changes:
- Add SEGMENT_MANIFEST_KEY and LOON_FFI_PROPERTIES_KEY constants for manifest handling
- Extend FileManagerContext to carry loon_ffi_properties for FFI operations
- Update index_c.cpp to pass manifest and loon properties to file managers for all index types (vector, JSON key, text)
- Implement GetFieldDatasFromManifest() in Util.cpp using Arrow C Stream interface:
  - Create Arrow schema from field metadata
  - Initialize FFI reader with manifest content and storage properties
  - Import record batches from C data interface
  - Convert to FieldData for index building
- Update DiskFileManagerImpl and MemFileManagerImpl to support manifest-based data reading with fallback to traditional paths
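For readers unfamiliar with the Arrow C Stream interface that GetFieldDatasFromManifest() builds on, the consumer side of that protocol can be sketched as follows. This is a minimal, self-contained illustration, not the PR's code. The three structs are the ABI-stable definitions from the Apache Arrow C Data / C Stream interface specification. `DrainStream`, `ToyState`, and `MakeToyStream` are hypothetical names, and the toy producer only stands in for the Loon FFI reader. The loop shows the ownership rules that matter when importing batches: a NULL `release` on the returned array marks end of stream, and the consumer must release each batch, the schema, and finally the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// Arrow C Data / C Stream interface structs as defined by the Apache Arrow spec;
// the include guards let this coexist with Arrow's own arrow/c/abi.h.
#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
struct ArrowSchema {
  const char *format, *name, *metadata;
  int64_t flags, n_children;
  ArrowSchema** children;
  ArrowSchema* dictionary;
  void (*release)(ArrowSchema*);
  void* private_data;
};
struct ArrowArray {
  int64_t length, null_count, offset, n_buffers, n_children;
  const void** buffers;
  ArrowArray** children;
  ArrowArray* dictionary;
  void (*release)(ArrowArray*);
  void* private_data;
};
#endif
#ifndef ARROW_C_STREAM_INTERFACE
#define ARROW_C_STREAM_INTERFACE
struct ArrowArrayStream {
  int (*get_schema)(ArrowArrayStream*, ArrowSchema* out);
  int (*get_next)(ArrowArrayStream*, ArrowArray* out);
  const char* (*get_last_error)(ArrowArrayStream*);
  void (*release)(ArrowArrayStream*);
  void* private_data;
};
#endif

// Consumer loop: read the schema, then pull batches until get_next() yields one
// whose release is NULL (end of stream). Returns total rows, or -1 on error.
int64_t DrainStream(ArrowArrayStream* stream) {
  assert(stream->release != nullptr && "stream already released");
  int64_t total = 0;
  ArrowSchema schema{};
  if (stream->get_schema(stream, &schema) != 0) {
    total = -1;
  } else {
    // A real reader would check schema.format against the field metadata here.
    if (schema.release) schema.release(&schema);
    for (;;) {
      ArrowArray batch{};
      if (stream->get_next(stream, &batch) != 0) {
        const char* err = stream->get_last_error(stream);
        std::fprintf(stderr, "get_next failed: %s\n", err ? err : "unknown error");
        total = -1;
        break;
      }
      if (batch.release == nullptr) break;  // end of stream
      total += batch.length;  // a real reader would convert the batch to FieldData here
      batch.release(&batch);  // the consumer owns each batch and must release it
    }
  }
  stream->release(stream);  // the consumer also owns the stream itself
  return total;
}

// Hypothetical toy producer standing in for the Loon FFI reader: batches carry only
// a length (no buffers), enough to drive the loop but not a valid int64 column.
struct ToyState { std::vector<int64_t> lengths; std::size_t next; };

ArrowArrayStream MakeToyStream(std::vector<int64_t> lengths) {
  ArrowArrayStream s{};
  s.get_schema = [](ArrowArrayStream*, ArrowSchema* out) -> int {
    *out = ArrowSchema{};
    out->format = "l";  // Arrow format string for int64
    out->release = [](ArrowSchema* sch) { sch->release = nullptr; };
    return 0;
  };
  s.get_next = [](ArrowArrayStream* self, ArrowArray* out) -> int {
    auto* st = static_cast<ToyState*>(self->private_data);
    *out = ArrowArray{};  // leaving release NULL signals end of stream
    if (st->next < st->lengths.size()) {
      out->length = st->lengths[st->next++];
      out->release = [](ArrowArray* a) { a->release = nullptr; };
    }
    return 0;
  };
  s.get_last_error = [](ArrowArrayStream*) -> const char* { return nullptr; };
  s.release = [](ArrowArrayStream* self) {
    delete static_cast<ToyState*>(self->private_data);
    self->release = nullptr;
  };
  s.private_data = new ToyState{std::move(lengths), 0};
  return s;
}
```

In production code, Arrow C++'s `arrow::ImportRecordBatchReader()` (from `arrow/c/bridge.h`) would typically wrap such a stream instead of a hand-written loop; the manual version is shown only to make the release and end-of-stream contract explicit.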
Loon FFI utilities (internal/core/src/storage/loon_ffi/):
- Add ToCStorageConfig() to convert StorageConfig to C-compatible structure
- Implement GetManifest() to parse manifest JSON and retrieve column groups via FFI
- Enhance MakePropertiesFromStorageConfig() integration
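ToCStorageConfig() is described only as converting StorageConfig to a C-compatible structure. A common way to do that, sketched here under the assumption that the FFI side accepts a plain struct of C strings and scalars, is a borrowing conversion. All type and field names below (`StorageConfigSketch`, `CStorageConfigSketch`, `ToCStorageConfigSketch`) are hypothetical and do not reflect the actual Milvus or milvus-storage definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical C++-side config; field names are illustrative only and are not
// Milvus's actual StorageConfig definition.
struct StorageConfigSketch {
  std::string address;
  std::string bucket_name;
  std::string access_key_id;
  std::string access_key_value;
  std::string root_path;
  bool use_ssl = false;
  int64_t request_timeout_ms = 0;
};

// Hypothetical C-compatible mirror: only raw pointers and scalars, so it can be
// passed across an extern "C" / FFI boundary.
struct CStorageConfigSketch {
  const char* address;
  const char* bucket_name;
  const char* access_key_id;
  const char* access_key_value;
  const char* root_path;
  bool use_ssl;
  int64_t request_timeout_ms;
};

// Borrowing conversion: every const char* aliases a string inside `src`, so `src`
// must outlive the result and must not be modified while the FFI call uses it.
CStorageConfigSketch ToCStorageConfigSketch(const StorageConfigSketch& src) {
  CStorageConfigSketch out{};
  out.address = src.address.c_str();
  out.bucket_name = src.bucket_name.c_str();
  out.access_key_id = src.access_key_id.c_str();
  out.access_key_value = src.access_key_value.c_str();
  out.root_path = src.root_path.c_str();
  out.use_ssl = src.use_ssl;
  out.request_timeout_ms = src.request_timeout_ms;
  return out;
}
```

The borrowing design avoids copies but ties the C struct's lifetime to the source object. If the FFI side may retain the config beyond the call, a deep-copying variant with a matching free function would be the safer choice.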
Storage V2 integration:
- Update milvus-storage dependency from 0883026 to 302143c for latest FFI support
Protobuf changes:
- Add manifest field to BuildIndexInfo for passing manifest path to C++ layer
Configuration:
- Add common.storageV2.useLoonFFI config option (default: false) for feature toggle
This change is part of issue #44956 to integrate the StorageV2 FFI interface as the unified storage layer. The implementation maintains backward compatibility: it checks whether a manifest is present and falls back to the existing segment insert files approach when no manifest is provided.
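The backward-compatibility rule above (use the manifest when one is present, otherwise fall back to segment insert files) reduces to a small dispatch on the context the file managers receive. This sketch is hypothetical: `FileManagerContextSketch` only mirrors the idea of FileManagerContext carrying a manifest and loon_ffi_properties, and `ReadPath` / `ChooseReadPath` are illustrative names, not functions from the PR.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for FileManagerContext; only the fields relevant to
// choosing a read path are modeled, and the names are illustrative.
struct FileManagerContextSketch {
  std::string manifest;                                     // empty => not provided
  std::map<std::string, std::string> loon_ffi_properties;  // handed to the FFI reader
  std::vector<std::string> insert_files;                    // legacy input
};

enum class ReadPath { kManifestFFI, kInsertFiles };

// Prefer the manifest / Loon FFI path when a manifest is present; otherwise fall
// back to segment insert files, so callers that never set a manifest are unaffected.
ReadPath ChooseReadPath(const FileManagerContextSketch& ctx) {
  return ctx.manifest.empty() ? ReadPath::kInsertFiles : ReadPath::kManifestFFI;
}
```

Keying the decision on the manifest field alone keeps existing build requests, which never set it, on the old code path without any extra flag checks in the file managers.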
Related issue: #44956
[ci-v2-notice] Notice: We are gradually rolling out the new ci-v2 system.
- Legacy CI jobs remain unaffected; you can ignore ci-v2 if you don't want to run it.
- Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
- For tests that exist in both v1 and v2, passing in either system is considered PASS.
To rerun ci-v2 checks, comment with:
- /ci-rerun-code-check // for ci-v2/code-check
- /ci-rerun-build // for ci-v2/build
- /ci-rerun-ut-integration // for ci-v2/ut-integration
- /ci-rerun-ut-go // for ci-v2/ut-go
- /ci-rerun-ut-cpp // for ci-v2/ut-cpp
- /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
- /ci-rerun-e2e-arm // for ci-v2/e2e-arm
If you have any questions or requests, please contact @zhikunyao.
Codecov Report
:x: Patch coverage is 11.25000% with 71 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 76.16%. Comparing base (03f5d7c) to head (1f342d8).
:warning: Report is 6 commits behind head on master.
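The headline patch figure can be cross-checked, assuming Codecov's usual definition of coverage as hits divided by tracked lines (hits + misses + partials) and that the 71 "missing" lines are misses plus partials. With $N$ tracked lines in the patch:

$$
\frac{N - 71}{N} = 0.1125 \quad\Longrightarrow\quad N = \frac{71}{1 - 0.1125} = \frac{71}{0.8875} = 80
$$

so the patch touches about 80 coverable lines, of which 9 are covered.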

| Coverage Diff | master | #45726 | +/- |
|---|---|---|---|
| Coverage | 82.80% | 76.16% | -6.65% |
| Files | 524 | 1881 | +1357 |
| Lines | 81872 | 293902 | +212030 |
| Hits | 67798 | 223862 | +156064 |
| Misses | 14074 | 62606 | +48532 |
| Partials | 0 | 7434 | +7434 |


| Components | Coverage Δ | |
|---|---|---|
| Client | 78.17% <ø> (∅) | |
| Core | 82.74% <11.25%> (-0.07%) | :arrow_down: |
| Go | 74.30% <ø> (∅) | |


| Files with missing lines | Coverage Δ | |
|---|---|---|
| internal/core/src/storage/FileManager.h | 60.00% <ø> (-1.12%) | :arrow_down: |
| internal/core/src/storage/Util.h | 100.00% <ø> (ø) | |
| internal/core/src/storage/MemFileManagerImpl.cpp | 46.41% <54.54%> (+0.15%) | :arrow_up: |
| internal/core/src/storage/DiskFileManagerImpl.cpp | 61.08% <9.09%> (-0.80%) | :arrow_down: |
| internal/core/src/indexbuilder/index_c.cpp | 54.08% <13.33%> (-1.23%) | :arrow_down: |
| internal/core/src/storage/Util.cpp | 79.59% <0.00%> (-4.31%) | :arrow_down: |

@congqixia cpu-e2e job failed; comment /run-cpu-e2e to trigger the job again.
/run-cpu-e2e
/ci-rerun-ut-go
@congqixia cpu-e2e job failed; comment /run-cpu-e2e to trigger the job again.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: congqixia
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
- ~~internal/core/OWNERS~~ [congqixia]
- ~~pkg/proto/OWNERS~~ [congqixia]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
@congqixia cpu-e2e job failed; comment /run-cpu-e2e to trigger the job again.
/run-cpu-e2e
@congqixia cpu-e2e job failed; comment /run-cpu-e2e to trigger the job again.
/run-cpu-e2e
/ci-rerun-ut-integration
/ci-rerun-ut-integration
/lgtm