enhance: support manifest-based index building with Loon FFI reader

Open congqixia opened this issue 2 weeks ago • 13 comments

This PR adds support for reading data from StorageV2 using manifest files and the Loon FFI interface during index building, providing an alternative to the traditional segment insert files approach.

Key changes:

Core C++ changes:

  • Add SEGMENT_MANIFEST_KEY and LOON_FFI_PROPERTIES_KEY constants for manifest handling
  • Extend FileManagerContext to carry loon_ffi_properties for FFI operations
  • Update index_c.cpp to pass manifest and loon properties to file managers for all index types (vector, JSON key, text)
  • Implement GetFieldDatasFromManifest() in Util.cpp using Arrow C Stream interface:
    • Create Arrow schema from field metadata
    • Initialize FFI reader with manifest content and storage properties
    • Import record batches from C data interface
    • Convert to FieldData for index building
  • Update DiskFileManagerImpl and MemFileManagerImpl to support manifest-based data reading with fallback to traditional paths
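
The GetFieldDatasFromManifest() steps above end with importing record batches through the Arrow C stream interface. As a rough illustration of that consumer-side loop only, here is a minimal sketch. The struct layouts follow the published Arrow C data and C stream interface specification; DrainStream, DrainResult, and the toy producer are hypothetical names invented for this example and are not taken from the PR. The sketch counts rows where the real code would presumably convert each batch to FieldData.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Layouts as published in the Arrow C data / C stream interface specification.
struct ArrowSchema;  // opaque here: this sketch never reads the schema

struct ArrowArray {
    int64_t length;
    int64_t null_count;
    int64_t offset;
    int64_t n_buffers;
    int64_t n_children;
    const void** buffers;
    ArrowArray** children;
    ArrowArray* dictionary;
    void (*release)(ArrowArray*);
    void* private_data;
};

struct ArrowArrayStream {
    int (*get_schema)(ArrowArrayStream*, ArrowSchema* out);
    int (*get_next)(ArrowArrayStream*, ArrowArray* out);
    const char* (*get_last_error)(ArrowArrayStream*);
    void (*release)(ArrowArrayStream*);
    void* private_data;
};

// Hypothetical result type: the counts stand in for converted FieldData.
struct DrainResult {
    int64_t batches = 0;
    int64_t rows = 0;
};

// Hypothetical consumer: pulls batches until the stream signals end-of-stream
// (a batch whose release callback is null), then releases the stream.
DrainResult DrainStream(ArrowArrayStream* stream) {
    assert(stream != nullptr && stream->get_next != nullptr);
    DrainResult result;
    for (;;) {
        ArrowArray batch{};
        if (stream->get_next(stream, &batch) != 0) {
            // Copy the message first: it is only valid while the stream is alive.
            const char* err = stream->get_last_error(stream);
            std::string message = err ? err : "get_next failed";
            stream->release(stream);
            throw std::runtime_error(message);
        }
        if (batch.release == nullptr) break;  // end of stream
        ++result.batches;
        result.rows += batch.length;  // real code would build FieldData here
        batch.release(&batch);        // the consumer owns each imported batch
    }
    stream->release(stream);
    return result;
}

// Toy in-memory producer so the loop can run without a storage backend.
struct ToyState {
    std::vector<int64_t> lengths;
    std::size_t next;
};

void ToyArrayRelease(ArrowArray* array) { array->release = nullptr; }

int ToyGetNext(ArrowArrayStream* stream, ArrowArray* out) {
    auto* state = static_cast<ToyState*>(stream->private_data);
    *out = ArrowArray{};  // release stays null, which means end of stream
    if (state->next < state->lengths.size()) {
        out->length = state->lengths[state->next++];
        out->release = ToyArrayRelease;
    }
    return 0;
}

const char* ToyLastError(ArrowArrayStream*) { return nullptr; }

void ToyStreamRelease(ArrowArrayStream* stream) {
    delete static_cast<ToyState*>(stream->private_data);
    stream->private_data = nullptr;
    stream->release = nullptr;
}

ArrowArrayStream MakeToyStream(std::vector<int64_t> lengths) {
    ArrowArrayStream stream{};
    stream.get_next = ToyGetNext;
    stream.get_last_error = ToyLastError;
    stream.release = ToyStreamRelease;
    stream.private_data = new ToyState{std::move(lengths), 0};
    return stream;
}
```

Calling DrainStream on a stream built with MakeToyStream({3, 5, 2}) visits three batches. The point of the sketch is the ownership rule: the consumer releases every batch it receives and releases the stream itself once a null release callback marks the end.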

Loon FFI utilities (internal/core/src/storage/loon_ffi/):

  • Add ToCStorageConfig() to convert StorageConfig to C-compatible structure
  • Implement GetManifest() to parse manifest JSON and retrieve column groups via FFI
  • Enhance MakePropertiesFromStorageConfig() integration
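
To illustrate what converting a C++ config into a C-compatible structure typically involves, here is a minimal sketch. All type names, field names, and functions below (StorageConfigSketch, CStorageConfigSketch, ToCStorageConfigSketch, SketchRootPathLength) are hypothetical stand-ins; the actual ToCStorageConfig() signature and field set are not shown in this PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical subset of a C++ storage config; a real config has more fields.
struct StorageConfigSketch {
    std::string address;
    std::string bucket_name;
    std::string root_path;
    bool use_ssl = false;
    int64_t request_timeout_ms = 0;
};

// Hypothetical C-compatible mirror: plain pointers and scalars only,
// so it can cross an FFI boundary.
extern "C" {
struct CStorageConfigSketch {
    const char* address;
    const char* bucket_name;
    const char* root_path;
    bool use_ssl;
    int64_t request_timeout_ms;
};
}

// The returned struct borrows the strings owned by `cfg`: `cfg` must outlive
// it and must not be modified while the C struct is in use.
CStorageConfigSketch ToCStorageConfigSketch(const StorageConfigSketch& cfg) {
    CStorageConfigSketch out{};
    out.address = cfg.address.c_str();
    out.bucket_name = cfg.bucket_name.c_str();
    out.root_path = cfg.root_path.c_str();
    out.use_ssl = cfg.use_ssl;
    out.request_timeout_ms = cfg.request_timeout_ms;
    return out;
}

// Stand-in for a C callee on the other side of the FFI boundary.
extern "C" std::size_t SketchRootPathLength(const CStorageConfigSketch* cfg) {
    assert(cfg != nullptr && cfg->root_path != nullptr);
    return std::strlen(cfg->root_path);
}
```

Borrowing the string pointers avoids copies but ties the C struct's lifetime to the C++ object. A design that copies the strings would need a matching free function instead.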

Storage V2 integration:

  • Update milvus-storage dependency from 0883026 to 302143c for latest FFI support

Protobuf changes:

  • Add manifest field to BuildIndexInfo for passing manifest path to C++ layer

Configuration:

  • Add common.storageV2.useLoonFFI config option (default: false) for feature toggle

This change is part of issue #44956, which integrates the StorageV2 FFI interface as the unified storage layer. The implementation maintains backward compatibility by checking for the presence of a manifest and falling back to the existing segment insert files approach when no manifest is provided.
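
The fallback rule described above can be sketched as a small selection function. This is an illustration under stated assumptions, not the PR's code: ConfigSketch is a plain string map standing in for the real index config, the key string "segment_manifest" is a placeholder (the description names the constant SEGMENT_MANIFEST_KEY but not its value), and ChooseDataSource and DataSourceName are hypothetical helpers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Placeholder key string; the actual value of SEGMENT_MANIFEST_KEY is an assumption.
static const std::string kManifestKeySketch = "segment_manifest";

// Stand-in for the index build config passed down to the file managers.
using ConfigSketch = std::map<std::string, std::string>;

enum class DataSource { Manifest, InsertFiles };

// Prefer the manifest path when a non-empty manifest is present;
// otherwise keep the existing segment insert files path.
DataSource ChooseDataSource(const ConfigSketch& config) {
    auto it = config.find(kManifestKeySketch);
    if (it != config.end() && !it->second.empty()) {
        return DataSource::Manifest;
    }
    return DataSource::InsertFiles;
}

// Human-readable name, e.g. for logging which path was taken.
const char* DataSourceName(DataSource source) {
    switch (source) {
        case DataSource::Manifest:
            return "manifest";
        case DataSource::InsertFiles:
            return "insert_files";
    }
    assert(false && "unhandled DataSource");
    return "unknown";
}
```

Treating an empty manifest string the same as a missing key keeps callers that never set the field on the existing path.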

Related issue: #44956

congqixia avatar Nov 20 '25 07:11 congqixia

[ci-v2-notice] Notice: We are gradually rolling out the new ci-v2 system.

  • Legacy CI jobs remain unaffected, you can just ignore ci-v2 if you don't want to run it.
  • Additional "ci-v2/*" checkers will run for this PR to ensure the new ci-v2 system is working as expected.
  • For tests that exist in both v1 and v2, passing in either system is considered PASS.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-ut-integration // for ci-v2/ut-integration
  • /ci-rerun-ut-go // for ci-v2/ut-go
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm

If you have any questions or requests, please contact @zhikunyao.

sre-ci-robot avatar Nov 20 '25 07:11 sre-ci-robot

Codecov Report

Patch coverage is 11.25000% with 71 lines in your changes missing coverage. Please review.
Project coverage is 76.16%. Comparing base (03f5d7c) to head (1f342d8).
Report is 6 commits behind head on master.

Files with missing lines                            Patch %   Lines
internal/core/src/storage/Util.cpp                  0.00%     43 Missing
internal/core/src/indexbuilder/index_c.cpp          13.33%    13 Missing
internal/core/src/storage/DiskFileManagerImpl.cpp   9.09%     10 Missing
internal/core/src/storage/MemFileManagerImpl.cpp    54.54%    5 Missing
Additional details and impacted files

@@             Coverage Diff             @@
##           master   #45726       +/-   ##
===========================================
- Coverage   82.80%   76.16%    -6.65%     
===========================================
  Files         524     1881     +1357     
  Lines       81872   293902   +212030     
===========================================
+ Hits        67798   223862   +156064     
- Misses      14074    62606    +48532     
- Partials        0     7434     +7434     
Components   Coverage Δ
Client       78.17% <ø> (∅)
Core         82.74% <11.25%> (-0.07%)
Go           74.30% <ø> (∅)

Files with missing lines                            Coverage Δ
internal/core/src/storage/FileManager.h             60.00% <ø> (-1.12%)
internal/core/src/storage/Util.h                    100.00% <ø> (ø)
internal/core/src/storage/MemFileManagerImpl.cpp    46.41% <54.54%> (+0.15%)
internal/core/src/storage/DiskFileManagerImpl.cpp   61.08% <9.09%> (-0.80%)
internal/core/src/indexbuilder/index_c.cpp          54.08% <13.33%> (-1.23%)
internal/core/src/storage/Util.cpp                  79.59% <0.00%> (-4.31%)

... and 1357 files with indirect coverage changes

codecov[bot] avatar Nov 20 '25 09:11 codecov[bot]

@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 20 '25 10:11 mergify[bot]

/run-cpu-e2e

congqixia avatar Nov 21 '25 00:11 congqixia

/ci-rerun-ut-go

congqixia avatar Nov 21 '25 00:11 congqixia

@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 21 '25 03:11 mergify[bot]

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

sre-ci-robot avatar Nov 24 '25 12:11 sre-ci-robot

@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 25 '25 12:11 mergify[bot]

/run-cpu-e2e

congqixia avatar Nov 25 '25 12:11 congqixia

@congqixia cpu-e2e job failed, comment /run-cpu-e2e can trigger the job again.

mergify[bot] avatar Nov 25 '25 14:11 mergify[bot]

/run-cpu-e2e

congqixia avatar Nov 25 '25 16:11 congqixia

/ci-rerun-ut-integration

congqixia avatar Nov 25 '25 16:11 congqixia

/ci-rerun-ut-integration

congqixia avatar Nov 25 '25 23:11 congqixia

/lgtm

tedxu avatar Nov 26 '25 04:11 tedxu