feat: Implement keyword search in milvus
What does this PR do?
This PR adds the keyword search implementation for Milvus. Along with the implementation for remote Milvus, the tests require us to start a Milvus containers locally.
In order to verify the implementation, run:
docker compose -f tests/unit/providers/vector_io/remote/docker-compose.milvus.yml up -d
Followed by:
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
On running the above, the results are:
========================================= test session starts =========================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0, asyncio-0.26.0
asyncio: mode=auto, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 4 items
tests/unit/providers/vector_io/remote/test_milvus.py::test_add_chunks PASSED
tests/unit/providers/vector_io/remote/test_milvus.py::test_query_chunks_vector PASSED
tests/unit/providers/vector_io/remote/test_milvus.py::test_query_chunks_keyword_search PASSED
tests/unit/providers/vector_io/remote/test_milvus.py::test_query_chunks_keyword_search_k_greater_than_results PASSED
========================================== 4 passed in 8.62s ==========================================
cc: @franciscojavierarceo @Bobbins228 @Bobbins228
/hold - the unit tests fail in CI because we don't have remote Milvus servers spun up. Is there any other workaround figured out for other providers?
One option is to have a script that spins these containers up using docker. Just want to make sure if we are good with it - given that'll be an additional overhead.
Generally the unit tests should not rely on bringing up external servers, and the unit tests added here are actually integration tests. See more examples of integration tests under the tests/integration directory.
Generally the unit tests should not rely on bringing up external servers, and the unit tests added here are actually integration tests.
+1
I'd suggest using mocks for the unit test rather than the container.
We can rename these integration tests and use them.
cc: @bbrowning @franciscojavierarceo @Bobbins228
EDIT: Fixed
Updated the PR to use mock tests instead of spinning up a milvus container. The tests seem to pass locally, but they fail in CI because of pymilvus. Will take a look into the CI issue further.
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================= test session starts =========================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'asyncio': '0.26.0', 'nbval': '0.11.0', 'cov': '6.1.1'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, anyio-4.8.0, asyncio-0.26.0, nbval-0.11.0, cov-6.1.1
asyncio: mode=auto, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 5 items
tests/unit/providers/vector_io/remote/test_milvus.py::test_add_chunks PASSED
tests/unit/providers/vector_io/remote/test_milvus.py::test_query_chunks_vector PASSED
tests/unit/providers/vector_io/remote/test_milvus.py::test_query_chunks_keyword_search PASSED
tests/unit/providers/vector_io/remote/test_milvus.py::test_query_chunks_keyword_search_k_greater_than_results PASSED
tests/unit/providers/vector_io/remote/test_milvus.py::test_delete_collection PASSED
========================================== 5 passed in 0.47s ==========================================
@franciscojavierarceo Done! Could you take a look at this again whenever you get a chance. Thanks!
@franciscojavierarceo I've modified the implementation to use BM25 search, and fall back to simple text match in case there is an error. The test to verify it is also added, could you take a look at this please. Thanks!
Tested this PR, and with remote and inline milvus. It does catch inline milvus with keyword search not implemented error as expected:
22:25:27.942 [END] /v1/vector-io/query [StatusCode.OK] (1.74ms)
22:25:27.939 [ERROR] Error executing endpoint route='/v1/vector-io/query' method='post': Keyword search is not supported in Milvus-Lite. Please use a remote Milvus server for keyword search functionality.
ERROR 2025-07-12 15:25:28,428 __main__:248 server: Error executing endpoint route='/v1/vector-io/query' method='post': Keyword search is not
supported in Milvus-Lite. Please use a remote Milvus server for keyword search functionality.
And provides expected results for remote milvus. Script added to the PR comment.