llama-stack
Outdated metadata (embedding_dimension) returned from client.models.list()
System Info
```
python -m "torch.utils.collect_env"
/Users/bmurdock/.pyenv/versions/3.10.16/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.7.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.5 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.3)
CMake version: version 3.31.5
Libc version: N/A

Python version: 3.10.16 (main, May 13 2025, 14:04:10) [Clang 17.0.0 (clang-1700.0.13.3)] (64-bit runtime)
Python platform: macOS-15.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M3 Max

Versions of relevant libraries:
[pip3] numpy==2.2.5
[pip3] onnxruntime==1.22.0
[pip3] torch==2.7.0
[pip3] torchao==0.11.0
[conda] Could not collect
```
Also, I start the server by calling:
```shell
python -m llama_stack.distribution.server.server --yaml-config /Users/bmurdock/beir/beir-venv-310/lib/python3.10/site-packages/llama_stack/templates/ollama/run.yaml --port 8321
```
Information
- [ ] The official example scripts
- [x] My own modified scripts
🐛 Describe the bug
In my run.yaml I have a listing for an embedding model (under models):
```yaml
- metadata:
    embedding_dimension: 768
  model_id: granite-embedding-125m
  provider_id: sentence-transformers
  provider_model_id: ibm-granite/granite-embedding-125m-english
  model_type: embedding
```
When I start the server, the entry on the console looks fine:
```yaml
- metadata:
    embedding_dimension: 768
  model_id: granite-embedding-125m
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - embedding
  provider_id: sentence-transformers
  provider_model_id: ibm-granite/granite-embedding-125m-english
```
But when I call client.models.list() I get the following entry in the list:
```
Model(identifier='granite-embedding-125m', metadata={'embedding_dimension': 384.0}, api_model_type='embedding', provider_id='sentence-transformers', provider_resource_id='ibm-granite/granite-embedding-125m-english', type='model', model_type='embedding'),
```
Notice that the embedding_dimension is 768 in both run.yaml and the server console but is 384.0 in the client.models.list() output.
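For reference, this is roughly how I pull the value out of the list. The client call is sketched in a comment (it assumes the llama-stack-client package and the server running on port 8321, per the command above); the helper itself is demonstrated with a stand-in object matching the `Model` repr shown:

```python
from types import SimpleNamespace

def embedding_dim(models, identifier):
    """Return metadata['embedding_dimension'] for the model with this identifier, or None."""
    for m in models:
        if m.identifier == identifier:
            return m.metadata.get("embedding_dimension")
    return None

# Against a live server this would be:
#   from llama_stack_client import LlamaStackClient
#   client = LlamaStackClient(base_url="http://localhost:8321")
#   embedding_dim(client.models.list(), "granite-embedding-125m")
# which returns the stale 384.0 even though run.yaml says 768.

# Stand-in matching the Model repr above:
stale = SimpleNamespace(identifier="granite-embedding-125m",
                        metadata={"embedding_dimension": 384.0})
print(embedding_dim([stale], "granite-embedding-125m"))  # 384.0
```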
I asked on Discord, and the consensus there was that this is probably a bug and that I should open an issue, so I am doing so. However, it was also noted that the stale value might come from my having earlier registered the model with 384 dimensions; I don't remember doing that, but it seems possible. As instructed, I ran:

```shell
sqlite3 ~/.llama/distributions/ollama/registry.db .dump | grep "granite-embedding-125m"
```

and saw 384.0 for the dimension in that output, so I probably did have this set to 384 at one time.
I was then advised to delete the distribution directory and start over. I ran:

```shell
rm -fr ~/.llama/distributions/ollama/
```

and then restarted the Llama Stack server. That worked around the issue, and client.models.list() now correctly reports:
```
Model(identifier='granite-embedding-125m', metadata={'embedding_dimension': 768.0}, api_model_type='embedding', provider_id='sentence-transformers', provider_resource_id='ibm-granite/granite-embedding-125m-english', type='model', model_type='embedding'),
```
So there is a work-around, but this still seems like a bug: the stale value persisted from a previous run of the server is used instead of the value currently in run.yaml.
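A narrower work-around I have not fully verified: re-register just this model through the client so its stored metadata is refreshed, rather than deleting the whole distribution directory. This is a sketch assuming the llama-stack-client package and that your client version exposes `models.unregister` and `models.register`:

```python
# Registration payload mirroring the run.yaml entry above; the metadata here
# is what the registry should end up holding.
payload = {
    "model_id": "granite-embedding-125m",
    "provider_id": "sentence-transformers",
    "provider_model_id": "ibm-granite/granite-embedding-125m-english",
    "model_type": "embedding",
    "metadata": {"embedding_dimension": 768},
}

# Against a live server (hedged: assumes these methods exist in your
# llama-stack-client version):
#   from llama_stack_client import LlamaStackClient
#   client = LlamaStackClient(base_url="http://localhost:8321")
#   client.models.unregister(model_id=payload["model_id"])
#   client.models.register(**payload)

print(payload["metadata"]["embedding_dimension"])  # 768
```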
Error logs
No errors are logged; the client just receives stale (outdated) values.
Expected behavior
client.models.list() should report embedding_dimension as 768, matching the value in run.yaml.