
Outdated metadata (embedding_dimension) returned from client.models.list()

Open · jwm4 opened this issue 5 months ago · 0 comments

System Info

$ python -m "torch.utils.collect_env"
/Users/bmurdock/.pyenv/versions/3.10.16/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.7.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.5 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.3)
CMake version: version 3.31.5
Libc version: N/A

Python version: 3.10.16 (main, May 13 2025, 14:04:10) [Clang 17.0.0 (clang-1700.0.13.3)] (64-bit runtime)
Python platform: macOS-15.5-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M3 Max

Versions of relevant libraries:
[pip3] numpy==2.2.5
[pip3] onnxruntime==1.22.0
[pip3] torch==2.7.0
[pip3] torchao==0.11.0
[conda] Could not collect

Also, I start the server by calling:

python -m llama_stack.distribution.server.server --yaml-config /Users/bmurdock/beir/beir-venv-310/lib/python3.10/site-packages/llama_stack/templates/ollama/run.yaml --port 8321

Information

  • [ ] The official example scripts
  • [x] My own modified scripts

🐛 Describe the bug

In my run.yaml I have a listing for an embedding model (under models):

- metadata:
    embedding_dimension: 768
  model_id: granite-embedding-125m
  provider_id: sentence-transformers
  provider_model_id: ibm-granite/granite-embedding-125m-english
  model_type: embedding

When I start the server, the entry on the console looks fine:

         - metadata:
             embedding_dimension: 768
           model_id: granite-embedding-125m
           model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
           - embedding
           provider_id: sentence-transformers
           provider_model_id: ibm-granite/granite-embedding-125m-english

But when I call client.models.list() I get the following entry in the list:

Model(identifier='granite-embedding-125m', metadata={'embedding_dimension': 384.0}, api_model_type='embedding', provider_id='sentence-transformers', provider_resource_id='ibm-granite/granite-embedding-125m-english', type='model', model_type='embedding'),

Notice that the embedding_dimension is 768 in both run.yaml and the server console but is 384.0 in the client.models.list() output.
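For anyone hitting this, a quick way to spot such mismatches is to diff the run.yaml model entries against what the server reports. Below is a sketch with a hypothetical helper (find_dimension_mismatches is not part of llama-stack); it takes the run.yaml models list and the client.models.list() results as plain dicts, so the comparison logic itself needs no running server:

```python
# Sketch: compare embedding_dimension values from run.yaml against the
# models returned by the server. The helper is pure, so it can be run
# against plain dicts without a live Llama Stack instance.

def find_dimension_mismatches(yaml_models, listed_models):
    """Return (model_id, yaml_dim, listed_dim) for every model whose
    embedding_dimension in run.yaml disagrees with the server's answer."""
    listed = {m["identifier"]: m.get("metadata", {}) for m in listed_models}
    mismatches = []
    for m in yaml_models:
        model_id = m["model_id"]
        yaml_dim = m.get("metadata", {}).get("embedding_dimension")
        served_dim = listed.get(model_id, {}).get("embedding_dimension")
        # The server returns floats (e.g. 384.0), so compare numerically.
        if yaml_dim is not None and served_dim is not None \
                and float(yaml_dim) != float(served_dim):
            mismatches.append((model_id, yaml_dim, served_dim))
    return mismatches
```

With the values from this report, the helper would flag ('granite-embedding-125m', 768, 384.0).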

I asked on Discord, and the consensus there was that this is probably a bug and that I should open an issue, so I am doing so. However, it was also noted that the problem might be triggered by my having earlier registered the model with 384 dimensions; I don't remember doing that, but it seems possible. As instructed, I ran sqlite3 ~/.llama/distributions/ollama/registry.db .dump | grep "granite-embedding-125m" and saw 384.0 for the dimension in that output, so it seems likely that I did have this set to 384 at one time.
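That registry check can also be scripted. The sketch below is a rough stand-in for the sqlite3 ... .dump | grep pipeline above, assuming only that the registry is an ordinary SQLite file (no assumptions about its table layout):

```python
import sqlite3

def grep_registry_dump(db_path, needle):
    """Dump every statement of a SQLite database as SQL text and return
    the lines containing `needle` -- a scripted equivalent of
    `sqlite3 <db> .dump | grep <needle>`."""
    conn = sqlite3.connect(db_path)
    try:
        return [line for line in conn.iterdump() if needle in line]
    finally:
        conn.close()
```

Running it with db_path pointed at ~/.llama/distributions/ollama/registry.db and needle "granite-embedding-125m" should surface the same stale 384.0 entry.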

I was then advised to delete the distribution directory and start over. I ran rm -fr ~/.llama/distributions/ollama/ and then restarted the Llama Stack server. That did work around the issue and client.models.list() now correctly reports:

 Model(identifier='granite-embedding-125m', metadata={'embedding_dimension': 768.0}, api_model_type='embedding', provider_id='sentence-transformers', provider_resource_id='ibm-granite/granite-embedding-125m-english', type='model', model_type='embedding'),

So there is a workaround, but this still seems like a bug: the old value from a previous run of the server is used instead of the value that is currently in run.yaml.
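The behavior above is consistent with a registry-wins merge at server startup: if a model is already present in the persisted registry, its stored metadata is reused rather than refreshed from run.yaml. This is a minimal, hypothetical illustration of that kind of stale-cache logic (not the actual llama-stack code), just to make the reported symptom concrete:

```python
def resolve_metadata(registry, yaml_entry):
    """Illustrative only: if the model already exists in the persisted
    registry, its old metadata wins; run.yaml is consulted only for
    models being registered for the first time."""
    model_id = yaml_entry["model_id"]
    if model_id in registry:
        # Stale entry from an earlier run shadows the current run.yaml.
        return registry[model_id]
    # First registration: take the metadata from run.yaml.
    registry[model_id] = dict(yaml_entry["metadata"])
    return registry[model_id]
```

Deleting ~/.llama/distributions/ollama/ empties the persisted registry, which is why the workaround restores the run.yaml value.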

Error logs

No errors in the logs; just invalid (outdated) values.

Expected behavior

The embedding_dimension should be 768 in the client.models.list() output, matching the value in run.yaml.

jwm4 · May 29 '25 22:05