llama-stack
LlamaStackDirectClient with ollama failed to run
System Info
```
❯ python -m torch.utils.collect_env
/opt/homebrew/anaconda3/envs/llamastack-ollama/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.1.1 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.4)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.15 (main, Oct 3 2024, 02:24:49) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.1.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.5.1
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.5.1 pypi_0 pypi
```
Information
- [ ] The official example scripts
- [X] My own modified scripts
🐛 Describe the bug
The response `r` is not a dict but a `<class 'ollama._types.GenerateResponse'>`. Maybe we should remove this line (the `assert isinstance(r, dict)` shown at the bottom of the traceback below).

--- My script ---
```python
import asyncio
import os

# pip install aiosqlite ollama faiss
from llama_stack_client.lib.direct.direct import LlamaStackDirectClient
from llama_stack_client.types import SystemMessage, UserMessage


async def main():
    os.environ["INFERENCE_MODEL"] = "meta-llama/Llama-3.2-1B-Instruct"
    client = await LlamaStackDirectClient.from_template("ollama")
    await client.initialize()

    response = await client.models.list()
    print(response)
    model_name = response[0].identifier

    response = await client.inference.chat_completion(
        messages=[
            SystemMessage(content="You are a friendly assistant.", role="system"),
            UserMessage(
                content="hello world, write me a 2 sentence poem about the moon",
                role="user",
            ),
        ],
        model_id=model_name,
        stream=False,
    )
    print("\nChat completion response:")
    print(response, type(response))


asyncio.run(main())
```
Error logs
Error:
```
❯ python test.py
Using template ollama with config:
apis:
- agents
- inference
- memory
- safety
- telemetry
conda_env: ollama
datasets: []
docker_image: null
eval_tasks: []
image_name: ollama
memory_banks: []
metadata_store:
  db_path: /Users/kaiwu/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-1B-Instruct
  provider_id: ollama
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/kaiwu/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  memory:
  - config:
      kvstore:
        db_path: /Users/kaiwu/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  telemetry:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
scoring_fns: []
shields: []
version: '2'
[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]
Traceback (most recent call last):
  File "/Users/kaiwu/work/llama-stack-apps/examples/DocQA/test.py", line 32, in <module>
    asyncio.run(main())
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/kaiwu/work/llama-stack-apps/examples/DocQA/test.py", line 17, in main
    response = await client.inference.chat_completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack_client/lib/direct/direct.py", line 147, in post
    async for response in self._call_endpoint(path, "POST", body):
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack_client/lib/direct/direct.py", line 118, in _call_endpoint
    yield await func(**(body or {}))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/distribution/routers/routers.py", line 123, in chat_completion
    return await provider.chat_completion(**params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/providers/remote/inference/ollama/ollama.py", line 221, in chat_completion
    return await self._nonstream_chat_completion(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/providers/remote/inference/ollama/ollama.py", line 273, in _nonstream_chat_completion
    assert isinstance(r, dict)
AssertionError
```
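The assertion fails because the ollama Python client returns a pydantic-style `GenerateResponse` object rather than a plain dict. A minimal sketch of a tolerant accessor, with `GenerateResponse` mocked as a dataclass here (both the mock and the `as_dict` helper are illustrative assumptions, not part of llama-stack):

```python
from dataclasses import dataclass, asdict


@dataclass
class GenerateResponse:
    """Stand-in for ollama._types.GenerateResponse (illustrative only)."""
    response: str
    done: bool


def as_dict(r):
    """Accept either a plain dict or a response object with fields."""
    if isinstance(r, dict):
        return r
    # For a real pydantic object this would be r.model_dump();
    # our dataclass mock uses asdict() instead.
    return asdict(r)


r = GenerateResponse(response="A poem about the moon.", done=True)
d = as_dict(r)
print(d["response"])
```

A real fix could normalize the response this way, or simply drop the assertion, which is what this thread proposes.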
--- debug print ---
```
...
version: '2'
[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]
r model='llama3.2:1b-instruct-fp16' created_at='2024-12-03T18:00:33.186056Z' done=True done_reason='stop' total_duration=1213954708 load_duration=29042500 prompt_eval_count=35 prompt_eval_duration=698000000 eval_count=30 eval_duration=486000000 response='Here is a short poem about the moon:\n\nThe moon glows softly in the midnight sky,\nA silver crescent shining, catching the eye.' context=None
type <class 'ollama._types.GenerateResponse'>

Chat completion response:
completion_message=CompletionMessage(role='assistant', content='Here is a short poem about the moon:\n\nThe moon glows softly in the midnight sky,\nA silver crescent shining, catching the eye.', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]) logprobs=None
```
Expected behavior
`importing_as_library` should be able to run with Ollama.
@wukaixingxp does removing that line help?
Yes, but I was wondering why that line works with normal LlamaStackClient but not LlamaStackDirectClient.
Also, in this line `model=` should be `model_id=`.
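To illustrate the keyword mismatch, here is a tiny self-contained sketch; `chat_completion` below is a stub with a keyword-only `model_id` parameter mimicking the shape of the client call, not the real llama-stack method:

```python
def chat_completion(*, messages, model_id, stream=False):
    """Stub standing in for client.inference.chat_completion (illustrative)."""
    return {"model_id": model_id, "stream": stream, "n_messages": len(messages)}


# Passing the wrong keyword raises a TypeError
try:
    chat_completion(messages=[], model="llama3.2:1b-instruct-fp16")
except TypeError as e:
    print(f"TypeError: {e}")

# The correct keyword works
ok = chat_completion(messages=[], model_id="llama3.2:1b-instruct-fp16")
print(ok["model_id"])
```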
A PR has been sent to fix this issue. @ashwinb, please take a look!
Will this fix go into the Python package anytime soon? I'm working on a llama-recipe using llama-stack, and it would be easier to follow by just using pip install. I experience the issue with both the Direct client and the regular client.
By the way, the issue also breaks the quickstart example: https://llama-stack.readthedocs.io/en/latest/getting_started/index.html
I'm guessing a new docker image must be created and pushed?
@ashwinb @wukaixingxp any plans in the near future to support ':latest' tags from Ollama? LangChain does.
Does this PR close the issue? https://github.com/meta-llama/llama-stack/pull/563
The new Python package has this integrated, thanks!