llama-stack
LlamaStackDirectClient with ollama failed to run
System Info
```
❯ python -m torch.utils.collect_env
/opt/homebrew/anaconda3/envs/llamastack-ollama/lib/python3.10/runpy.py:126: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.1.1 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.4)
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.15 (main, Oct 3 2024, 02:24:49) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.1.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.5.1
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.5.1 pypi_0 pypi
```
Information
- [ ] The official example scripts
- [X] My own modified scripts
🐛 Describe the bug
The response `r` is not a dict but a `<class 'ollama._types.GenerateResponse'>`. Maybe we should remove this line (the `assert isinstance(r, dict)` shown at the bottom of the traceback below).

--- My script ---
```python
import asyncio
import os

# pip install aiosqlite ollama faiss
from llama_stack_client.lib.direct.direct import LlamaStackDirectClient
from llama_stack_client.types import SystemMessage, UserMessage


async def main():
    os.environ["INFERENCE_MODEL"] = "meta-llama/Llama-3.2-1B-Instruct"
    client = await LlamaStackDirectClient.from_template("ollama")
    await client.initialize()

    response = await client.models.list()
    print(response)
    model_name = response[0].identifier

    response = await client.inference.chat_completion(
        messages=[
            SystemMessage(content="You are a friendly assistant.", role="system"),
            UserMessage(
                content="hello world, write me a 2 sentence poem about the moon",
                role="user",
            ),
        ],
        model_id=model_name,
        stream=False,
    )
    print("\nChat completion response:")
    print(response, type(response))


asyncio.run(main())
```
Error logs
Error:
```
❯ python test.py
Using template ollama with config:
apis:
- agents
- inference
- memory
- safety
- telemetry
conda_env: ollama
datasets: []
docker_image: null
eval_tasks: []
image_name: ollama
memory_banks: []
metadata_store:
  db_path: /Users/kaiwu/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-1B-Instruct
  provider_id: ollama
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/kaiwu/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  memory:
  - config:
      kvstore:
        db_path: /Users/kaiwu/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  telemetry:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
scoring_fns: []
shields: []
version: '2'
[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]
Traceback (most recent call last):
  File "/Users/kaiwu/work/llama-stack-apps/examples/DocQA/test.py", line 32, in <module>
    asyncio.run(main())
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/kaiwu/work/llama-stack-apps/examples/DocQA/test.py", line 17, in main
    response = await client.inference.chat_completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack_client/lib/direct/direct.py", line 147, in post
    async for response in self._call_endpoint(path, "POST", body):
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack_client/lib/direct/direct.py", line 118, in _call_endpoint
    yield await func(**(body or {}))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/distribution/routers/routers.py", line 123, in chat_completion
    return await provider.chat_completion(**params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/providers/remote/inference/ollama/ollama.py", line 221, in chat_completion
    return await self._nonstream_chat_completion(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/llama_stack/providers/remote/inference/ollama/ollama.py", line 273, in _nonstream_chat_completion
    assert isinstance(r, dict)
AssertionError
```
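The assertion fails because the ollama Python client returns a pydantic-style `GenerateResponse` object rather than a plain dict. A minimal sketch of a tolerant accessor, with `GenerateResponse` mocked as a dataclass here (both the mock and the `as_dict` helper are illustrative assumptions, not part of llama-stack):

```python
from dataclasses import dataclass, asdict


@dataclass
class GenerateResponse:
    """Stand-in for ollama._types.GenerateResponse (illustrative only)."""
    response: str
    done: bool


def as_dict(r):
    """Accept either a plain dict or a response object with fields."""
    if isinstance(r, dict):
        return r
    # For a real pydantic object this would be r.model_dump();
    # our dataclass mock uses asdict() instead.
    return asdict(r)


r = GenerateResponse(response="A poem about the moon.", done=True)
d = as_dict(r)
print(d["response"])
```

A real fix could normalize the response this way, or simply drop the assertion, which is what this thread proposes.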
--- debug print ---
```
...
version: '2'
[Model(identifier='meta-llama/Llama-3.2-1B-Instruct', provider_resource_id='llama3.2:1b-instruct-fp16', provider_id='ollama', type='model', metadata={})]
r model='llama3.2:1b-instruct-fp16' created_at='2024-12-03T18:00:33.186056Z' done=True done_reason='stop' total_duration=1213954708 load_duration=29042500 prompt_eval_count=35 prompt_eval_duration=698000000 eval_count=30 eval_duration=486000000 response='Here is a short poem about the moon:\n\nThe moon glows softly in the midnight sky,\nA silver crescent shining, catching the eye.' context=None
type <class 'ollama._types.GenerateResponse'>

Chat completion response:
completion_message=CompletionMessage(role='assistant', content='Here is a short poem about the moon:\n\nThe moon glows softly in the midnight sky,\nA silver crescent shining, catching the eye.', stop_reason=<StopReason.end_of_turn: 'end_of_turn'>, tool_calls=[]) logprobs=None
```
Expected behavior
`importing_as_library` should be able to run with Ollama.
@wukaixingxp does removing that line help?
Yes, but I was wondering why that line works with normal LlamaStackClient but not LlamaStackDirectClient.
Also, in this line `model=` should be `model_id=`.
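To illustrate the keyword mismatch, here is a tiny self-contained sketch; `chat_completion` below is a stub with a keyword-only `model_id` parameter mimicking the shape of the client call, not the real llama-stack method:

```python
def chat_completion(*, messages, model_id, stream=False):
    """Stub standing in for client.inference.chat_completion (illustrative)."""
    return {"model_id": model_id, "stream": stream, "n_messages": len(messages)}


# Passing the wrong keyword raises a TypeError
try:
    chat_completion(messages=[], model="llama3.2:1b-instruct-fp16")
except TypeError as e:
    print(f"TypeError: {e}")

# The correct keyword works
ok = chat_completion(messages=[], model_id="llama3.2:1b-instruct-fp16")
print(ok["model_id"])
```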
A PR has been sent to fix this issue. @ashwinb, please take a look!
Will this fix go into the Python package anytime soon? I'm working on a llama-recipe using llama-stack, and it would be easier to follow by just using pip install. I experience the issue with both the Direct client and the regular client.
By the way, the issue also breaks the quickstart example: https://llama-stack.readthedocs.io/en/latest/getting_started/index.html
I'm guessing a new docker image must be created and pushed?
@ashwinb @wukaixingxp any plans in the near future to support ':latest' tags from Ollama? LangChain does.
Does this PR close the issue? https://github.com/meta-llama/llama-stack/pull/563
The new Python package has this integrated, thanks!