
Ollama 0.4 vision model with llama-stack: "Invalid token for decoding" error

JoseGuilherme1904 opened this issue on Nov 4, 2024 • 5 comments

🚀 The feature, motivation and pitch

Ollama's vision support (llama3.2-vision) is new: https://ollama.com/x/llama3.2-vision

My provider configuration:

```yaml
providers:
  inference:
    - provider_id: remote::ollama
      provider_type: remote::ollama
      config:
        host: 127.0.0.1
        port: 11434
```
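For reference, here is a minimal sketch (assuming the `ollama` Python client is installed and the `x/llama3.2-vision` tag has been pulled; the image path is a placeholder) that exercises the vision model through Ollama directly at the same host/port, without llama-stack in the middle:

```python
from ollama import Client

# Point the client at the same host/port the llama-stack provider uses.
client = Client(host="http://127.0.0.1:11434")

response = client.chat(
    model="x/llama3.2-vision",
    messages=[
        {
            "role": "user",
            "content": "Describe this image.",
            "images": ["example.jpg"],  # placeholder path
        }
    ],
)
print(response["message"]["content"])
```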

In `llama_stack/providers/adapters/inference/ollama/ollama.py`:

```python
OLLAMA_SUPPORTED_MODELS = {
    "Llama3.1-8B-Instruct": "x/llama:latest",
    "Llama3.1-70B-Instruct": "llama3.1:70b-instruct-fp16",
    "Llama3.2-1B-Instruct": "llama3.2:1b-instruct-fp16",
    "Llama3.2-3B-Instruct": "llama3.2:3b-instruct-fp16",
    "Llama-Guard-3-8B": "llama-guard3:8b",
    "Llama-Guard-3-1B": "llama-guard3:1b",
    "Llama3.2-11B-Vision-Instruct": "x/llama:latest",
}
```
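This mapping only translates llama-stack model identifiers into local Ollama tags. A hypothetical helper sketch (`resolve_ollama_model` is an illustrative name, not the adapter's actual function) showing how a lookup behaves:

```python
# Hypothetical helper (not the adapter's real code) showing how the mapping
# from llama-stack model identifiers to Ollama tags is consumed.
def resolve_ollama_model(model_id: str) -> str:
    try:
        return OLLAMA_SUPPORTED_MODELS[model_id]
    except KeyError:
        raise ValueError(f"{model_id} has no Ollama tag configured") from None

print(resolve_ollama_model("Llama3.2-11B-Vision-Instruct"))  # x/llama:latest
```

With that mapping in place, requesting the vision model produces the traceback below.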

```
Traceback (most recent call last):
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 206, in sse_generator
    async for item in await event_gen:
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agents.py", line 138, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 179, in create_and_execute_turn
    async for chunk in self.run(
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 252, in run
    async for res in self._run(
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/agents/agent_instance.py", line 427, in _run
    async for chunk in await self.inference_api.chat_completion(
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 101, in
    return (chunk async for chunk in await provider.chat_completion(**params))
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/adapters/inference/ollama/ollama.py", line 215, in _stream_chat_completion
    params = self._get_params(request)
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/adapters/inference/ollama/ollama.py", line 190, in _get_params
    "prompt": chat_completion_request_to_prompt(request, self.formatter),
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_stack/providers/utils/inference/prompt_adapter.py", line 46, in chat_completion_request_to_prompt
    return formatter.tokenizer.decode(model_input.tokens)
  File "/home/guilherme/.local/lib/python3.10/site-packages/llama_models/llama3/api/tokenizer.py", line 190, in decode
    return self.model.decode(cast(List[int], t))
  File "/home/guilherme/.local/lib/python3.10/site-packages/tiktoken/core.py", line 254, in decode
    return self._core_bpe.decode_bytes(tokens).decode("utf-8", errors=errors)
KeyError: 'Invalid token for decoding: 128256'
```
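For context, the base Llama 3 text tokenizer has 128,256 token IDs (0 through 128255), so ID 128256 is one past the end of its vocabulary; it appears to be the extra `<|image|>` special token that the vision chat format inserts, which the text tokenizer then cannot decode back into a prompt string. A minimal sketch of the same failure mode, using tiktoken's public `cl100k_base` encoding as a stand-in for the Llama tokenizer (recent tiktoken versions):

```python
import tiktoken

# Stand-in illustration: cl100k_base instead of the actual Llama 3 tokenizer.
# Decoding a token ID outside the encoder's vocabulary raises the same
# KeyError("Invalid token for decoding: ...") seen in the traceback above.
enc = tiktoken.get_encoding("cl100k_base")
out_of_range_id = enc.n_vocab  # one past the largest valid token ID

try:
    enc.decode([out_of_range_id])
except KeyError as exc:
    print(exc)  # 'Invalid token for decoding: <id>'
```

If that is what is happening here, the Ollama adapter's prompt path (encode the chat request to tokens, then decode them back into a string) would need a formatter/tokenizer that knows about the vision special tokens.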

Alternatives

No response

Additional context

No response

JoseGuilherme1904 • Nov 04 '24 22:11