Multiple inference providers unconditionally adding BOS token to completion prompts
System Info
Latest llama-stack from main as of this bug report (029e4fc64d9017eed625c927a69e71fff9033727)
🐛 Describe the bug
When calling the completions API, multiple inference providers add the Llama-specific BOS token (`<|begin_of_text|>`) to the front of my prompt.
Here's a simple reproducing test, inlined for context:
```python
import pytest

from llama_stack.apis.inference import CompletionRequest
from llama_stack.providers.utils.inference.prompt_adapter import completion_request_to_prompt


@pytest.mark.asyncio
async def test_foo():
    expected_prompt = "foo"
    request = CompletionRequest(
        model = "a model",
        content = expected_prompt,
    )
    prompt = await completion_request_to_prompt(request)
    assert prompt == expected_prompt
```
Error logs
```
==================================================================== test session starts ====================================================================
platform linux -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /home/bbrownin/src/llama-stack/venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'Linux-6.13.6-100.fc40.x86_64-x86_64-with-glibc2.39', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'anyio': '4.8.0', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'html': '4.1.1', 'langsmith': '0.3.15'}}
rootdir: /home/bbrownin/src/llama-stack
configfile: pyproject.toml
plugins: anyio-4.8.0, metadata-3.1.1, asyncio-0.25.3, html-4.1.1, langsmith-0.3.15
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 1 item

tests/unit/providers/inference/test_prompt_adapter.py::test_foo FAILED

========================================================================= FAILURES ==========================================================================
_________________________________________________________________________ test_foo __________________________________________________________________________

    @pytest.mark.asyncio
    async def test_foo():
        expected_prompt = "foo"
        request = CompletionRequest(
            model = "a model",
            content = expected_prompt,
        )
        prompt = await completion_request_to_prompt(request)
>       assert prompt == expected_prompt
E       AssertionError: assert '<|begin_of_text|>foo' == 'foo'
E
E         - foo
E         + <|begin_of_text|>foo

tests/unit/providers/inference/test_prompt_adapter.py:21: AssertionError
```
Expected behavior
The helper utility `completion_request_to_prompt` should not unconditionally add the Llama BOS token to the start of completion prompts. Alternatively, if it continues to do that, each inference provider that uses it for completion prompts needs to be adjusted to only call it when the model in question is actually a Llama model.
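For illustration only, here's a minimal sketch of the second option. It assumes a hypothetical `is_llama_model()` check (not part of llama-stack) standing in for whatever model-metadata lookup a provider already has, and it assumes `request.content` is a plain string:

```python
# Illustrative sketch only -- not actual llama-stack provider code.
from llama_stack.apis.inference import CompletionRequest
from llama_stack.providers.utils.inference.prompt_adapter import completion_request_to_prompt


def is_llama_model(model_id: str) -> bool:
    # Hypothetical check; a real provider would consult its model registry
    # rather than string-matching on the identifier.
    return "llama" in model_id.lower()


async def build_completion_prompt(request: CompletionRequest) -> str:
    if is_llama_model(request.model):
        # Llama models expect the Llama prompt format, including the
        # <|begin_of_text|> BOS token that the helper currently prepends.
        return await completion_request_to_prompt(request)
    # Non-Llama models get the prompt passed through untouched
    # (assuming request.content is already a plain string here).
    return request.content
```

The first option (making BOS injection opt-in inside the helper itself) would avoid duplicating a check like this in every provider, but it would touch every current caller of the helper.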
This issue has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.
I believe this is still relevant, although I also haven't seen any other recent user reports of it. So, commenting here to keep this from being automatically closed quite yet until I dig into this more.
This issue has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant!