
[Bug]: The agent (Devstral) frequently forgets that it's an agent and starts acting like a typical LLM

Open · matty-fortune opened this issue 6 months ago · 7 comments

Is there an existing issue for the same bug? (If one exists, thumbs up or comment on the issue instead).

  • [x] I have checked the existing issues.

Describe the bug and reproduction steps

Here is an example of what should be a very simple prompt. With it, Devstral will usually fail to create any files, instead telling you to create the files yourself:

I created a basic repo for you to use by running the command npx create-react-app our-components --template typescript

I want to create a variety of components and I want to have the main App page be a showcase for the various components, but let's not get ahead of ourselves. I want to simply start with you making a nice button component, with all the usual attributes and features the button component would have. Use the subdirectory components and get started.

When it fails to act as an agent, even prompting it with something like "I need you to create the files yourself." is not able to put it back on track.

OpenHands Installation

Docker command in README

OpenHands Version

docker.all-hands.dev/all-hands-ai/runtime:0.41-nikolaik

Operating System

MacOS

matty-fortune avatar Jun 06 '25 17:06 matty-fortune

Hi. Are you using Devstral (specifically mistralai/devstral-small-2505) with LM Studio? Ollama and other LLM servers may not load its internal Jinja template, which is required for Devstral to work properly with OpenHands. Also, make sure the context size is set to at least the recommended 40,960 tokens.

llamantino avatar Jun 07 '25 01:06 llamantino

It seems LM Studio is not open source, so that doesn't work for me. Is there a recommended way that's fully open source?

matty-fortune avatar Jun 09 '25 00:06 matty-fortune

You can use llama.cpp with the --jinja flag, though it requires setting the correct parameters and using proprietary libraries for GPU support. Before that, could you confirm which Devstral model you're using - the one officially managed by Ollama, or a third-party GGUF?
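
For reference, a minimal llama-server invocation could look like the sketch below. This assumes a recent llama.cpp build and a local GGUF file (the path is a placeholder); check llama-server --help on your version before relying on these flags:

# --jinja makes the server apply the model's embedded chat template;
# -c sets the context window to the recommended 40,960 tokens.
llama-server \
  -m ./devstral-small-2505-Q4_K_M.gguf \
  --jinja \
  -c 40960 \
  --host 127.0.0.1 --port 8080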

llamantino avatar Jun 11 '25 07:06 llamantino

Yes, the official Ollama Devstral. It has a template in its model file, along with the Devstral system prompt, both of which can be seen using ollama show <model-name> --modelfile. If the user prompt asks for information from the system prompt, the model can access it, so the template seems to be working. I've included the template below, which I believe is in Go template format.

TEMPLATE """{{- range $index, $_ := .Messages }}
{{- if eq .Role "system" }}[SYSTEM_PROMPT]{{ .Content }}[/SYSTEM_PROMPT]
{{- else if eq .Role "user" }}
{{- if and (le (len (slice $.Messages $index)) 2) $.Tools }}[AVAILABLE_TOOLS]{{ $.Tools }}[/AVAILABLE_TOOLS]
{{- end }}[INST]{{ .Content }}[/INST]
{{- else if eq .Role "assistant" }}
{{- if .Content }}{{ .Content }}
{{- if not (eq (len (slice $.Messages $index)) 1) }}</s>
{{- end }}
{{- else if .ToolCalls }}[TOOL_CALLS][
{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{- end }}]</s>
{{- end }}
{{- else if eq .Role "tool" }}[TOOL_RESULTS]{"content": {{ .Content }}}[/TOOL_RESULTS]
{{- end }}
{{- end }}"""

I suppose this is what the Jinja template should look like? What might the Jinja template be doing that the Go template isn't doing, and if there is something, can it be done in the Go template format?

{{- bos_token }}
{%- if messages[0]['role'] == 'system' %}
    {%- if messages[0]['content'] is string %}
        {%- set system_message = messages[0]['content'] %}
    {%- else %}
        {%- set system_message = messages[0]['content'][0]['text'] %}
    {%- endif %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set system_message = "OMITTED_DUE_TO_LENGTH" %}
    {%- set loop_messages = messages %}
{%- endif %}
{{- '[SYSTEM_PROMPT]' + system_message + '[/SYSTEM_PROMPT]' }}
{%- for message in loop_messages %}
    {%- if message['role'] == 'user' %}
        {%- if message['content'] is string %}
            {{- '[INST]' + message['content'] + '[/INST]' }}
        {%- else %}
            {{- '[INST]' }}
            {%- for block in message['content'] %}
                {%- if block['type'] == 'text' %}
                    {{- block['text'] }}
                {%- elif block['type'] in ['image', 'image_url'] %}
                    {{- '[IMG]' }}
                {%- else %}
                    {{- raise_exception('Only text and image blocks are supported in message content!') }}
                {%- endif %}
            {%- endfor %}
            {{- '[/INST]' }}
        {%- endif %}
    {%- elif message['role'] == 'system' %}
        {%- if message['content'] is string %}
            {{- '[SYSTEM_PROMPT]' + message['content'] + '[/SYSTEM_PROMPT]' }}
        {%- else %}
            {{- '[SYSTEM_PROMPT]' + message['content'][0]['text'] + '[/SYSTEM_PROMPT]' }}
        {%- endif %}
    {%- elif message['role'] == 'assistant' %}
        {%- if message['content'] is string %}
            {{- message['content'] + eos_token }}
        {%- else %}
            {{- message['content'][0]['text'] + eos_token }}
        {%- endif %}
    {%- else %}
        {{- raise_exception('Only user, system and assistant roles are supported!') }}
    {%- endif %}
{%- endfor %}

matty-fortune avatar Jun 11 '25 15:06 matty-fortune

This is apparently the internal Devstral Jinja template (as shown by LM Studio):

{%- set today = strftime_now("%Y-%m-%d") %}
{%- set default_system_message = "You are Devstral, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.\nYour knowledge base was last updated on 2023-10-01. The current date is " + today + ".\n\nWhen you're not sure about some information, you say that you don't have the information and don't make up anything.\nIf the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. \"What are some good restaurants around me?\" => \"Where are you?\" or \"When is the next flight to Tokyo\" => \"Where do you travel from?\")" %}

{{- bos_token }}

{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content'] %}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set system_message = default_system_message %}
    {%- set loop_messages = messages %}
{%- endif %}
{{- '[SYSTEM_PROMPT]' + system_message + '[/SYSTEM_PROMPT]' }}

{%- for message in loop_messages %}
    {%- if message['role'] == 'user' %}
        {{- '[INST]' + message['content'] + '[/INST]' }}
    {%- elif message['role'] == 'system' %}
        {{- '[SYSTEM_PROMPT]' + message['content'] + '[/SYSTEM_PROMPT]' }}
    {%- elif message['role'] == 'assistant' %}
        {{- message['content'] + eos_token }}
    {%- else %}
        {{- raise_exception('Only user, system and assistant roles are supported!') }}
    {%- endif %}
{%- endfor %}

The Ollama template seems different, but I can't really tell if that's what's confusing the model. Maybe it's due to a different tool-calling format? The Jinja template doesn't define any. I tried the prompt you wrote in the first message a couple of times, and while Devstral did make several mistakes (especially when editing files - it's a small local model, after all), it never seemed to exit its agent role. I attached a screenshot. I tried it on OpenRouter because unfortunately I don't have a 16+ GB video card and running it locally takes forever.

[screenshot]

llamantino avatar Jun 13 '25 02:06 llamantino

@matty-fortune Sorry it took so long (my connection is a potato and Ollama loves nuking partial downloads when they fail).

We've managed to reproduce the issue and track it down: it's the context size. You didn't configure it, right? It wasn't obvious to me either that it needed configuring. Devstral in the Ollama library defaults to just 4096 tokens, which isn't even enough to fit OpenHands' initial prompts - that's likely why it kept acting weird.

Try running ollama serve with this env var set:

OLLAMA_CONTEXT_LENGTH=32768
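
That is, assuming a POSIX shell:

OLLAMA_CONTEXT_LENGTH=32768 ollama serve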

(I suggest an even higher value if you have enough VRAM; Devstral Q4 at 32k context should fit just right in 16 GB.)

Alternatively, you can create a custom model using a Modelfile with FROM Devstral and PARAMETER num_ctx 32768, then run ollama create <NewModelName> -f <Modelfile> and use the new model name in OpenHands instead of setting the env var.
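
A minimal sketch of that approach (devstral-32k is a hypothetical name; adjust the FROM line to match the tag you pulled):

# Modelfile
FROM devstral
PARAMETER num_ctx 32768

Then build it and select the new name in OpenHands:

ollama create devstral-32k -f Modelfile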

Let us know if things improve.

Note: if you're using a GGUF model like Unsloth's - which I know you aren't, but I'm mentioning this in case someone else stumbles on the same issue - make sure to follow the setup instructions in this post to work around the unsupported template.

llamantino avatar Jun 17 '25 12:06 llamantino

@llamantino thank you for continuing to look into it. I think that's the cause of the issue. I wasn't setting it, so it was using the default.

Please consider updating the docs, since Ollama is advertised as compatible by Mistral. After setting it, OpenHands with Devstral seems to be working well with Ollama now 👍

matty-fortune avatar Jun 18 '25 17:06 matty-fortune

This happened with other models previously; context length is often a problem. @llamantino, are there terminal commands that force context-length rules for LiteLLM? (Both OpenRouter and Ollama could use a bit of help.)

BradKML avatar Jun 30 '25 06:06 BradKML

@BradKML I don't think so, and even if there were, it wouldn't be ideal. The preferred behavior is for the server to return a context-exceeded error, allowing OpenHands to detect it and trigger a condensation that summarizes/reduces the context.

I saw that there is some kind of detection for OpenRouter's context-exceeded error in the OpenHands code (agent_controller.py), but I never tested it. As far as I know, Ollama doesn't support setting a stop limit. LM Studio does support setting one, but I'm not sure whether OpenHands correctly detects the resulting error. I was planning to check eventually.

If you want more details, I suggest asking the devs on the official Slack workspace (see the project page).

llamantino avatar Jul 01 '25 22:07 llamantino

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Aug 01 '25 02:08 github-actions[bot]

This issue was automatically closed due to 50 days of inactivity. We do this to help keep the issues somewhat manageable and focus on active issues.

github-actions[bot] avatar Aug 12 '25 02:08 github-actions[bot]