Cannot open Edge browser using ollama/llama3?

Open kevinkwang3921 opened this issue 7 months ago • 3 comments

UFO launched successfully and the model understood the user intent, but it failed to open the Edge browser, and it was still failing after 30 steps. Despite that, the evaluation results report that the agent successfully opened the Edge browser.

Detailed Steps:

Please enter your request to be completed: open the edge
Round 1, Step 1, HostAgent: Analyzing the user intent and decomposing the request ...
Observations: I observe that Edge browser is not visible in the screenshot, nor available in the list of applications. So I need to open the Edge application directly.
Thoughts: The user request can be solely complete on the Edge browser. However, the Edge browser is not visible in the screenshot, nor available in the list of applications. I need to first open the Edge browser to open it.
Running Bash Command: start edge
Plans: (1) Open the Edge browser.
Next Selected application: None

kevinkwang3921 avatar May 23 '25 07:05 kevinkwang3921

Could you provide the response.log and request.log files from the logs/[task_name] directory in a pastebin? That would greatly help us pinpoint the issue.

nice-mee avatar Jun 12 '25 01:06 nice-mee

Updated with the log: response.log

kevinkwang3921 avatar Jun 13 '25 02:06 kevinkwang3921

In the provided log, I saw "Bash": "start edge".

The model is trying to launch Edge with start edge, but the correct command is start msedge. This is most likely due to the limited capability of the model used: I tested with gpt-4o and gpt-4.1, and both seem to work fine.

If you want to fix the issue for this model, I recommend adding a line in the host agent prompt forcing the model to use start msedge. The relevant prompt can be found in ufo/prompts/share/base/host_agent.yaml.
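As a rough illustration of the start edge vs. start msedge mismatch, here is a minimal Python sketch that rewrites a known-bad launch command before it is executed. This is not part of UFO: the function name normalize_start_command and the APP_ALIASES table are hypothetical, and the sketch only assumes that the model's command arrives as a plain string such as the "Bash" value seen in the log.

```python
import re

# Hypothetical alias table (an assumption, not part of UFO):
# application names a weaker model might emit -> the executable name
# that the Windows `start` command actually resolves.
APP_ALIASES = {
    "edge": "msedge",
}

# Matches commands of the form `start <target> [rest...]`.
_START_RE = re.compile(r"^(\s*start\s+)(\S+)(.*)$", re.IGNORECASE)


def normalize_start_command(command: str) -> str:
    """Rewrite `start <alias>` to `start <executable>`; leave anything else untouched."""
    match = _START_RE.match(command)
    if match is None:
        return command
    prefix, target, rest = match.groups()
    fixed = APP_ALIASES.get(target.lower(), target)
    return f"{prefix}{fixed}{rest}"


if __name__ == "__main__":
    print(normalize_start_command("start edge"))
```

With this sketch, the command from the log is rewritten to start msedge, while commands that already use msedge, or that are not start commands at all, pass through unchanged. Editing the prompt remains the simpler fix; a rewrite step like this only makes sense if the model keeps ignoring the prompt instruction.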

nice-mee avatar Jun 13 '25 02:06 nice-mee