chuheww comments

Results 14 comments of chuheww

OSWorld cannot parse the response

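> > When running OSWorld I hit the following bug. The same error appears whether the model is deployed with vLLM or with transformers, and the server-side log suggests that the format of the messages being passed is wrong.
> >
> > [server.log](https://github.com/user-attachments/files/19961586/server.log)
>
> The message concatenation in the evaluation code is at fault; I now have it running successfully.

A question: in OSWorld the model_type is always qwen25vl. Is the model you deployed with vLLM the 1.5 7B one? If I deploy the 2B model and the parsed click position is wrong, would that also cause `Exception in chrome/7b6c7e24-c58a-49fc-a5bb-d57b80e5b4c3: local variable 'response' referenced before assignment`? Thanks.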

OSWorld cannot parse the response

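> Two bugs have been found so far in the OSWorld implementation, and a PR fixing them has been submitted: [xlang-ai/OSWorld#194](https://github.com/xlang-ai/OSWorld/pull/194)
>
> Also, the `local variable 'response' referenced before assignment` error is an invalid-Python-code error; whether the click position is correct has to be checked by visualizing the trace.

Hello, I just pulled your run_uitars and uitars_agent scripts. Locally I deployed the 2B-SFT model with vLLM, and in run_uitars I set the observation type directly to screenshot_a11_tree. It then reported `local variable 'response' referenced before assignment`, and the run cannot continue to the subsequent tasks. Could you help me look into this? ![Image](https://github.com/user-attachments/assets/dae6cabb-5170-481c-916e-5487cc7d4ff6)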
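For readers unfamiliar with this error, the sketch below shows one generic way `local variable 'response' referenced before assignment` (an `UnboundLocalError`) can arise, and a defensive pattern that surfaces the underlying failure instead. It is not the OSWorld or UI-TARS code: `buggy_call`, `call_model`, and the `send` callable are hypothetical names standing in for whatever performs the request to the model server.

```python
def buggy_call(send):
    """Illustrates the failure mode: `response` is bound only if send() succeeds."""
    try:
        response = send()
    except Exception:
        pass  # the real error (e.g. a rejected request) is silently swallowed
    return response  # raises UnboundLocalError when send() failed


def call_model(send, max_tries=3):
    """Defensive variant: bind `response` up front and report the real cause."""
    response = None
    last_error = None
    for _ in range(max_tries):
        try:
            response = send()
            break
        except Exception as exc:
            last_error = exc  # remember why the request failed
    if response is None:
        raise RuntimeError(f"model call failed after {max_tries} tries") from last_error
    return response
```

With the second form, a malformed request shows up as the server's own error rather than as an unrelated-looking `UnboundLocalError`, which makes message-format problems easier to spot.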

OSWorld cannot parse the response

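> Two bugs have been found so far in the OSWorld implementation, and a PR fixing them has been submitted: [xlang-ai/OSWorld#194](https://github.com/xlang-ai/OSWorld/pull/194)
>
> Also, the `local variable 'response' referenced before assignment` error is an invalid-Python-code error; whether the click position is correct has to be checked by visualizing the trace.

Hello, my first step generates a result normally and the result is correct, but the second step fails to generate a response and jumps straight to the `local variable 'response' referenced before assignment` error. ![Image](https://github.com/user-attachments/assets/7da694ab-3499-4e2a-971a-eeeffa93d98f) ![Image](https://github.com/user-attachments/assets/3012f1c8-49cc-47ad-9ed9-8cd465524386) I pulled your message format, so why is there still an error? INFO: 127.0.0.1:52696 - "POST /v1/chat/completions HTTP/1.1"...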

OSWorld cannot parse the response

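> Two bugs have been found so far in the OSWorld implementation, and a PR fixing them has been submitted: [xlang-ai/OSWorld#194](https://github.com/xlang-ai/OSWorld/pull/194)
>
> Also, the `local variable 'response' referenced before assignment` error is an invalid-Python-code error; whether the click position is correct has to be checked by visualizing the trace.

Does the part that extends the history messages also need to be changed? Changing it to the following solved the problem for me: `messages.append({ "role": "assistant", "content": [ {"type": "text", "text": add_box_token(history_response)} ] })`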
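The change described in this comment can be sketched as a small helper. This is only an illustration of the message shape the commenter reports working: `append_assistant_turn` is a hypothetical name, and `add_box_token` is passed in as a parameter because its real implementation lives in the agent script and is not reproduced here.

```python
from typing import Callable, Dict, List


def append_assistant_turn(
    messages: List[Dict],
    history_response: str,
    add_box_token: Callable[[str], str],
) -> None:
    """Append a past model response as an assistant turn.

    The content is a list of typed parts rather than a bare string,
    matching the structure quoted in the comment above.
    """
    messages.append({
        "role": "assistant",
        "content": [
            {"type": "text", "text": add_box_token(history_response)},
        ],
    })
```

In use, the agent would call this once per entry in its response history while rebuilding the conversation before each request.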

I want to do adaptation fine-tuning for machines in a specific domain; how should the dataset be built?

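> As the title says. Mainly the data format, and which kinds of data are needed, for example screenshots plus operation instructions and the corresponding output action trajectories? Roughly how much data is needed? I hope an expert can answer; I would be very grateful.

Hello friend, have you finished building the dataset? Could I briefly ask you about the rough training workflow? I would like to understand the process and the problems you ran into.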

Keyboard not activated after tapping the input box (ADB keyboard), but the model used the type action

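> > My understanding is that the check here does not actually specify how to check, and you hope the model will do it spontaneously in some way, is that right? If so, the model may not be capable enough yet; alternatively, you could describe how to check more specifically in the instruction.
>
> I append a requirement to the end of user_instruction asking the model to check whether the input-method keyboard is active before typing. Concretely, with the ADB keyboard for example, that means checking whether the text "ADB Keyboard{ON}" appears at the bottom of the screen.

Then you would probably need to first detect the text and icons on the current page with something like OCR and GroundingDINO, as Alibaba's MobileAgent does. Once you have the detections, feed them all to the model and let it judge whether the current input contains the ADB Keyboard{ON} text.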
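As a rough illustration of the check discussed here, the sketch below assumes that some separate OCR step has already produced a list of `(text, y_center)` pairs in pixels; the OCR itself is not shown. The function name `keyboard_is_active` and the `bottom_fraction` parameter are assumptions made for this example, not part of any existing tool.

```python
from typing import Iterable, Tuple

# Marker text quoted in the thread for an active ADB keyboard.
ADB_MARKER = "ADB Keyboard{ON}"


def keyboard_is_active(
    ocr_items: Iterable[Tuple[str, float]],
    screen_height: float,
    bottom_fraction: float = 0.25,
    marker: str = ADB_MARKER,
) -> bool:
    """Return True if the marker text appears in the bottom part of the screen.

    ocr_items: (text, y_center) pairs from a hypothetical OCR step.
    bottom_fraction: share of the screen height treated as "the bottom".
    """
    threshold = screen_height * (1.0 - bottom_fraction)
    return any(marker in text and y >= threshold for text, y in ocr_items)
```

The boolean could either gate the type action directly or be appended to the prompt as text so that the model makes the decision, which is closer to what the comment proposes.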

chuheww

OSWorld cannot parse the response

OSWorld cannot parse the response

OSWorld cannot parse the response

OSWorld cannot parse the response

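I want to do adaptation fine-tuning for machines in a specific domain; how should the dataset be built?

Keyboard not activated after tapping the input box (ADB keyboard), but the model used the type action

How do I run inference with this model on Windows via transformers? Has anyone succeeded?

The v1.5 7B model scores far below the v1 2B model in the element_ocr scenario; is this expected?

The v1.5 7B model scores far below the v1 2B model in the element_ocr scenario; is this expected?

The v1.5 7B model scores far below the v1 2B model in the element_ocr scenario; is this expected?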