Add Assertion Class Action

Open · Hua-Wen opened this issue 10 months ago · 5 comments

Feature request

At present, the model only has execution-class actions; there is no action for judging the state of a page. If assertion actions were available, the model could be integrated with a test framework.

Motivation

Integrate with automated testing frameworks to enable automated testing driven by large models.

Your contribution

It seems that new actions can only be added through fine-tuning; I don't see any way to register an action in code. If such code exists, please point me to it and I can implement this part.

Hua-Wen · Jan 23 '25 07:01

Yes, this model cannot perform judgment actions; all of its outputs are execution actions. I'm not quite sure what kind of operation you mean by "page state judgment". Could you clarify?

zRzRzRzRzRzRzR · Jan 24 '25 06:01

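Do you mean asserting the state of the page? For example: which page you are currently on, whether you were redirected to the correct page, whether certain elements are present on the page, whether those elements are in the correct position, and so on...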

leeaction · Jan 24 '25 07:01

Yes. For example, my task executes action1, action2, and action3, and then the LLM outputs end(), which ends the whole process. However, that does not necessarily mean the task succeeded.

A real scenario: the task is "enter the account and password, then log in". action1: enter account xxx; action2: enter the password; action3: click Log in. The model then outputs end(), even if the login did not actually succeed.

I would like to be able to ask the model to check the page explicitly, for example that the page shows "login success" or that it has redirected to the login-success page. The model would judge the page and return a boolean result, and the code would raise an assertion-failure exception at the appropriate point.
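A minimal sketch of that flow in Python, assuming a hypothetical judge callable that takes a screenshot and a natural-language expectation and returns a bool. CogAgent does not currently expose such an action, so `Judge`, `assert_page_state`, `PageAssertionError`, and `fake_judge` are all illustrative names, not part of any real API. Raising a subclass of `AssertionError` means test runners such as unittest report a failed check as a test failure rather than an error.

```python
from typing import Callable, Iterable

# Hypothetical judge signature: (screenshot, expectation) -> bool.
# In a real setup this would wrap a vision-language model call that answers a
# yes/no question about the screenshot; CogAgent does not currently provide one.
Judge = Callable[[bytes, str], bool]


class PageAssertionError(AssertionError):
    """Subclasses AssertionError so unittest reports it as a test failure."""


def assert_page_state(judge: Judge, screenshot: bytes, expectations: Iterable[str]) -> None:
    """Check each natural-language expectation; fail on the first one the judge rejects."""
    for expectation in expectations:
        if not judge(screenshot, expectation):
            raise PageAssertionError(f"page assertion failed: {expectation}")


def fake_judge(screenshot: bytes, expectation: str) -> bool:
    # Stand-in for a model: treats the "screenshot" as page text and checks
    # whether the expectation text appears in it.
    return expectation in screenshot.decode("utf-8")


if __name__ == "__main__":
    # After action1..action3 (enter account, enter password, click Log in)
    # the model emits end(); only then do we verify the resulting page.
    page_after_login = "Welcome back! login success".encode("utf-8")
    assert_page_state(fake_judge, page_after_login, ["login success"])  # passes
    try:
        assert_page_state(fake_judge, b"Invalid password", ["login success"])
    except PageAssertionError as err:
        print(err)  # page assertion failed: login success
```

The fake judge treats the "screenshot" as page text only so the example runs without a model; in practice the check would be delegated to a vision-language model prompted with a yes/no question about the current page.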

Hua-Wen · Jan 24 '25 08:01

I see. This issue was indeed not fully resolved when we released this version of the model. As the documentation states, you can use the continuation feature, but it does not solve the problem well. We will keep improving this in future model updates, and we have marked the issue.

zRzRzRzRzRzRzR · Jan 25 '25 03:01

Thank you for the suggestion. We are currently working on a feature similar to the one you describe, which requires stronger general capabilities from CogAgent. Please stay tuned for our upcoming updates :)

jasonnoy · Jan 25 '25 03:01