search-agents
How many times is the value function evaluated in your "VisualWebArena benchmark" experiment?
I found that when I simply run the scripts to test the "VisualWebArena benchmark" experiment, the tasks end up failing many times. Did you set just one model in models? Did you have the model evaluate each state only once? (I think it might be better to average the model evaluations.)
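To make the averaging suggestion concrete, here is a minimal sketch of what I have in mind. The names `averaged_value`, `evaluators`, and `n_samples` are hypothetical and are not taken from the repository; each evaluator is assumed to be a callable that maps a state to a score in [0, 1].

```python
from statistics import mean
from typing import Callable, Sequence


def averaged_value(
    evaluators: Sequence[Callable[[str], float]],
    state: str,
    n_samples: int = 3,
) -> float:
    """Average the scores of several evaluators, each queried n_samples times.

    Hypothetical helper: `evaluators` stands in for whatever value-function
    models are configured, and `state` for whatever they take as input.
    """
    if not evaluators or n_samples < 1:
        raise ValueError("need at least one evaluator and one sample")
    # Collect n_samples scores from every evaluator, then take the mean.
    scores = [evaluate(state) for evaluate in evaluators for _ in range(n_samples)]
    return mean(scores)
```

With a single model and `n_samples=1` this reduces to one evaluation per state, which is the setup I am asking about; larger values would trade extra model calls for a less noisy score.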