search-agents How many times is the value function evaluated in your "VisualWebArena benchmark" experiment?

Open 870572761 opened this issue 6 months ago • 1 comment

I found that when I simply run the scripts for the "VisualWebArena benchmark" experiment, the tasks end up failing many times. Did you set just one model in models? Did you have the model evaluate only once? (I think it would be better to average the model evaluations.)
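To make the averaging suggestion concrete, here is a minimal sketch of what I mean. The names `averaged_value`, `value_fn`, and `n_samples` are hypothetical and are not taken from the search-agents code; the sketch only assumes the value function is a callable that returns a score for a state and may give different scores on repeated calls.

```python
from statistics import mean
from typing import Callable


def averaged_value(
    value_fn: Callable[[str], float],  # hypothetical: scores a state, possibly stochastically
    state: str,
    n_samples: int = 5,                # hypothetical: number of evaluations to average
) -> float:
    """Call the value function n_samples times on the same state and return the mean score."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    scores = [value_fn(state) for _ in range(n_samples)]
    return mean(scores)
```

With `n_samples=1` this reduces to a single evaluation, which is the case I am asking about.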

Aug 14 '24 14:08 870572761
