
How many times is the value function evaluated in your "VisualWebArena benchmark" experiment?

Open 870572761 opened this issue 6 months ago • 1 comments

[image]

I found that if I just run the scripts to test the "VisualWebArena benchmark" experiment, the task ends up failing many times. Did you set just one model in `models`? Did you have the model evaluate only once? (I think it would be better to average the model evaluations.)
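To illustrate what averaging the evaluations could look like, here is a minimal sketch. It does not reflect the repository's actual code: `averaged_value`, `evaluate`, and `n_samples` are hypothetical names, and `evaluate` stands in for whatever call scores a state with the value function.

```python
from statistics import mean
from typing import Callable


def averaged_value(
    evaluate: Callable[[str], float],  # hypothetical: one value-function call
    state: str,
    n_samples: int = 5,                # hypothetical: number of repeated calls
) -> float:
    """Score `state` n_samples times and return the mean of the scores."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Each call may return a different score if the underlying model samples
    # with non-zero temperature; the mean smooths out that variance.
    return mean(evaluate(state) for _ in range(n_samples))
```

With `n_samples=1` this reduces to a single evaluation, so it would also make it easy to compare one evaluation against an averaged one.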

870572761 · Aug 14 '24 14:08