
Configurations Setup for all Model Types

Open anithselva opened this issue 9 months ago • 11 comments

Hello,

Could you please share some of the configuration settings to reproduce the various model types?

I tried to reproduce the caption-augmented setup (Acc Tree + Caps), but my result was closer to the Multimodal result, which also takes the image screenshot as an input. I'm hoping to get some clarification on how to switch between the 4 modes.

Here are my configurations:

(1) Text-Only
observation_type: accessibility_tree
action_set_tag: id_accessibility_tree

(2) Caption-Augmented
observation_type: accessibility_tree_with_captioner
action_set_tag: id_accessibility_tree

(3) Multimodal
observation_type: ???
action_set_tag: id_accessibility_tree

(4) Multimodal (SoM)
observation_type: image_som
action_set_tag: som
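To make the comparison easier, here is a minimal Python sketch that collects the four settings above in one place and turns a mode into command-line style flags. The names `MODES` and `to_flags` are hypothetical helpers, not part of the repository, and the `--observation_type` / `--action_set_tag` flag spelling is an assumption based on the setting names. The Multimodal `observation_type` is left as `None` because that is the value I don't know.

```python
from typing import Dict, List, Optional

# Settings copied from the list above. None marks the value that is still unknown.
MODES: Dict[str, Dict[str, Optional[str]]] = {
    "text_only": {
        "observation_type": "accessibility_tree",
        "action_set_tag": "id_accessibility_tree",
    },
    "caption_augmented": {
        "observation_type": "accessibility_tree_with_captioner",
        "action_set_tag": "id_accessibility_tree",
    },
    "multimodal": {
        "observation_type": None,  # unknown: this is the open question
        "action_set_tag": "id_accessibility_tree",
    },
    "multimodal_som": {
        "observation_type": "image_som",
        "action_set_tag": "som",
    },
}


def to_flags(mode: str) -> List[str]:
    """Build a flag list for one mode (hypothetical helper; flag names are assumed)."""
    settings = MODES[mode]
    missing = [key for key, value in settings.items() if value is None]
    if missing:
        raise ValueError(f"mode {mode!r} has unknown settings: {missing}")
    flags: List[str] = []
    for key, value in settings.items():
        flags += [f"--{key}", str(value)]
    return flags


if __name__ == "__main__":
    for name in MODES:
        try:
            print(name, " ".join(to_flags(name)))
        except ValueError as err:
            print(name, "->", err)
```

Keeping the unknown value as `None` means the helper fails loudly for the Multimodal mode instead of silently running with a guessed setting.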

anithselva, May 02 '24 17:05