
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).

Results: 10 SeeAct issues, sorted by recently updated

Could you please provide the complete offline evaluation code for mm-mind2web? Currently, only the prediction demo code is available, lacking the full dataset loop and evaluation metric to reproduce the...
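
For context, a minimal sketch of what such an offline evaluation loop could look like, assuming Mind2Web-style records where `pos_candidates` lists gold elements keyed by `backend_node_id` and `operation` is the gold action string; `load_samples` and `predict_action` are hypothetical placeholders, not functions from this repo:

```python
import json

def load_samples(path):
    """Load a list of Mind2Web-style step records from a JSON file (assumed schema)."""
    with open(path) as f:
        return json.load(f)

def predict_action(sample):
    """Placeholder for the model call; should return (predicted_element_id, predicted_operation)."""
    raise NotImplementedError

def evaluate(samples):
    """Compute element accuracy and step success rate over all steps."""
    element_hits, step_hits = 0, 0
    for sample in samples:
        pred_element, pred_op = predict_action(sample)
        gold_elements = {c["backend_node_id"] for c in sample["pos_candidates"]}
        if pred_element in gold_elements:
            element_hits += 1
            # A step counts as successful only if both element and operation match.
            if pred_op == sample["operation"]:
                step_hits += 1
    n = len(samples)
    return {
        "element_accuracy": element_hits / n,
        "step_success_rate": step_hits / n,
    }
```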

Thank you very much for your work. I have found a potential bug in your MM-Mind2web model. It seems that each data point only contains a list of selectable actions...

https://github.com/OSU-NLP-Group/SeeAct/blob/8af310159af97b123ff07abb925c497bb1ca2478/src/data_utils/format_prompt_utils.py#L204 Sorry, it's hard for me to do a PR; I'm working on another codebase.

![1712585243173](https://github.com/OSU-NLP-Group/SeeAct/assets/28804414/9689c185-4160-4296-87b6-ce8baa2e4e37) I wanted to visualize the model's actions on the Mind2Web dataset, but SeeAct didn't seem to do that. When computing online, the output "success_or_not" is always empty, which...

![image](https://github.com/OSU-NLP-Group/SeeAct/assets/63557613/6c4d8c89-1199-4cf3-a6c0-cab75c0ca48d) While all of this information is nice, is there a comparison to experiments without grounding? It can be argued that grounding may hurt performance without knowing what...

hey -- first off, really cool project! Are all actions keyed off a specific element in the list, or is there some way to conduct certain actions without...

Dear Authors, Thank you for this brilliant work. I want to do some analysis on the trajectories of different methods in your paper (e.g. FLAN-T5, GPT-4, SeeAct with different grounding...

Thank you for this inspiring work and for releasing the code! Are you also planning to release the model predictions? More specifically, are you planning to release oracle action grounding...