
[Question] The format of the 12K human demonstrations

Open njucckevin opened this issue 1 year ago • 1 comments

Question

Hi, I'm confused by the human demonstrations provided in https://github.com/stanfordnlp/miniwob-plusplus-demos. They seem messy: a single trajectory can contain dozens of states (e.g. 20+) holding mouse up/down and keyboard up/down events. Is there any way to get cleaned or simplified actions, e.g. {'action': 'click', 'ref': '6'}, {'action': 'type', 'ref': '10', 'typed_text': 'John'}? I want to use these 12k demonstrations for supervised fine-tuning of my own model.

Thanks a lot!

njucckevin avatar Dec 04 '23 10:12 njucckevin

The demonstrations in that repository record the raw JavaScript events. A mouse click, for example, is recorded as separate mouse down and mouse up events rather than as a single click.
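As a rough illustration of what collapsing raw events into simplified actions could look like, here is a minimal sketch. It is not the conversion code from any of the projects mentioned in this thread, and the event schema it assumes (dicts with `type`, `ref`, and `key` fields) is hypothetical; the actual demonstration files may use different field names and would need to be mapped to this shape first.

```python
from typing import Any, Dict, List

# Hypothetical event schema (an assumption, not the real demo format):
#   {"type": "mousedown" | "mouseup" | "keydown" | "keypress" | "keyup",
#    "ref": "<element id>", "key": "<single character>"}
Event = Dict[str, Any]
Action = Dict[str, Any]


def simplify_events(events: List[Event]) -> List[Action]:
    """Collapse low-level mouse/keyboard events into click/type actions."""
    actions: List[Action] = []
    pressed_ref = None  # element that received the pending mousedown

    for ev in events:
        kind = ev["type"]
        ref = ev.get("ref")

        if kind == "mousedown":
            pressed_ref = ref
        elif kind == "mouseup":
            # A mousedown followed by a mouseup on the same element is a click.
            if pressed_ref is not None and pressed_ref == ref:
                actions.append({"action": "click", "ref": ref})
            pressed_ref = None
        elif kind == "keypress":
            key = ev.get("key", "")
            if len(key) != 1:
                continue  # skip non-character keys in this sketch
            last = actions[-1] if actions else None
            if last and last["action"] == "type" and last["ref"] == ref:
                # Consecutive key presses on one element form one typed string.
                last["typed_text"] += key
            else:
                actions.append({"action": "type", "ref": ref, "typed_text": key})
        # keydown/keyup are ignored here: keypress already carries the character.

    return actions


def _key_events(ref: str, text: str) -> List[Event]:
    """Helper for the example: expand text into keydown/keypress/keyup triples."""
    return [
        {"type": t, "ref": ref, "key": ch}
        for ch in text
        for t in ("keydown", "keypress", "keyup")
    ]


example = (
    [{"type": "mousedown", "ref": "10"}, {"type": "mouseup", "ref": "10"}]
    + _key_events("10", "John")
    + [{"type": "mousedown", "ref": "6"}, {"type": "mouseup", "ref": "6"}]
)
simplified = simplify_events(example)
```

In this sketch a drag (mouse down on one element, mouse up on another) is simply dropped, and special keys such as Backspace are ignored; real demonstrations would probably need explicit handling for both.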

In the project I was involved in (Workflow-Guided Exploration), we converted the MiniWoB demonstrations into a graph structure. The method _parse_raw_demo_original is probably close to what you want (though it probably won't work out of the box; the code is pretty old).

There is also the paper Understanding HTML with Large Language Models, whose authors trained a model using the demonstrations, though I don't know where their code is.

In any case, I have created a feature request for the conversion code (#87).

ppasupat avatar Dec 05 '23 02:12 ppasupat

Closing in favor of #87

jkterry1 avatar Jul 05 '24 21:07 jkterry1