DOM-Q-NET
Graph-based Deep Q Network for Web Navigation
DOM-Q-NET: Grounded RL on Structured Language
"DOM-Q-NET: Grounded RL on Structured Language" International Conference on Learning Representations (2019). Sheng Jia, Jamie Kiros, Jimmy Ba. [arxiv] [openreview]
Demo
Trained multitask agent: https://www.youtube.com/watch?v=eGzTDIvX4IY
Facebook login: https://www.youtube.com/watch?v=IQytRUKmWhs&t=2s
Requirement
Selenium must be installed, along with a ChromeDriver binary for Selenium to drive Chrome.
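A quick way to confirm both requirements are visible before launching anything is sketched below. The helper name `check_requirements` is hypothetical and not part of this repo; it only checks that the `selenium` package is importable and that an executable named `chromedriver` is on `PATH`, which is an assumption about how the driver is installed.

```python
import importlib.util
import shutil


def check_requirements():
    """Report whether the selenium package and a chromedriver binary are visible."""
    return {
        # True if `import selenium` would succeed in this interpreter.
        "selenium": importlib.util.find_spec("selenium") is not None,
        # True if an executable named "chromedriver" is found on PATH (assumed name).
        "chromedriver": shutil.which("chromedriver") is not None,
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If ChromeDriver lives somewhere that is not on `PATH`, this check reports it as missing even though Selenium could still be pointed at it explicitly.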
Installation
- Clone this repo
- Download the MiniWoB++ environment from the original repo https://github.com/stanfordnlp/miniwob-plusplus and copy the miniwob-plusplus/html folder to miniwob/html in this repo.
- In fact, this html folder can be stored anywhere, but remember to perform one of the following actions:
  - Set the environment variable "WOB_PATH" to file://"your-path-to-miniwob-plusplus"/html/miniwob
    E.g. "your-path-to-miniwob-plusplus" is "/h/sheng/DOM-Q-NET/miniwob"
  - Directly modify the base_url on line 33 of instance.py to "your-path-to-miniwob-plusplus"/html/miniwob
    In my case, base_url='file:///h/sheng/DOM-Q-NET/miniwob/html/miniwob/'
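The two options above amount to choosing where the `file://` URL comes from. The sketch below shows one way to build that URL from a local html folder and to prefer `WOB_PATH` when it is set. The function names are hypothetical, and how instance.py actually reads `WOB_PATH` is not shown in this README, so treat the fallback logic as an assumption.

```python
import os
from pathlib import Path


def miniwob_base_url(html_dir):
    """Build the file:// URL of the miniwob folder inside a MiniWoB++ html directory."""
    # as_uri() only accepts absolute paths, so make the path absolute first.
    return (Path(html_dir).absolute() / "miniwob").as_uri() + "/"


def resolve_base_url(default_html_dir="miniwob/html"):
    """Use WOB_PATH if it is set (assumed behaviour); otherwise use the in-repo copy."""
    return os.environ.get("WOB_PATH") or miniwob_base_url(default_html_dir)


if __name__ == "__main__":
    print(resolve_base_url())
```

On a POSIX system, `miniwob_base_url("/h/sheng/DOM-Q-NET/miniwob/html")` gives the same string as the `base_url` example above.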
Run experiment
Experiment launch files are stored under runs
For example,
cd runs/hard2medium9tasks/
sh run1.sh
will launch an 11-task multitask experiment (social-media, search-engine, login-user, enter-password, click-checkboxes, click-option, enter-dynamic-text, enter-text, email-inbox-delete, click-tab-2, navigation-tree).
Multitask Assumptions
State & Action restrictions
| Item | Maximum number of items |
|---|---|
| DOM tree leaves (action space) | 160 |
| DOM tree nodes | 200 |
| Instruction tokens | 16 |
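To make the limits in the table concrete, here is a minimal sketch of how variable-length inputs could be brought to these fixed sizes. The README does not say whether the repo pads, truncates, or rejects oversized pages, so the pad-and-truncate behaviour, the helper name `pad_or_truncate`, and the `<pad>` token are all assumptions for illustration.

```python
# Limits copied from the table above.
MAX_DOM_NODES = 200          # "DOM tree" row
MAX_LEAVES = 160             # "DOM tree leaves (action space)" row
MAX_INSTRUCTION_TOKENS = 16  # "Instruction tokens" row


def pad_or_truncate(items, max_len, pad_value=None):
    """Return (fixed-length list, mask); mask is 1 for real items and 0 for padding."""
    kept = list(items)[:max_len]
    n_pad = max_len - len(kept)
    mask = [1] * len(kept) + [0] * n_pad
    return kept + [pad_value] * n_pad, mask


# A four-token instruction padded out to the 16-token limit.
tokens, mask = pad_or_truncate(
    "click the submit button".split(), MAX_INSTRUCTION_TOKENS, pad_value="<pad>"
)
```

The mask lets downstream code ignore padded positions, for example when choosing among leaf actions.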
Attribute embeddings & vocabulary
| Attribute | Max vocabulary | Embedding dimension |
|---|---|---|
| Tag | 100 | 16 |
| Text (shared with instructions) | 600 | 48 |
| Class | 100 | 16 |
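As an illustration of what a capped vocabulary means in practice, the sketch below maps tokens to ids until the limit is reached and sends everything after that to a single unknown id. The class `CappedVocab`, the reserved index 0, and the choice to count the unknown entry against the limit are assumptions, not details taken from this repo.

```python
# (max vocabulary, embedding dimension) per attribute, copied from the table above.
ATTRIBUTE_CONFIG = {
    "tag": (100, 16),
    "text": (600, 48),  # shared with instruction tokens
    "class": (100, 16),
}

UNK_ID = 0  # hypothetical reserved index for out-of-vocabulary tokens


class CappedVocab:
    """Token-to-id map that stops growing once max_size entries exist."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.token_to_id = {"<unk>": UNK_ID}

    def lookup(self, token):
        if token in self.token_to_id:
            return self.token_to_id[token]
        if len(self.token_to_id) < self.max_size:
            # Room left: assign the next free id.
            self.token_to_id[token] = len(self.token_to_id)
            return self.token_to_id[token]
        # Vocabulary is full: fall back to the unknown id.
        return UNK_ID


tag_vocab = CappedVocab(ATTRIBUTE_CONFIG["tag"][0])
```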
- Unknown (UNK) tokens
  Each is assigned a random vector, so that the cosine similarity with the text attribute can still reach 1.0 when the two align directly.
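One reading of this is that an out-of-vocabulary token gets the same random vector wherever it appears, so an instruction token and a matching DOM text token still score 1.0. The sketch below illustrates that reading only. The helper names, the Gaussian initialisation, and the per-token cache are assumptions rather than the repo's implementation; the dimension 48 is the text embedding size from the table above.

```python
import math
import random

TEXT_DIM = 48  # text embedding dimension from the table above
_rng = random.Random(0)
_unk_vectors = {}


def unk_vector(token):
    """Return the same random vector each time the same unknown token is seen."""
    if token not in _unk_vectors:
        _unk_vectors[token] = [_rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
    return _unk_vectors[token]


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# The same unknown token in the instruction and in a DOM text attribute.
match = cosine_similarity(unk_vector("zxq17"), unk_vector("zxq17"))
# Two different unknown tokens.
mismatch = cosine_similarity(unk_vector("zxq17"), unk_vector("qqv02"))
```

Here `match` equals 1.0 up to floating-point rounding, while `mismatch` is whatever two independent random vectors happen to produce.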
Acknowledgement
Credit to Dopamine for the implementation of prioritized replay used in dstructs/dopamine_segtree.py
