
Deep Reinforcement Learning with a Natural Language Action Space


Metadata

  • Authors: Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng and Mari Ostendorf
  • Organization: University of Washington and Microsoft Research
  • Conference: ACL 2016
  • Paper: https://arxiv.org/pdf/1511.04636.pdf
  • Game simulator: https://github.com/jvking/text-games


Summary

This paper proposes the Deep Reinforcement Relevance Network (DRRN), a DQN-based model that learns to understand text from both the text-based state and the text-based actions (i.e., it learns the "relevance" between a state and an action) by playing a text-based game. The objective is to navigate through the sequence of texts (input states and output actions) so as to maximize the long-term reward in the game, using Q-learning.
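As a minimal sketch of the Q-learning objective in this setting: the one-step target below assumes a generic scalar Q-estimator `q_fn` (a stand-in for the DRRN) and a hypothetical discount factor; none of these names come from the paper.

```python
GAMMA = 0.99  # discount factor; hypothetical value, the paper tunes its own


def td_target(reward, next_state_bow, next_action_bows, q_fn, done):
    """One-step Q-learning target: r_t + gamma * max_i Q(s_{t+1}, a_{t+1}^i).

    q_fn(state, action) is any scalar Q-estimator (here, a stand-in for
    the DRRN); next_action_bows is the list of candidate actions the game
    offers at the next step (choice-based setting).
    """
    if done:
        return reward  # terminal state: no bootstrapping
    return reward + GAMMA * max(q_fn(next_state_bow, a) for a in next_action_bows)
```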

Text-based Games

Figure 3

  • Parser-based (less complex action language): the player types in commands such as "go east" or "get key".
  • Choice-based and hypertext-based (more complex action language): candidate actions are presented after, or embedded within, the state text.

Deep Reinforcement Relevance Network (DRRN)

Figure 1

  • The authors attribute the success of the DRRN to embedding the state and the actions in separate spaces and combining them with an interaction function (e.g., the dot product of the state and action embeddings), which yields a good estimate of the Q-function.
  • Bag-of-words (BOW) as text input feature.
  • Separate vocabularies for the state space and the action space.
  • Actions are selected via argmax_{a_t^i} Q(s_t, a_t^i), where s_t and a_t^i are BOW representations, t denotes the t-th time step, and i indexes the i-th action (choice-based game). See the sketch below.
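Here is a minimal PyTorch sketch of the DRRN forward pass and action selection described above; the layer widths, vocabulary sizes, and tanh activations are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class DRRN(nn.Module):
    """Minimal DRRN sketch: separate MLP towers embed the BOW state and
    each BOW action; Q(s, a) is the dot product of the two embeddings.
    Vocabulary sizes and hidden width are illustrative, not the paper's."""

    def __init__(self, state_vocab=1000, action_vocab=500, hidden=100):
        super().__init__()
        self.state_net = nn.Sequential(
            nn.Linear(state_vocab, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.action_net = nn.Sequential(
            nn.Linear(action_vocab, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )

    def forward(self, state_bow, action_bows):
        # state_bow: (state_vocab,); action_bows: (num_actions, action_vocab)
        s = self.state_net(state_bow)     # (hidden,) state embedding
        a = self.action_net(action_bows)  # (num_actions, hidden) action embeddings
        return a @ s                      # interaction function: Q(s_t, a_t^i) per action


# Action selection: pick the i-th candidate with the highest Q-value.
drrn = DRRN()
state = torch.rand(1000)
actions = torch.rand(3, 500)
best_action_idx = torch.argmax(drrn(state, actions)).item()
```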

Dataset Statistics

Table 1

Results

Table 4

Related Work

Narasimhan et al. use an LSTM to characterize the state space in a DQN framework for learning control policies for parser-based text games. They show that a more complex sentence representation can give further improvements. However, when this paper tried an LSTM on "Machine of Death", it did not improve performance. This may be due to the scale of the task, or the embeddings may need to be trained on a larger dataset.
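For reference, here is a hedged sketch of an LSTM state encoder in the spirit of Narasimhan et al., replacing the BOW state vector with mean-pooled LSTM outputs over word embeddings; the sizes and pooling choice are assumptions, not the exact published configuration.

```python
import torch
import torch.nn as nn


class LSTMStateEncoder(nn.Module):
    """Sketch of an LSTM state encoder: mean-pool LSTM outputs over word
    embeddings of the state text instead of using a BOW vector.
    All sizes are illustrative."""

    def __init__(self, vocab=1000, embed=64, hidden=100):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)

    def forward(self, token_ids):  # token_ids: (1, seq_len) word indices
        outputs, _ = self.lstm(self.embed(token_ids))
        return outputs.mean(dim=1).squeeze(0)  # (hidden,) state embedding


# Usage: this vector would replace the BOW state input to the Q-network.
enc = LSTMStateEncoder()
state_vec = enc(torch.randint(0, 1000, (1, 12)))
```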

Personal Thoughts

The vocabulary sizes of both the state space and the action space are relatively small compared to many NLP tasks. I wonder whether the approach can still perform well when the vocabulary is large.

References

  • Narasimhan, K., Kulkarni, T., and Barzilay, R. "Language Understanding for Text-based Games using Deep Reinforcement Learning." EMNLP 2015.
