
Diversity Is All You Need: Learning Skills without a Reward Function

flrngel opened this issue 6 years ago

https://arxiv.org/abs/1802.06070

Abstract

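  • Learn skills without any reward function by maximizing an information-theoretic objective with a maximum entropy policy
  • After this unsupervised stage, pick the skill that scores best on the actual task and fine-tune it with ordinary reward-driven reinforcement learning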
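A minimal sketch of the "pick the best skill" step, assuming a hypothetical `rollout_return(z)` callable that runs skill z on the downstream task and returns its episode return. Averaging over a few episodes is an assumption of this sketch, not a detail taken from the notes.

```python
from typing import Callable

def select_best_skill(num_skills: int,
                      rollout_return: Callable[[int], float],
                      episodes: int = 5) -> int:
    """Return the index of the skill with the highest average task return."""
    def avg_return(z: int) -> float:
        # Average several rollouts so one lucky episode does not decide the choice
        return sum(rollout_return(z) for _ in range(episodes)) / episodes

    return max(range(num_skills), key=avg_return)

# Toy stand-in for "run skill z on the downstream task" (hypothetical numbers)
toy_returns = [0.1, 0.5, -0.2, 0.9, 0.4]
best = select_best_skill(len(toy_returns), lambda z: toy_returns[z])
print(best)  # 3
```

The chosen skill's policy would then be the starting point for fine-tuning on the task reward.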

1. Introduction

  • A skill is just a policy, conditioned on a latent variable z
  • The key idea is discriminability of skills
    • Skills have to be distinguishable from one another
    • Skills have to be as diverse as possible
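In the paper a skill is a policy pi(a|s,z) conditioned on a latent z drawn from a fixed prior p(z). A minimal sketch of how that conditioning is often wired up, assuming a categorical z that is one-hot encoded and appended to the state so one network can represent every skill; `NUM_SKILLS`, `sample_skill` and `policy_input` are hypothetical names.

```python
import numpy as np

NUM_SKILLS = 4  # hypothetical number of skills K
rng = np.random.default_rng(0)

def sample_skill(rng: np.random.Generator) -> int:
    # z ~ p(z): a fixed uniform categorical prior over the K skills
    return int(rng.integers(NUM_SKILLS))

def policy_input(state: np.ndarray, z: int) -> np.ndarray:
    # One-hot encode z and append it to the state, giving the input of pi(a | s, z)
    one_hot = np.zeros(NUM_SKILLS)
    one_hot[z] = 1.0
    return np.concatenate([state, one_hot])

state = np.array([0.2, -1.0, 0.5])
z = sample_skill(rng)          # z stays fixed for a whole episode
x = policy_input(state, z)
print(x.shape)  # (7,)
```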

2. Related Work

  • Three important distinctions from prior work
    1. Uses maximum entropy policies to force skills to be diverse
    2. Fixes the prior distribution p(z) instead of learning it
    3. The discriminator looks at every state, not only the final one

The paper argues that, for learning complex behaviors, maximizing diversity works better than a task-specific reward.

3. Diversity is all you need


3.1. How it works

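The objective combines I(s;z), H[a|s] and -I(a;z|s); expanding the mutual-information terms gives:

$$F(\theta) = I(s;z) + H[a \mid s] - I(a;z \mid s)$$

$$= \big(H[z] - H[z \mid s]\big) + H[a \mid s] - \big(H[a \mid s] - H[a \mid s, z]\big)$$

$$= H[a \mid s, z] + H[z] - H[z \mid s]$$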

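  • H[a|s,z]: each skill should act as randomly as possible (maximum entropy policy)
  • H[z]: the prior p(z) should have high entropy (the paper fixes it to uniform)
  • -H[z|s]: the skill z should be easy to infer from the current state (this is what the discriminator learns)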
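To make the last two terms concrete, a hypothetical toy sketch (not from the paper) that computes H[z] - H[z|s] = I(s;z) from a fixed prior p(z) and per-skill state distributions p(s|z) over a small discrete state space. Perfectly distinguishable skills reach log K, identical skills score 0. The H[a|s,z] term is left out here; in practice it comes from the maximum entropy RL algorithm.

```python
import numpy as np

def entropy(p: np.ndarray) -> float:
    # Shannon entropy in nats, ignoring zero-probability entries
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def diversity_terms(p_z: np.ndarray, p_s_given_z: np.ndarray) -> tuple[float, float]:
    """Return (H[z], H[z|s]).

    p_z: shape (K,); p_s_given_z: shape (K, S), each row sums to 1.
    """
    joint = p_z[:, None] * p_s_given_z           # p(z, s)
    p_s = joint.sum(axis=0)                      # p(s)
    h_z = entropy(p_z)
    # H[z|s] = sum_s p(s) * H[p(z|s)]
    h_z_given_s = sum(p_s[s] * entropy(joint[:, s] / p_s[s])
                      for s in range(len(p_s)) if p_s[s] > 0)
    return h_z, h_z_given_s

p_z = np.full(2, 0.5)                            # fixed uniform prior over 2 skills
distinct = np.array([[1.0, 0.0], [0.0, 1.0]])    # each skill visits its own state
same = np.array([[0.5, 0.5], [0.5, 0.5]])        # both skills visit states identically

results = {}
for name, table in [("distinct", distinct), ("same", same)]:
    h_z, h_zs = diversity_terms(p_z, table)
    results[name] = h_z - h_zs                   # I(s;z) = H[z] - H[z|s]
    print(name, results[name])
```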

3.2. Implementation

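A minimal sketch of the discriminator q_phi(z|s) and the pseudo-reward r_z(s) = log q_phi(z|s) - log p(z) that the paper feeds to the RL agent. Assumptions: a linear-softmax discriminator stands in for the paper's neural network, the states are fixed toy clusters rather than policy rollouts, and the RL update itself (soft actor-critic in the paper) is omitted. `LinearDiscriminator` and `pseudo_reward` are hypothetical names.

```python
import numpy as np

class LinearDiscriminator:
    """Hypothetical linear-softmax stand-in for q_phi(z | s)."""

    def __init__(self, state_dim: int, num_skills: int, lr: float = 0.5, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((state_dim, num_skills))
        self.b = np.zeros(num_skills)
        self.lr = lr

    def log_probs(self, states: np.ndarray) -> np.ndarray:
        # Row-wise log-softmax over skills, shape (N, K)
        logits = states @ self.W + self.b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    def update(self, states: np.ndarray, skills: np.ndarray) -> None:
        # One gradient-ascent step on the mean of log q(z | s) (cross-entropy descent)
        probs = np.exp(self.log_probs(states))
        onehot = np.eye(self.W.shape[1])[skills]
        grad_logits = (onehot - probs) / len(states)
        self.W += self.lr * states.T @ grad_logits
        self.b += self.lr * grad_logits.sum(axis=0)

def pseudo_reward(disc: LinearDiscriminator, states: np.ndarray,
                  skills: np.ndarray, num_skills: int) -> np.ndarray:
    # r_z(s) = log q_phi(z|s) - log p(z); uniform p(z) means -log p(z) = log K
    log_q = disc.log_probs(states)[np.arange(len(states)), skills]
    return log_q + np.log(num_skills)

# Toy data: each skill z "visits" states near its own centre
rng = np.random.default_rng(1)
K = 3
centers = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
skills = rng.integers(K, size=300)
states = centers[skills] + 0.3 * rng.standard_normal((300, 2))

disc = LinearDiscriminator(state_dim=2, num_skills=K)
before = pseudo_reward(disc, states, skills, K).mean()
for _ in range(200):
    disc.update(states, skills)
after = pseudo_reward(disc, states, skills, K).mean()
print(f"mean reward before: {before:.3f}, after: {after:.3f}")
```

Because log q_phi(z|s) <= 0, the reward is bounded above by log K and approaches it as the discriminator becomes confident, which is the "skills have to be distinguishable" idea in numeric form. Computing it at every visited state, rather than only the last one, matches the third distinction listed in Related Work.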

4. What skills are learned?

Figure: the α = 0.01 panel best illustrates discriminable skills (α scales the entropy term H[a|s,z]).

Questions

  • Is this model similar to a random forest?
  • What is a critic network?
  • What is an M-projection?

flrngel, Mar 02 '18 05:03