understanding-ai

Diversity Is All You Need: Learning Skills without a Reward Function

Open flrngel opened this issue 6 years ago • 2 comments

https://arxiv.org/abs/1802.06070

Abstract

Learn skills by maximizing an information-theoretic objective with a maximum entropy policy
After unsupervised learning, run ordinary reinforcement learning starting from the best skill

1. Introduction

A skill is just a policy (conditioned on a latent variable z)
The key idea is the discriminability of skills
- Skills have to be distinguishable
- Skills have to be as diverse as possible

2. Related Work

Three important distinctions of the paper
1. Uses maximum entropy policies to force skills to be diverse
2. Fixes the prior distribution p(z) instead of learning it
3. The discriminator looks at every state, not only the final one

The paper says that maximizing diversity is better than a specific reward for learning complex behaviors

3. Diversity is all you need

3.1. How it works

H[a|s] - MI(a;z|s) = H[a|s,z], so the two action terms of the objective combine into one

F(Θ) = H[a|s,z] + H[z] - H[z|s]
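
For reference, the objective above can be expanded step by step. This is a sketch in LaTeX notation, assuming S, A, Z are the random variables for state, action, and skill, and that the starting point is the paper's three ingredients: mutual information between states and skills, entropy of the mixture policy, and mutual information between actions and skills given the state.

```latex
\begin{aligned}
\mathcal{F}(\Theta) &= I(S; Z) + \mathcal{H}[A \mid S] - I(A; Z \mid S) \\
&= \bigl(\mathcal{H}[Z] - \mathcal{H}[Z \mid S]\bigr) + \mathcal{H}[A \mid S]
   - \bigl(\mathcal{H}[A \mid S] - \mathcal{H}[A \mid S, Z]\bigr) \\
&= \mathcal{H}[A \mid S, Z] + \mathcal{H}[Z] - \mathcal{H}[Z \mid S]
\end{aligned}
```

The two H[A|S] terms cancel, which leaves the three-term form written in these notes.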

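H[a|s,z]: each skill should act as randomly as possible (maximize)
H[z]: the prior p(z) should have high entropy (p(z) is fixed to uniform)
H[z|s]: it should be easy to infer z from the current state (minimize)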
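
To make the H[z] and H[z|s] terms concrete, here is a minimal sketch that computes them for a toy tabular joint distribution p(s, z). The table layout (`joint[s][z]`), the function names, and the two example tables are assumptions for illustration only. The paper works with continuous states and a learned discriminator, not a lookup table.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def conditional_entropy_z_given_s(joint):
    """H[z|s] for a joint table where joint[s][z] = p(s, z)."""
    total = 0.0
    for row in joint:
        p_s = sum(row)
        if p_s == 0:
            continue  # states that are never visited contribute nothing
        total += p_s * entropy([p / p_s for p in row])
    return total


def skill_state_information(joint):
    """Return (H[z], H[z|s], H[z] - H[z|s]) for the joint table."""
    num_skills = len(joint[0])
    p_z = [sum(row[z] for row in joint) for z in range(num_skills)]
    h_z = entropy(p_z)
    h_z_given_s = conditional_entropy_z_given_s(joint)
    return h_z, h_z_given_s, h_z - h_z_given_s


# Two states (rows) and two skills (columns), hypothetical numbers.
distinct = [[0.5, 0.0], [0.0, 0.5]]          # each skill visits its own state
overlapping = [[0.25, 0.25], [0.25, 0.25]]   # both skills visit both states

print(skill_state_information(distinct))     # H[z|s] is zero, information is log 2
print(skill_state_information(overlapping))  # H[z|s] equals H[z], information is zero
```

When skills occupy separate states, the state reveals the skill and H[z] - H[z|s] reaches its maximum of H[z]. When skills overlap completely, the state says nothing about z and the difference drops to zero.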

3.2. Implementation
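
The paper trains the policy with soft actor-critic and replaces the task reward with a pseudo-reward, r = log q(z|s) - log p(z), where q is a learned discriminator. Below is a minimal sketch of that pseudo-reward only. The names `sample_skill`, `pseudo_reward`, and `discriminator_logits` are hypothetical, and the discriminator is stood in for by a plain list of logits rather than a trained network.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def sample_skill(num_skills, rng=random):
    """Draw a skill index z from the fixed uniform prior p(z)."""
    return rng.randrange(num_skills)


def pseudo_reward(discriminator_logits, z):
    """r = log q(z|s) - log p(z), assuming a uniform prior over skills.

    discriminator_logits: unnormalized scores for each skill at state s
    (assumed to come from some discriminator; here just a list).
    """
    num_skills = len(discriminator_logits)
    log_q = log_softmax(discriminator_logits)[z]
    log_p = -math.log(num_skills)
    return log_q - log_p


z = sample_skill(4)
# Positive when the discriminator favors the active skill, negative otherwise.
print(pseudo_reward([2.0, 0.0, 0.0, 0.0], 0))
print(pseudo_reward([2.0, 0.0, 0.0, 0.0], 1))
```

Subtracting log p(z) means a discriminator that does no better than chance yields zero reward, so the agent is only rewarded for reaching states where its skill is easier to identify than the prior alone would allow.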

4. What skills are learned?

(The illustration with alpha = 0.01 shows the most discriminable skills)

Questions

Is this model similar to a random forest?
What is a critic network?
What is an M-projection?

Mar 02 '18 05:03 flrngel