understanding-ai
Diversity Is All You Need: Learning Skills without a Reward Function
https://arxiv.org/abs/1802.06070
Abstract
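- Learn skills without a reward function by maximizing a mutual-information objective (the states a skill visits should reveal which skill it is) under a maximum entropy policy
- After unsupervised skill learning, reuse the skills for downstream reinforcement learning, e.g. by fine-tuning the skill with the highest task reward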
1. Introduction
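- A skill is just a latent-conditioned policy π(a|s,z), where z identifies the skill
- The key idea is discriminability of skills
- Skills have to be distinguishable: the states a skill visits should reveal which skill is running
- Skills have to be as diverse as possible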
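Since a skill is a policy conditioned on a latent z, one network can represent every skill. A minimal sketch of one common conditioning scheme, assuming discrete skills encoded one-hot and concatenated to the state; `one_hot` and `skill_conditioned_input` are hypothetical helpers, not the paper's code.

```python
from typing import List

def one_hot(z: int, num_skills: int) -> List[float]:
    """Encode skill index z as a one-hot vector of length num_skills."""
    if not 0 <= z < num_skills:
        raise ValueError("skill index out of range")
    vec = [0.0] * num_skills
    vec[z] = 1.0
    return vec

def skill_conditioned_input(state: List[float], z: int, num_skills: int) -> List[float]:
    """Input for a latent-conditioned policy pi(a | s, z): the state followed by
    a one-hot code for the skill (an assumed conditioning scheme)."""
    return list(state) + one_hot(z, num_skills)

# The same policy weights see a different input, and can act differently, for each z.
x = skill_conditioned_input([0.5, -1.2], z=2, num_skills=4)
print(x)  # [0.5, -1.2, 0.0, 0.0, 1.0, 0.0]
```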
2. Related Work
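- Three important distinctions from prior work
- Uses maximum entropy policies to force skills to be diverse
- Fixes the prior distribution p(z) over skills instead of learning it, so p(z) cannot collapse onto a few skills
- The discriminator watches every state of a trajectory, not just the final state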
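The last two distinctions show up directly in how discriminator training data could be gathered. A minimal sketch, assuming discrete skills and a hypothetical `rollout(z)` callable that runs the skill policy and returns its visited states; the toy rollout is made up for illustration.

```python
import random
from typing import Callable, List, Tuple

State = Tuple[float, ...]

def collect_discriminator_data(
    rollout: Callable[[int], List[State]],
    num_skills: int,
    num_episodes: int,
    rng: random.Random,
) -> List[Tuple[State, int]]:
    """Build (state, skill) pairs for training a discriminator q(z|s).

    - z is drawn once per episode from a fixed uniform prior p(z) that is never updated.
    - every visited state is labelled with z, not only the final state.
    """
    data = []
    for _ in range(num_episodes):
        z = rng.randrange(num_skills)   # z ~ p(z) = Uniform{0, ..., num_skills - 1}
        for s in rollout(z):            # hypothetical: run skill z in the environment
            data.append((s, z))
    return data

# Toy rollout: skill z walks along the x-axis with step size z + 1.
def toy_rollout(z: int) -> List[State]:
    return [(float(t * (z + 1)), 0.0) for t in range(3)]

pairs = collect_discriminator_data(toy_rollout, num_skills=4, num_episodes=5, rng=random.Random(0))
print(len(pairs))  # 15: 5 episodes x 3 states each
```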
The paper notes that, for complex behaviors, maximizing diversity can work better than optimizing a specific task reward
3. Diversity is all you need
3.1. How it works
F(Θ) = MI(s,z) + H[a|s] - MI(a,z|s)
     = H[a|s,z] + H[z] - H[z|s]
(using MI(s,z) = H[z] - H[z|s] and MI(a,z|s) = H[a|s] - H[a|s,z])
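- H[a|s,z]: each skill should act as randomly as possible (maximum entropy policy)
- H[z]: the skill prior p(z) should have high entropy (the paper fixes p(z) to be uniform)
- H[z|s]: should be low, i.e. the skill z should be easy to infer from the current state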
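The objective can be written as MI(s,z) + H[a|s] - MI(a,z|s) or as H[a|s,z] + H[z] - H[z|s]. A small numerical check that the two forms agree, on a hypothetical discrete toy problem (2 skills, 2 states, 2 actions, all probabilities made up): the mutual informations are computed directly from their definitions, the entropies via the chain rule.

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical toy setting: 2 skills z, 2 states s, 2 actions a.
p_z = {0: 0.5, 1: 0.5}                      # fixed uniform prior p(z)
p_s_given_z = {0: {0: 0.9, 1: 0.1},         # p(s | z): skill 0 mostly visits state 0
               1: {0: 0.2, 1: 0.8}}         #           skill 1 mostly visits state 1
pi = {(0, 0): {0: 0.7, 1: 0.3},             # pi[(s, z)][a] = pi(a | s, z)
      (0, 1): {0: 0.5, 1: 0.5},
      (1, 0): {0: 0.6, 1: 0.4},
      (1, 1): {0: 0.1, 1: 0.9}}

# Joint p(z, s, a) = p(z) p(s | z) pi(a | s, z), keyed by (z, s, a).
joint = {(z, s, a): p_z[z] * p_s_given_z[z][s] * pi[(s, z)][a]
         for z, s, a in product((0, 1), repeat=3)}

def marginal(keep):
    """Sum the joint onto the positions in `keep` (0 = z, 1 = s, 2 = a)."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in keep)] += p
    return out

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

p_z_m, p_s, p_zs, p_sa = marginal((0,)), marginal((1,)), marginal((0, 1)), marginal((1, 2))

# Entropy terms via the chain rule H[x | y] = H[x, y] - H[y].
H_z = entropy(p_z_m)
H_z_given_s = entropy(p_zs) - entropy(p_s)
H_a_given_sz = entropy(joint) - entropy(p_zs)
H_a_given_s = entropy(p_sa) - entropy(p_s)

# Mutual informations computed directly from their definitions.
MI_sz = sum(p * math.log(p / (p_z_m[(z,)] * p_s[(s,)])) for (z, s), p in p_zs.items())
MI_az_given_s = sum(p * math.log(p * p_s[(s,)] / (p_zs[(z, s)] * p_sa[(s, a)]))
                    for (z, s, a), p in joint.items())

F_first = MI_sz + H_a_given_s - MI_az_given_s
F_second = H_a_given_sz + H_z - H_z_given_s
print(F_first, F_second)  # the two forms agree up to floating-point error
```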
3.2. Implementation
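In place of a task reward, the paper trains the policy (with SAC) on the pseudo-reward r_z(s,a) = log q_φ(z|s) - log p(z), where q_φ is the learned discriminator. A minimal sketch of that reward only, assuming a uniform p(z) over discrete skills and a discriminator that outputs one unnormalised logit per skill; the function names are hypothetical and the SAC and discriminator updates are not shown.

```python
import math
from typing import List, Sequence

def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]

def diayn_reward(disc_logits: Sequence[float], z: int) -> float:
    """Pseudo-reward log q_phi(z | s) - log p(z) with a uniform prior p(z).

    disc_logits: assumed discriminator logits for the current state s, one per skill.
    """
    num_skills = len(disc_logits)
    log_q = log_softmax(disc_logits)[z]
    log_p = -math.log(num_skills)   # log p(z) for a uniform prior over num_skills skills
    return log_q - log_p

# Uninformative discriminator (all logits equal): log q = log p, so the reward is zero.
print(diayn_reward([0.0, 0.0, 0.0, 0.0], z=1))
# Confident, correct discriminator: the reward approaches log(num_skills).
print(diayn_reward([10.0, 0.0, 0.0, 0.0], z=0))
```

The reward is positive only when the discriminator identifies z better than chance, which is what pushes skills apart.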
4. What skills are learned?
(α = 0.01 gives the most clearly discriminable skills in the illustration)
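Here α is the coefficient that scales the entropy term H[a|s,z] against the discriminability reward. A minimal sketch of a single-sample per-step estimate of that trade-off, assuming -log π(a|s,z) is used as the entropy estimate; `soft_step_reward` and all numbers are hypothetical.

```python
import math

def soft_step_reward(log_q_z_given_s: float, log_p_z: float,
                     log_pi_a: float, alpha: float) -> float:
    """Single-sample estimate of log q(z|s) - log p(z) + alpha * H[a|s,z],
    using -log pi(a|s,z) of the sampled action as the entropy estimate."""
    return (log_q_z_given_s - log_p_z) + alpha * (-log_pi_a)

# Same state and action, two alpha values: a smaller alpha weights
# discriminability more heavily relative to action randomness.
for alpha in (0.01, 0.1):
    print(alpha, soft_step_reward(math.log(0.9), math.log(0.25), math.log(0.2), alpha))
```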
Questions
- Is this model similar to a random forest?
- What is the critic network?
- What is an M-projection?