DropoutUncertaintyDemos

Positive reward with 4 walls

Open mryellow opened this issue 10 years ago • 2 comments

Looking at a wall with 4 eyes while walking into it resulted in a positive reward:

http://mlg.eng.cam.ac.uk/yarin/blog_3d801aa532c1ce.html

That's the > 0.75 threshold for the forward reward. With a few eyes missing the walls, the overall proximity drops more than it does in most cases, and if the agent can get a little forward bonus at that stage, it will take it.

https://github.com/yaringal/DropoutUncertaintyDemos/blob/14fa4689bcf29e280bf3bb5c967f8bf10e530178/convnetjs/rldemo_comparison.js#L368

Generally I've found the threshold still works OK. It takes tweaking, but in the end it acts as a kind of "this is a doorway you'll accept" vs "that's a little too risky" cutoff. I think the best bet would be to remove it and punish walls harder some other way, so the forward bonus can't win out against walls when multiplied by those last few decimal points of the proximity being fed in.
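To make the jump at the cutoff concrete, here is a minimal stand-alone sketch of a thresholded forward bonus. The function name `thresholdForwardReward` is hypothetical, and the `0.1` scale is an assumption borrowed from the variants proposed later in this thread; only the `> 0.75` cutoff is taken from the comment above. See the linked line in `rldemo_comparison.js` for the actual code.

```javascript
// Hypothetical stand-alone version of a thresholded forward bonus.
// Assumptions: action index 0 means "forward", and the bonus is
// 0.1 * proximityReward once the 0.75 cutoff is exceeded.
function thresholdForwardReward(actionix, proximityReward) {
  if (actionix === 0 && proximityReward > 0.75) {
    return 0.1 * proximityReward;
  }
  return 0.0;
}

// Sample the bonus just below and just above the cutoff.
const atCutoff = thresholdForwardReward(0, 0.75);
const justAbove = thresholdForwardReward(0, 0.76);
console.log("at cutoff:", atCutoff, "just above:", justAbove);
```

Under these assumptions the bonus goes from nothing to most of its maximum across a tiny change in proximity, which is the kind of step the agent can learn to sit on.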

mryellow Jul 23 '15 01:07

This might work a little better: it falls off quickly at the low end, instead of granting the forward reward the instant walls are considered "clear".

if(this.actionix === 0 && proximity_reward > 0.2) forward_reward = 0.1 * Math.sqrt(proximity_reward-0.2);

Edit: Actually, it probably behaves better the other way around. The sqrt version rises steeply just above its cutoff, so it will squeeze through some pretty small gaps.

if(this.actionix === 0) forward_reward = 0.1 * Math.pow(proximity_reward, 2);
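The two shaping curves above can be compared side by side with a small sketch. The function names `sqrtForwardReward` and `squaredForwardReward` and the sample proximity values are made up for illustration; the formulas are the two one-liners from this comment, with the `actionix === 0` check left out so that only the curves are compared.

```javascript
// Square-root variant: zero up to 0.2, then rises steeply.
function sqrtForwardReward(proximityReward) {
  if (proximityReward > 0.2) {
    return 0.1 * Math.sqrt(proximityReward - 0.2);
  }
  return 0.0;
}

// Squared variant: no cutoff, stays small until proximity is high.
function squaredForwardReward(proximityReward) {
  return 0.1 * Math.pow(proximityReward, 2);
}

// Print both bonuses for a few illustrative proximity values.
for (const p of [0.2, 0.25, 0.5, 0.75, 1.0]) {
  console.log(
    p.toFixed(2),
    sqrtForwardReward(p).toFixed(4),
    squaredForwardReward(p).toFixed(4)
  );
}
```

At low proximity the square-root variant already pays several times what the squared variant does, which fits the remark about squeezing through small gaps.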

mryellow Jul 26 '15 06:07

I'm finding generally that dropout (regardless of whether uncertainty is implemented or not) will become obsessed with any conditional reward which jumps up or down out of nowhere.

For instance halving forward reward for forward turns:

if (this.actionix === 0 || this.actionix === 1 || this.actionix === 2) {
    forward_reward = base_forward_reward; // placeholder: whatever number you use
    if (this.actionix === 1 || this.actionix === 2) {
        forward_reward = forward_reward / 2;
    }
}

Dropout will end up hard against a wall, looking along it, exploiting whatever it can from the halved forward reward. Smoothly distributed rewards, on the other hand, will be exploited without so much unexpected behavior.
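One possible reading of "smoothly distributed" here is to keep the half bonus for forward turns but scale it by the squared proximity suggested earlier in the thread, so the bonus fades out continuously near walls. This is only a sketch under that assumption: `BASE_FORWARD_REWARD`, `smoothForwardReward`, and the `0.1` value are hypothetical, not code from the repository.

```javascript
// Hypothetical constant standing in for "whatever number".
const BASE_FORWARD_REWARD = 0.1;

// Assumed action layout, as in the snippet above:
// 0 = forward, 1 and 2 = forward turns, anything else = no bonus.
function smoothForwardReward(actionix, proximityReward) {
  if (actionix !== 0 && actionix !== 1 && actionix !== 2) {
    return 0.0;
  }
  // Forward turns still earn half of the straight-ahead bonus.
  const actionScale = actionix === 0 ? 1.0 : 0.5;
  // Squared proximity makes the bonus shrink smoothly near walls.
  return BASE_FORWARD_REWARD * actionScale * Math.pow(proximityReward, 2);
}

// A forward turn hard against a wall vs. in open space.
console.log(smoothForwardReward(1, 0.1), smoothForwardReward(1, 1.0));
```

With this shape, a forward turn hard against a wall earns almost nothing, so there is less to exploit by looking along the wall.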

mryellow Sep 03 '15 21:09