bsuite

inconsistency fix for stochastic deep sea

Open cgao3 opened this issue 4 years ago • 2 comments

  1. The description of the stochastic deep sea environment says "adds N(0,1) noise to the end of states of the chain", but in line 125, the noisy reward is only added when "column" is either 0 or "_size - 1".
  2. The description also says "act right with 1 - 1/N moves agent to right", but in line 121, i.e., when the agent is at cell "(_size-1, _size-1)" and acts right, there is no such stochasticity.

This pull request fixes these two inconsistencies. Without the fix, the expected value under the optimal policy is more complicated; with it, the expected value under the optimal policy is simply (1-1/N)^N * 0.99 + (-0.01 + E[norm(0,1)]) * (1 - (1-1/N)^N)
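For reference, here is a minimal sketch that only evaluates the closed-form expression above for a given N. It is not code from bsuite: the function name `expected_optimal_return` and the `noise_mean` parameter are hypothetical, and the 0.99 and -0.01 constants are taken directly from the formula in this description.

```python
import math


def expected_optimal_return(n: int, noise_mean: float = 0.0) -> float:
    """Evaluate the closed-form expected value quoted in this PR description.

    Hypothetical helper, not part of bsuite. `n` plays the role of N and
    `noise_mean` stands in for E[norm(0,1)], which is 0 for standard normal
    noise.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Probability that all N 'right' actions succeed, each with prob 1 - 1/N.
    p_success = (1.0 - 1.0 / n) ** n
    # First term: successful traversal; second term: any failed traversal.
    return p_success * 0.99 + (-0.01 + noise_mean) * (1.0 - p_success)


if __name__ == "__main__":
    for size in (5, 10, 20, 50):
        print(size, expected_optimal_return(size))
```

Since the noise has zero mean, the expression reduces to (1-1/N)^N - 0.01, which makes a convenient sanity check for the function.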

cgao3 avatar Oct 22 '20 19:10 cgao3

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

google-cla[bot] avatar Oct 22 '20 19:10 google-cla[bot]

@googlebot I signed it!

cgao3 avatar Oct 22 '20 19:10 cgao3