minerl
delayed inventory update on MineRLObtainDiamond-v0
Hi, I think there is a delay in updating the inventory in `MineRLObtainDiamond-v0`. Here I have saved the experiences (reward, frame, inventory, etc.) during evaluation. At timestep l=429 the reward is 1 while the `log` count in the inventory is still zero; `log` only becomes 1 at timestep l=431. Similarly, I get a reward of 2 and `plank` appears in the inventory on the next timestep. In the human data, by contrast, the inventory updates instantly.
I can confirm the reward/inventory change mismatch. Attaching a notebook I used to reproduce it with a fixed seed and hardcoded actions. test reward vs inventory change by frame.ipynb.zip
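For anyone who wants to check their own logs for the mismatch described above, here is a minimal sketch. It assumes the experiences were saved as two parallel per-timestep lists, one of rewards and one of inventory dicts; the function name `inventory_lag` and the `max_lag` parameter are made up for illustration and are not part of MineRL.

```python
from typing import Dict, List, Optional, Tuple


def inventory_lag(
    rewards: List[float],
    inventories: List[Dict[str, int]],
    max_lag: int = 5,
) -> List[Tuple[int, Optional[int]]]:
    """For each step with a positive reward, report how many steps later
    some inventory count first increases (None if not within max_lag)."""
    results = []
    for t, reward in enumerate(rewards):
        if reward <= 0:
            continue
        lag = None
        for k in range(max_lag + 1):
            step = t + k
            # Need a previous observation to compare against.
            if step == 0 or step >= len(inventories):
                continue
            before, after = inventories[step - 1], inventories[step]
            if any(after[item] > before.get(item, 0) for item in after):
                lag = k
                break
        results.append((t, lag))
    return results


# Hypothetical log: reward at step 2, "log" shows up in the inventory at step 4.
rewards = [0.0, 0.0, 1.0, 0.0, 0.0]
inventories = [{"log": 0}, {"log": 0}, {"log": 0}, {"log": 0}, {"log": 1}]
print(inventory_lag(rewards, inventories))  # [(2, 2)]
```

Note that this simple matching can attribute one inventory increase to two nearby rewards, so it is only meant as a quick diagnostic.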
This is true for all environments -- in fact we explicitly account for this delay in BASALT tests now.
@brandonhoughton told me the reason for this delay at one point. Could you repeat it here in case someone wants to work on this in the future?
It would also probably be a good doc change for us to at least document or warn about this problem in the observation space docs: https://minerl.readthedocs.io/en/latest/environments/handlers.html#visual-observations-pov-third-person
Some handlers work on information from the server, so for each step we do a client tick, then a server tick. If you need the server to tell you something, it won't be processed until the next tick. That being said, I'd expect that you can claim things in your own inventory on pickup right away. Likely this is a result of the reward handler being on the server: it sees the change before you do and sends it via a custom message to the client. Then, on the next step, the change is visible to the client.
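To make the ordering concrete, here is a toy simulation of "client tick, then server tick" per step. It is only a sketch of the mechanism as described in this comment, not MineRL's actual implementation: `ToyEnv`, its fields, and the rule that every pickup gives a reward of 1.0 are all assumptions made for illustration. It also yields a one-step lag, which may not match the exact lag seen in a real environment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyEnv:
    server_inventory: Dict[str, int] = field(default_factory=dict)
    client_inventory: Dict[str, int] = field(default_factory=dict)
    # Messages the server has sent but the client has not processed yet.
    pending: List[Tuple[str, int]] = field(default_factory=list)

    def step(self, pickup: Optional[str] = None) -> Tuple[Dict[str, int], float]:
        # Client tick: process messages sent by the server during the previous step.
        for item, count in self.pending:
            self.client_inventory[item] = count
        self.pending.clear()

        # Server tick: the server applies the pickup and computes the reward.
        reward = 0.0
        if pickup is not None:
            count = self.server_inventory.get(pickup, 0) + 1
            self.server_inventory[pickup] = count
            self.pending.append((pickup, count))  # reaches the client next step
            reward = 1.0

        # The inventory observation is built from client-side state.
        return dict(self.client_inventory), reward


env = ToyEnv()
obs0, r0 = env.step("log")  # reward arrives, inventory observation still empty
obs1, r1 = env.step()       # inventory observation catches up one step later
print(r0, obs0)  # 1.0 {}
print(r1, obs1)  # 0.0 {'log': 1}
```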