LightZero

How to model a grid env well when it changes frequently?

Open valkryhx opened this issue 1 month ago • 8 comments

Suppose there is a game on a 10 by 10 grid. Each position holds a piece of gold with a random positive value, and an agent mines the grid. The rule is: when the agent digs position pos(i,j), it receives the value of the gold at that position, v(i,j), and the gold in the row and column that pass through pos(i,j) is blown up, meaning every gold value in row(i) and col(j) becomes 0. So each dig yields one value and zeroes out the corresponding row and column. The agent can FLASH (teleport, with no need to move step by step) to any position on the grid.
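To make the rule concrete, here is a minimal NumPy sketch of a single dig. The function name `dig`, the value range 1 to 99, and the random seed are assumptions made only for illustration; they are not part of LightZero or of any existing environment.

```python
import numpy as np


def dig(grid, i, j):
    """Apply one dig at (i, j) and return (reward, new_grid).

    Hypothetical helper: the reward is the gold value at (i, j),
    after which all of row i and column j are set to 0.
    """
    reward = float(grid[i, j])
    new_grid = grid.copy()   # leave the caller's grid untouched
    new_grid[i, :] = 0       # blow up row i
    new_grid[:, j] = 0       # blow up column j
    return reward, new_grid


# Assumed setup: integer gold values in [1, 99] on a 10 by 10 grid.
rng = np.random.default_rng(0)
start_grid = rng.integers(1, 100, size=(10, 10)).astype(np.float32)

reward, next_grid = dig(start_grid, 3, 7)
```

After one dig on a fresh grid, 19 cells are zeroed (10 in the row plus 10 in the column, with the dug cell counted once), so 81 cells keep their value.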

Now we want to train the agent to collect as much value as possible across different 10 by 10 grids, and to avoid stepping on already-dug positions (because a dug position is now worth 0). How should I model the observation of the grid? Should I pass a single frame of the current grid, or the frames from the last few steps? The env seems to change frequently, like Go or the 2048 game. Do you have any advice on modeling an env for this kind of game?
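For reference, the single-frame option could look like the sketch below. Under the rules as described, the current grid values determine every future reward, so one frame carries the full state; whether that works well in practice is exactly the open question. The function name `encode_observation`, the two-channel layout, the `value_scale` constant, and the dictionary keys are all assumptions for illustration, not a confirmed LightZero interface.

```python
import numpy as np


def encode_observation(grid, value_scale=100.0):
    """Encode the current grid as one frame plus an action mask.

    Hypothetical encoding: channel 0 holds gold values divided by an
    assumed constant `value_scale`, channel 1 marks cells that still
    hold gold. The mask lets the agent skip cells worth 0.
    """
    values = (grid / value_scale).astype(np.float32)
    has_gold = (grid > 0).astype(np.float32)
    observation = np.stack([values, has_gold], axis=0)  # shape (2, H, W)
    action_mask = has_gold.reshape(-1).astype(np.int8)  # 1 = still worth digging
    return {"observation": observation, "action_mask": action_mask}


# Example: a grid where row 2 and column 5 have already been blown up.
example_grid = np.arange(1, 101, dtype=np.float32).reshape(10, 10)
example_grid[2, :] = 0
example_grid[:, 5] = 0

obs = encode_observation(example_grid)
```

Here an action index `a` in the flattened mask corresponds to the cell `(a // 10, a % 10)`, which matches the teleport rule: any cell is reachable in one step.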

valkryhx May 13 '24 01:05