Multi-Agent-Reinforcement-Learning-Environment
Under what circumstances will the env_FireFighter be done?
Thank you for your great work. I'm new to the RL field and I'm learning how to build my own model.
For the other environments I can see how the game finishes, for example when the agents put the box in the right location. But in the FireFighter environment, when is an episode done?
I have tried using fire level == [0,0,0,0] (or [2,2,2,2]) as the goal: when it is reached, I end the episode and give a positive reward. I use a DQN to learn the strategy, but whatever actions I choose, the fire level seems to increase, especially at the first and last house. How can I set a stopping criterion in this environment? Do you have any ideas? Thank you!
This is a rewrite of the environment from "Exploiting Locality of Interaction in Factored Dec-POMDPs". I have not used it yet, so I am not 100% sure the code is correct (sorry). It is worth mentioning that a 'done' flag only exists in episodic environments, and this problem is not meant to be episodic. You could define the goal as putting out all fires and see whether DQN can reach it. I am not sure DQN can solve this problem: it is partially observable and stochastic, so it is hard even though it looks simple.
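If you want to try the "put out all fires" goal, one option is to add the stopping criterion in a thin wrapper instead of changing the environment itself. Below is a minimal sketch of that idea. Everything in it is an assumption for illustration: `ToyFireEnv` is a made-up stand-in with simplified dynamics, not the repository's env_FireFighter, and `GoalTerminationWrapper`, `max_steps` and `goal_bonus` are hypothetical names. The step cap is there because the task is continuing, so the goal may never be reached in a given rollout.

```python
import random


class ToyFireEnv:
    """Hypothetical stand-in with made-up dynamics (NOT env_FireFighter)."""

    def __init__(self, n_houses=4, max_level=2, seed=0):
        self.n_houses = n_houses
        self.max_level = max_level
        self.rng = random.Random(seed)
        self.fire_level = [0] * n_houses

    def reset(self):
        self.fire_level = [
            self.rng.randint(0, self.max_level) for _ in range(self.n_houses)
        ]
        return list(self.fire_level)

    def step(self, house):
        # Toy rule: the chosen house cools by one level; any other house
        # that is already burning grows by one level with probability 0.1.
        for i in range(self.n_houses):
            if i == house:
                self.fire_level[i] = max(0, self.fire_level[i] - 1)
            elif self.fire_level[i] > 0 and self.rng.random() < 0.1:
                self.fire_level[i] = min(self.max_level, self.fire_level[i] + 1)
        reward = -sum(self.fire_level)  # penalize the total fire level
        return list(self.fire_level), reward


class GoalTerminationWrapper:
    """Adds a 'done' flag to a continuing task (illustrative only).

    An episode ends when all fires are out (goal) or after max_steps
    steps (time limit), whichever comes first.
    """

    def __init__(self, env, max_steps=50, goal_bonus=10.0):
        self.env = env
        self.max_steps = max_steps
        self.goal_bonus = goal_bonus
        self.t = 0

    def reset(self):
        self.t = 0
        return self.env.reset()

    def step(self, action):
        obs, reward = self.env.step(action)
        self.t += 1
        goal_reached = all(level == 0 for level in obs)
        if goal_reached:
            reward += self.goal_bonus
        timed_out = self.t >= self.max_steps and not goal_reached
        done = goal_reached or timed_out
        info = {"goal_reached": goal_reached, "timed_out": timed_out}
        return obs, reward, done, info


if __name__ == "__main__":
    env = GoalTerminationWrapper(ToyFireEnv(seed=0), max_steps=50)
    obs = env.reset()
    done = False
    while not done:
        # Placeholder policy: always fight the house with the highest fire.
        action = max(range(len(obs)), key=lambda i: obs[i])
        obs, reward, done, info = env.step(action)
    print(obs, info)
```

Keeping `goal_reached` and `timed_out` separate in `info` lets the learner treat a time-limit cutoff differently from a true terminal state, for example by still bootstrapping from the next state's value when the episode was only cut off.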