Should time_step_spec of an array-valued reward return an array-valued discount?
The time_step_spec function takes only observation_spec and reward_spec array specifications. If reward_spec specifies a multidimensional array, shouldn't the discount spec match its shape, or shouldn't the function at least accept an argument saying whether it should?
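To make the proposal concrete, here is a minimal sketch of what such an argument could look like. It uses only the standard library and is not the TF-Agents implementation: `Spec`, `TimeStepSpec`, `time_step_spec_sketch`, and the `discount_matches_reward` flag are all hypothetical names invented for illustration, and the scalar default for the discount is an assumption taken from the behavior described above.

```python
from typing import NamedTuple, Optional, Tuple


class Spec(NamedTuple):
    """Hypothetical stand-in for an array spec: shape, dtype name, and name."""
    shape: Tuple[int, ...]
    dtype: str
    name: str


class TimeStepSpec(NamedTuple):
    """Hypothetical container mirroring the four fields of a time step."""
    step_type: Spec
    reward: Spec
    discount: Spec
    observation: Spec


def time_step_spec_sketch(
    observation_spec: Spec,
    reward_spec: Optional[Spec] = None,
    discount_matches_reward: bool = False,  # hypothetical flag, not a real API
) -> TimeStepSpec:
    """Build a time step spec, optionally giving the discount the reward's shape."""
    if reward_spec is None:
        reward_spec = Spec((), "float32", "reward")
    # Assumption: without the flag, the discount stays scalar.
    discount_shape = reward_spec.shape if discount_matches_reward else ()
    return TimeStepSpec(
        step_type=Spec((), "int32", "step_type"),
        reward=reward_spec,
        discount=Spec(discount_shape, "float32", "discount"),
        observation=observation_spec,
    )


obs = Spec((4,), "float32", "observation")
vector_reward = Spec((3,), "float32", "reward")

default_spec = time_step_spec_sketch(obs, vector_reward)
matched_spec = time_step_spec_sketch(
    obs, vector_reward, discount_matches_reward=True
)

print(default_spec.discount.shape)  # ()
print(matched_spec.discount.shape)  # (3,)
```

With the flag left off, the discount is scalar even though the reward has shape (3,), which is the mismatch being asked about. With the flag on, the discount takes the reward's shape.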
I am also having the same problem. Any batched py_environment.PyEnvironment seems to require an array of discount_spec and step_type. @guachoperez Did you find any way round this?