acme
Not enough documentation for EpisodeAdder
Having read the docs and the code for the episode adder I still don't quite understand it.
Is it just the simplest adder? E.g. adding every transition to one long buffer? Or is it doing something else?
Thanks!
It's an adder that adds entire episodes to replay. For this to make sense, your agent has to be recurrent. An alternative is adding transitions (SARSA tuples) to replay: NStepTransitionAdder slices your episode into SARSA chunks and uploads each one to replay independently.
Yes, this is exactly right. In contrast to the transition adder, which turns things into (possibly n-step) transitions, or the sequence adder, which slices up the episode into (possibly strided, possibly overlapping) sequences of fixed length, this just creates one sequence of arbitrary length, depending on the episode length.
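To make the contrast concrete, here is a rough sketch in plain Python of the three slicing strategies described above. It does not use Acme's API: the helper functions and their parameter names (`n`, `sequence_length`, `period`) are made up for illustration, each step is reduced to its index, and details such as reward accumulation or end-of-episode padding are left out.

```python
# Toy episode: 6 time steps, each step represented only by its index.
episode = list(range(6))

def as_transitions(steps, n=1):
    """Transition-style slicing: one (start, end) pair per valid start, n steps apart."""
    return [(steps[t], steps[t + n]) for t in range(len(steps) - n)]

def as_sequences(steps, sequence_length, period):
    """Sequence-style slicing: fixed-length windows taken every `period` steps.

    A period smaller than sequence_length gives overlapping windows.
    """
    last_start = len(steps) - sequence_length
    return [steps[t:t + sequence_length] for t in range(0, last_start + 1, period)]

def as_episode(steps):
    """Episode-style slicing: a single item holding the whole episode."""
    return [list(steps)]

transitions = as_transitions(episode, n=1)
sequences = as_sequences(episode, sequence_length=4, period=2)
whole = as_episode(episode)

print(len(transitions), "transitions, first:", transitions[0])
print(len(sequences), "sequences:", sequences)
print(len(whole), "episode item of length", len(whole[0]))
```

In this sketch the same episode yields five 1-step transitions, two overlapping length-4 sequences, or one item whose length equals the episode length, which is the episode-adder case.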
We could improve the documentation of this.
So I guess the replay buffer doesn't only store "transitions" but "sequences", and a transition is a special case of such a sequence?
Not exactly. When you work with SARSA tuples it indeed works like this. When working with sequences, they are stored as a tuple (observations, actions, rewards, discounts), where each tensor has shape [batch, sequence_length, ...].
So e.g. a_1 = actions[:, 1]
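As a small illustration of that layout, the NumPy sketch below builds a fake sampled batch and indexes it by time step. The sizes (`batch=2`, `sequence_length=5`, `obs_dim=3`) and the dummy values are assumptions chosen for the example, not anything Acme prescribes.

```python
import numpy as np

# Assumed sizes, for illustration only.
batch, sequence_length, obs_dim = 2, 5, 3

# Each tensor has a leading [batch, sequence_length, ...] shape.
observations = np.zeros((batch, sequence_length, obs_dim), dtype=np.float32)
actions = np.arange(batch * sequence_length).reshape(batch, sequence_length)
rewards = np.ones((batch, sequence_length), dtype=np.float32)
discounts = np.ones((batch, sequence_length), dtype=np.float32)

# A sampled item is the tuple of the four tensors.
sample = (observations, actions, rewards, discounts)

# Indexing the second axis selects one time step for every sequence in the batch.
a_1 = actions[:, 1]        # shape (batch,)
o_1 = observations[:, 1]   # shape (batch, obs_dim)

print(a_1.shape, o_1.shape)
```

Axis 0 picks the sequence within the batch and axis 1 picks the time step, so `actions[:, 1]` is the action at step 1 for each sequence.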
Oh got it.
Perhaps adding dimensions to the docs is a good idea. Looking at that picture above it is not that clear. But after your explanation it is immediately clear.