
Not enough documentation for EpisodeAdder

Open drozzy opened this issue 4 years ago • 6 comments

Having read the docs and the code for the episode adder I still don't quite understand it.

Is it just the simplest adder? E.g. adding every transition to one long buffer? Or is it doing something else?

Thanks!

drozzy avatar Jun 30 '20 07:06 drozzy

It's an adder which adds entire episodes into replay. For this to make sense your agent has to be recurrent. An alternative is e.g. adding transitions (SARSA tuples) into replay: NStepTransitionAdder will slice your episode into SARSA chunks and upload each to replay independently.
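
To make the contrast concrete, here is a minimal sketch in plain Python. It does not use Acme's API: the helper names `episode_item` and `transition_items` and the toy data are hypothetical, and the transitions are 1-step for simplicity.

```python
# Toy episode: observations o0..o3, actions a0..a2, rewards r0..r2.
observations = ["o0", "o1", "o2", "o3"]
actions = ["a0", "a1", "a2"]
rewards = [1.0, 0.0, 2.0]


def episode_item(observations, actions, rewards):
    """Whole-episode style: the entire episode becomes ONE replay item."""
    return [(observations, actions, rewards)]


def transition_items(observations, actions, rewards):
    """Transition style: one (s, a, r, s_next) replay item per step."""
    return [
        (observations[t], actions[t], rewards[t], observations[t + 1])
        for t in range(len(actions))
    ]


episode_replay = episode_item(observations, actions, rewards)
transition_replay = transition_items(observations, actions, rewards)

print(len(episode_replay))     # 1 item holding the whole episode
print(len(transition_replay))  # 3 independent items, one per step
```

A recurrent agent can unroll over the single episode item in order, whereas the transition items can be sampled independently of each other.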

Bihaqo avatar Jun 30 '20 08:06 Bihaqo

Yes, this is exactly right. In contrast to the transition adder, which turns things into (possibly n-step) transitions, or the sequence adder, which slices up the episode into (possibly strided, possibly overlapping) sequences of fixed length, this just creates one sequence of arbitrary length, depending on the episode length.
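
The slicing behaviour of a sequence-style adder can be sketched roughly as below. This is plain Python, not Acme code: `slice_sequences`, `sequence_length`, and `period` are illustrative names, and dropping the incomplete trailing window is an assumption of this sketch (a real adder may pad instead).

```python
def slice_sequences(steps, sequence_length, period):
    """Cut an episode into fixed-length windows, starting every `period` steps.

    period < sequence_length gives overlapping windows;
    period == sequence_length gives back-to-back windows.
    Assumption: an incomplete trailing window is simply dropped.
    """
    last_start = len(steps) - sequence_length
    return [
        steps[start:start + sequence_length]
        for start in range(0, last_start + 1, period)
    ]


episode = [0, 1, 2, 3, 4, 5]  # six time steps

overlapping = slice_sequences(episode, sequence_length=3, period=2)
back_to_back = slice_sequences(episode, sequence_length=3, period=3)
whole_episode = [episode]  # episode style: one sequence, length set by the episode

print(overlapping)   # [[0, 1, 2], [2, 3, 4]]
print(back_to_back)  # [[0, 1, 2], [3, 4, 5]]
```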

aslanides avatar Jun 30 '20 08:06 aslanides

We could improve the documentation of this.

aslanides avatar Jun 30 '20 08:06 aslanides

So I guess the replay buffer doesn't only store "transitions" but also "sequences", and a transition is a special case of such a sequence?

[Attached screenshot: Screen Shot 2020-07-02 at 2 08 01 AM]

drozzy avatar Jul 02 '20 06:07 drozzy

Not exactly. When you work with SARSA tuples it indeed works like this. When working with sequences, they are stored as a tuple (observations, actions, rewards, discounts), where each tensor has shape [batch, sequence_length, ...]. So e.g. a_1 = actions[:, 1]
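
As a rough illustration of that layout, the sketch below uses nested Python lists in place of real tensors, with a made-up batch of 2 sequences of length 3. The helper `time_slice` is hypothetical; with an array library the same slice would be written `actions[:, 1]`.

```python
batch, sequence_length = 2, 3

# Each field has shape [batch, sequence_length]; values are made up.
actions = [
    [10, 11, 12],  # sequence 0: a_0, a_1, a_2
    [20, 21, 22],  # sequence 1: a_0, a_1, a_2
]
rewards = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 2.0],
]


def time_slice(tensor, t):
    """Return step t of every sequence in the batch (like tensor[:, t])."""
    return [row[t] for row in tensor]


a_1 = time_slice(actions, 1)
r_1 = time_slice(rewards, 1)

print(a_1)  # [11, 21]
print(r_1)  # [1.0, 0.0]
```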

Bihaqo avatar Jul 06 '20 10:07 Bihaqo

Oh got it.

Perhaps adding dimensions to the docs is a good idea. Looking at that picture above it is not that clear. But after your explanation it is immediately clear.

drozzy avatar Jul 06 '20 14:07 drozzy