Does stable baselines provide an automatic way of computing the sample efficiency of an RL algorithm?

Open • nbro opened this issue 5 years ago • 1 comment

It doesn't seem that SB provides an automatic way of computing the sample efficiency of an RL algorithm. I would be glad to be wrong.

Of course, computing the sample efficiency may not be a hard task: you just need to count the number of samples needed to reach a certain performance or, alternatively, assess the performance reached with a limited/fixed number of samples to learn from. I think this could be done with a callback that saves the statistics of each episode (both the number of samples and the sum of rewards) to a file. Or maybe there is already a wrapper that saves these statistics (maybe SB's Monitor or gym's RecordEpisodeStatistics, which I've never used?), and we just need to use its output to compute the sample efficiency.
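To make the first definition concrete, here is a rough sketch of the counting I have in mind, in plain Python with no SB calls. It assumes the per-episode returns and lengths have already been collected somehow; the function name `timesteps_to_threshold`, the moving-average `window`, and the sample numbers are all made up for illustration.

```python
from collections import deque


def timesteps_to_threshold(episode_returns, episode_lengths, threshold, window=3):
    """Return the number of environment steps consumed when the moving
    average of the last `window` episode returns first reaches `threshold`,
    or None if it never does."""
    recent = deque(maxlen=window)
    total_steps = 0
    for ret, length in zip(episode_returns, episode_lengths):
        total_steps += length          # samples used so far
        recent.append(ret)
        # Only judge once the window is full, to avoid a lucky first episode.
        if len(recent) == window and sum(recent) / window >= threshold:
            return total_steps
    return None


# Hypothetical per-episode data (return, length), for illustration only.
returns = [10.0, 20.0, 30.0, 40.0, 50.0]
lengths = [10, 20, 30, 40, 50]

steps_needed = timesteps_to_threshold(returns, lengths, threshold=30.0, window=3)
print(steps_needed)
```

Fewer steps to reach the same threshold would then mean better sample efficiency; the choice of threshold and window is arbitrary here and would need to be fixed per environment.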

So, could you please provide a simple example of how to compute the sample efficiency of an RL algorithm (with SB)? Also, is there a library that provides this type of functionality for RL algorithms, i.e. measuring performance in more sophisticated ways than just the return?

Dec 01 '20 15:12 nbro

could you please provide an example of how to compute the sample efficiency of an RL algorithm?

It looks like both the Monitor wrapper and EvalCallback should do the trick (cf. the documentation). The rest is mostly custom code. We have plotting scripts in the RL Baselines3 Zoo (which uses Stable-Baselines3) to load the saved data.
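As a minimal sketch of the "custom code" part, the snippet below parses Monitor-style log text and computes the mean return reached within a fixed step budget. It assumes the log layout is a `#`-prefixed JSON header line followed by CSV columns `r` (episode return), `l` (episode length) and `t` (elapsed time); check this against the files your version actually writes. The helper names, the `budget` and `last_n` parameters, and the sample data are hypothetical.

```python
import csv


def read_monitor_csv(text):
    """Parse Monitor-style CSV text into (returns, lengths).
    Lines starting with '#' are treated as metadata and skipped."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    returns, lengths = [], []
    for row in csv.DictReader(lines):
        returns.append(float(row["r"]))
        lengths.append(int(row["l"]))
    return returns, lengths


def mean_return_within_budget(returns, lengths, budget, last_n=10):
    """Average the last `last_n` episode returns among the episodes that
    finished within `budget` environment steps; None if no episode did."""
    total_steps = 0
    kept = []
    for ret, length in zip(returns, lengths):
        if total_steps + length > budget:
            break
        total_steps += length
        kept.append(ret)
    if not kept:
        return None
    tail = kept[-last_n:]
    return sum(tail) / len(tail)


# Made-up log content in the assumed layout, for illustration only.
SAMPLE = """#{"t_start": 0.0, "env_id": "CartPole-v1"}
r,l,t
10.0,10,0.1
20.0,20,0.2
30.0,30,0.3
40.0,40,0.4
"""

returns, lengths = read_monitor_csv(SAMPLE)
score = mean_return_within_budget(returns, lengths, budget=60, last_n=2)
print(score)
```

In practice you would read the text from the file the Monitor wrapper wrote instead of `SAMPLE`, and compare `score` across algorithms at the same budget.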

Dec 01 '20 16:12 araffin