[Feature Request] Training data selection: Create more "interesting" Replay Buffer Iterators

Open RaghuSpaceRajan opened this issue 3 years ago • 4 comments

🚀 Feature Request

Create Replay Buffer Iterators that can select training and validation data in various "interesting" ways, similar to TransitionIterator and BootStrapIterator in https://github.com/facebookresearch/mbrl-lib/blob/b0aabd79941efe8b56bcabbd1b43bf497b9b1746/mbrl/replay_buffer.py

Examples:

  1. Select transitions from highly-rewarding trajectories - this could be used to perform analyses of how data selection impacts MBRL, objective mismatch, etc.
  2. Select transitions randomly from the replay buffer to have a fixed size of training/validation data.
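As a rough illustration of both examples, here is a minimal NumPy sketch. It does not use the mbrl-lib replay buffer API: it assumes transitions are stored in order in plain arrays, that `dones[i]` marks the last transition of a trajectory, and that a trailing unfinished trajectory should still count. The function names (`trajectory_bounds`, `top_return_indices`, `fixed_size_split`) are hypothetical.

```python
import numpy as np


def trajectory_bounds(dones):
    """Return (start, end) index pairs, end exclusive, one per trajectory.

    Assumption: transitions are stored in order and dones[i] is True on the
    last transition of a trajectory. A trailing unfinished trajectory is kept.
    """
    dones = np.asarray(dones, dtype=bool)
    ends = np.flatnonzero(dones) + 1
    if len(dones) > 0 and (len(ends) == 0 or ends[-1] != len(dones)):
        ends = np.append(ends, len(dones))
    starts = np.concatenate(([0], ends[:-1]))
    return list(zip(starts.tolist(), ends.tolist()))


def top_return_indices(rewards, dones, fraction=0.5):
    """Indices of all transitions in the top `fraction` of trajectories by return."""
    rewards = np.asarray(rewards, dtype=float)
    bounds = trajectory_bounds(dones)
    if not bounds:
        return np.array([], dtype=int)
    returns = np.array([rewards[s:e].sum() for s, e in bounds])
    num_keep = max(1, int(np.ceil(fraction * len(bounds))))
    best = np.argsort(-returns, kind="stable")[:num_keep]
    # Keep whole trajectories, in storage order, so boundaries stay intact.
    keep = [np.arange(*bounds[i]) for i in sorted(best.tolist())]
    return np.concatenate(keep)


def fixed_size_split(num_stored, train_size, val_size, rng):
    """Random, disjoint train/validation indices of fixed sizes."""
    if train_size + val_size > num_stored:
        raise ValueError("not enough stored transitions")
    perm = rng.permutation(num_stored)
    return perm[:train_size], perm[train_size:train_size + val_size]


rewards = [1, 1, 0, 0, 0, 5, 2]
dones = [False, True, False, False, True, False, False]
selected = top_return_indices(rewards, dones, fraction=0.5)
train_idx, val_idx = fixed_size_split(len(rewards), 4, 2, np.random.default_rng(0))
```

The returned index arrays could then drive whatever iterator wraps the buffer. Selecting whole trajectories rather than individual transitions is one way to sidestep the episodic-boundary issue.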

Motivation

This would make analysis similar to https://arxiv.org/abs/2002.04523 and https://arxiv.org/abs/2102.13651 easy to perform.

Pitch

This should be fairly easy to implement by following the pattern of TransitionIterator and BootStrapIterator above. (Handling trajectory/episode boundaries could be a bit tricky.)

RaghuSpaceRajan, Apr 07 '21 11:04

Thanks @RaghuSpaceRajan. cc'ing @natolambert since this is highly relevant to his work. I think this proposal is the most straightforward way to do this on the data management side.

luisenp, Apr 07 '21 13:04

Yes, I have a version of this in my private repo, and I will create a PR for it soon. The way I did it was by associating a "weight" with each transition; the core piece of functionality was a function to "update weights" for each trajectory. When updating the weights, it would be easy to apply a ranking or heuristic mapping of some sort.
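To make the idea concrete, here is a minimal sketch of weight-based sampling. It is not the private implementation mentioned above and not part of mbrl-lib; the class `WeightedTransitionSampler`, the helper `rank_scores`, and the `(start, end)` trajectory bounds format are all assumptions made for illustration.

```python
import numpy as np


class WeightedTransitionSampler:
    """Hypothetical helper: one sampling weight per stored transition."""

    def __init__(self, num_transitions, rng=None):
        self.weights = np.ones(num_transitions, dtype=float)
        self.rng = rng if rng is not None else np.random.default_rng()

    def update_weights(self, bounds, trajectory_scores):
        """Give every transition in a trajectory that trajectory's score.

        bounds: list of (start, end) index pairs, end exclusive.
        trajectory_scores: one non-negative score per trajectory.
        """
        for (start, end), score in zip(bounds, trajectory_scores):
            if score < 0:
                raise ValueError("scores must be non-negative")
            self.weights[start:end] = score

    def sample(self, batch_size):
        """Draw transition indices with probability proportional to weight."""
        total = self.weights.sum()
        if total <= 0:
            raise ValueError("at least one weight must be positive")
        probs = self.weights / total
        return self.rng.choice(len(self.weights), size=batch_size, p=probs)


def rank_scores(returns):
    """Rank-based mapping: the worst trajectory gets 1, the best gets n."""
    returns = np.asarray(returns, dtype=float)
    ranks = np.empty(len(returns), dtype=float)
    ranks[np.argsort(returns, kind="stable")] = np.arange(1, len(returns) + 1)
    return ranks


bounds = [(0, 2), (2, 5), (5, 7)]
sampler = WeightedTransitionSampler(7, rng=np.random.default_rng(0))
sampler.update_weights(bounds, rank_scores([2.0, 0.0, 7.0]))
batch = sampler.sample(32)
```

Swapping `rank_scores` for another mapping (for example, a zero score for trajectories below a return threshold) changes the selection rule without touching the sampler.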

natolambert, Apr 07 '21 16:04

Related comment: I think it may be worthwhile to have an optional "rich logging" mode, where things like candidate actions, action sequences (plans) at each step, trajectories, and more are saved for every trial in the learning process. This accumulates a lot of data, but having access to it is useful for debugging.
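A minimal sketch of what such an opt-in mode could look like, using only the Python standard library. `RichLogger`, its methods, and the field names in the usage lines are hypothetical and do not correspond to any existing mbrl-lib interface.

```python
import pickle
from collections import defaultdict


class RichLogger:
    """Hypothetical opt-in logger; records nothing unless enabled."""

    def __init__(self, enabled=False):
        self.enabled = enabled
        self.records = defaultdict(list)  # trial index -> list of step records

    def log_step(self, trial, step, **fields):
        """Store arbitrary per-step data (e.g. plans, candidate actions)."""
        if not self.enabled:
            return
        self.records[trial].append({"step": step, **fields})

    def save(self, path):
        """Write all records to disk as a plain dict via pickle."""
        with open(path, "wb") as f:
            pickle.dump(dict(self.records), f)


logger = RichLogger(enabled=True)
logger.log_step(trial=0, step=0, plan=[0.1, -0.2], candidate_actions=[[0.1], [0.3]])
logger.log_step(trial=0, step=1, plan=[0.0, 0.4], candidate_actions=[[0.0], [0.2]])
```

Keeping the mode off by default avoids the storage cost for normal runs.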

natolambert, Apr 07 '21 17:04

Feel free to open a feature request issue for this as well, @natolambert.

luisenp, Apr 07 '21 17:04