DecisionTransformerInterpretability

Upgrade Collect Demonstrations Workflow

Open jbloomAus opened this issue 1 year ago • 0 comments

The collect demonstrations utility is responsible for collecting example trajectories from a trained agent (one of the three supported PPO agent architectures, of which only two currently work). It provides a few different sampling procedures for doing this: basic (sampling proportional to the softmax of the agent's logits), temperature sampling, bottomk and topk. These let us train decision transformers on a broader distribution of actions/observations, which encourages more robust features and a better calibrated RTG-behavior relationship. By contrast, with behavioral cloning you are restricted to training only on good trajectories from the offline agent, which doesn't lead to good features.
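To make the four sampling procedures concrete, here is a minimal sketch in plain Python. It is not the repo's implementation: the function names (`softmax`, `restrict_to_k`, `sample_action`) are hypothetical, and it assumes that topk/bottomk mean "keep only the k highest/lowest logits, then sample from the softmax over what is left".

```python
import math
import random
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; temperature > 1 flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def restrict_to_k(logits: Sequence[float], k: int, keep: str = "top") -> List[float]:
    """Mask all but the k highest ("top") or k lowest ("bottom") logits with -inf."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of actions")
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=(keep == "top"))
    kept = set(order[:k])
    return [x if i in kept else -math.inf for i, x in enumerate(logits)]


def sample_action(
    logits: Sequence[float],
    method: str = "basic",
    temperature: float = 1.0,
    k: int = 1,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample an action index from agent logits (hypothetical helper)."""
    rng = rng or random.Random()
    if method == "basic":
        probs = softmax(logits)  # proportional to softmax
    elif method == "temperature":
        probs = softmax(logits, temperature)
    elif method == "topk":
        probs = softmax(restrict_to_k(logits, k, keep="top"))
    elif method == "bottomk":
        probs = softmax(restrict_to_k(logits, k, keep="bottom"))
    else:
        raise ValueError(f"unknown sampling method: {method}")
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

Under these assumptions, bottomk deliberately produces poor actions and high temperatures produce noisier ones, which is what widens the distribution of returns in the collected trajectories.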

It's not super clear yet how useful this is, but there are some very obvious next steps for improving this utility that seem like good engineering practice, if anyone wants to help. The major goals are:

  • [ ] Hook up wandb tracking (add a `track` arg, then log metrics to the dashboard).
  • [ ] Media to log: videos of each of the rollouts (somewhat like the PPO rollout code). We want to see qualitatively what kinds of trajectories we're sampling.
  • [ ] Metrics to log: reward and time to finish for different rollout configs. We want to know which config settings sample which kinds of trajectories/outcomes.
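A rough sketch of how the tracking goals above could fit together. Everything named here is an assumption for illustration, not the utility's existing interface: the flag names (`--track`, `--wandb_project`, `--sampling_method`), the metric keys, and the helpers `summarize_rollouts` and `log_rollouts`. The only wandb calls used are `wandb.init`, `wandb.log`, `wandb.Video` and `run.finish`.

```python
import argparse
from statistics import mean
from typing import Dict, List, Optional, Sequence


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI for the collect demonstrations utility."""
    parser = argparse.ArgumentParser(description="Collect demonstrations (sketch)")
    parser.add_argument("--track", action="store_true",
                        help="log metrics and videos to wandb")
    parser.add_argument("--wandb_project", default="collect-demonstrations")
    parser.add_argument("--sampling_method", default="basic",
                        choices=["basic", "temperature", "topk", "bottomk"])
    return parser.parse_args(argv)


def summarize_rollouts(returns: Sequence[float], lengths: Sequence[int]) -> Dict[str, float]:
    """Aggregate reward and time-to-finish for one rollout config."""
    return {
        "rollout/mean_return": mean(returns),
        "rollout/mean_length": mean(lengths),
        "rollout/num_episodes": len(returns),
    }


def log_rollouts(
    args: argparse.Namespace,
    returns: Sequence[float],
    lengths: Sequence[int],
    video_paths: Sequence[str] = (),
) -> Dict[str, float]:
    """Compute summary metrics and, if --track is set, send them to wandb."""
    metrics = summarize_rollouts(returns, lengths)
    if not args.track:
        return metrics

    import wandb  # imported lazily so untracked runs do not need wandb installed

    # Storing the args as the run config lets the dashboard group runs
    # by sampling method and compare the outcomes each config produces.
    run = wandb.init(project=args.wandb_project, config=vars(args))
    wandb.log(metrics)
    for i, path in enumerate(video_paths):
        # Assumes each path points at an already-recorded rollout video file.
        wandb.log({f"rollout/video_{i}": wandb.Video(path)})
    run.finish()
    return metrics
```

For example, `log_rollouts(parse_args(["--track", "--sampling_method", "topk"]), returns, lengths, video_paths)` would create one run per rollout config, so reward and episode length can be compared across configs on the dashboard.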

This may also be an opportunity to improve the code around uploading videos, which is very janky.

jbloomAus, Apr 27 '23 01:04