DecisionTransformerInterpretability
Implement IBAC/SNI and measure the effect on model interpretability
Selective Noise Injection (SNI) and Information Bottleneck Actor-Critic (IBAC) are reported to improve generalisation (including in at least one MiniGrid environment). Testing whether they also change how interpretable the resulting model is seems like a fun hack-day kind of effort.
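For anyone picking this up, here is a rough sketch of the two loss terms as I understand them from the paper linked in the references. It is not the API of the microsoft/IBAC-SNI repo or of this codebase. The assumptions: IBAC adds a KL penalty (weighted by a coefficient `beta`) between a diagonal Gaussian latent and a standard normal prior, and SNI mixes a loss computed without injected noise with one computed with noise, using a weight `lam`. All function names are hypothetical, and which term `lam` multiplies is a convention chosen here, so check it against the paper before relying on it.

```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], log_sigma: Sequence[float]) -> float:
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian latent."""
    return 0.5 * sum(
        math.exp(2.0 * ls) + m * m - 1.0 - 2.0 * ls
        for m, ls in zip(mu, log_sigma)
    )


def ibac_loss(
    actor_critic_loss: float,
    mu: Sequence[float],
    log_sigma: Sequence[float],
    beta: float,
) -> float:
    """Actor-critic loss plus a beta-weighted bottleneck penalty (assumed form)."""
    return actor_critic_loss + beta * kl_to_standard_normal(mu, log_sigma)


def sni_loss(loss_noise_free: float, loss_noisy: float, lam: float) -> float:
    """Convex mix of the noise-free and noisy losses.

    Assumption: lam weights the noise-free term; the paper may use the
    opposite convention.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * loss_noise_free + (1.0 - lam) * loss_noisy
```

In a real implementation these would operate on batched tensors inside the PPO update rather than on plain floats; the scalar version is only meant to show how the terms combine.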
This is currently bottlenecked by the TrajectoryPPO class, which may not be fully working yet, but I'm putting it here in case someone ambitious wants to give it a shot.
References:
- https://github.com/microsoft/IBAC-SNI/
- https://arxiv.org/pdf/1910.12911.pdf
I consider this somewhat similar in spirit to Engineering Monosemanticity in Toy Models, although I have no idea whether that comparison actually holds.