DecisionTransformerInterpretability Implement IBAC/SNI and measure the effect on model interpretability

Open jbloomAus opened this issue 1 year ago • 0 comments

Selective Noise Injection (SNI) and Information Bottleneck Actor-Critic (IBAC) are regularisation methods reported to improve generalisation in RL agents, including in at least one MiniGrid environment. Testing whether they also affect how interpretable the resulting models are seems like a fun, hack-day-sized project.

This is currently bottlenecked by the TrajectoryPPO class, which may not be fully working yet, but I'm putting this here in case someone ambitious wants to give it a shot.

References:

https://github.com/microsoft/IBAC-SNI/
https://arxiv.org/pdf/1910.12911.pdf

I consider this similar in spirit to Engineering Monosemanticity in Toy Models, although I'm not sure how far the comparison holds.
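
For anyone picking this up, here is a rough, framework-agnostic sketch of the two loss ingredients as I understand them from the paper: a KL penalty that pulls a stochastic latent towards a standard normal (the information bottleneck), and a mix of a noise-free and a noisy loss term (the selective noise injection). It is not the code from the microsoft/IBAC-SNI repo and it is not wired into TrajectoryPPO. The names `loss_fn`, `beta` and `lam` are placeholders I made up for illustration, and the paper applies the mixing to the policy-gradient terms in more detail than this shows.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def sample_latent(mu, log_var, rng):
    """Reparameterised sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def ibac_sni_loss(loss_fn, mu, log_var, beta=1e-3, lam=0.5, rng=None):
    """Mix a noise-free and a noisy loss, then add the bottleneck penalty.

    loss_fn is a hypothetical callable mapping a latent vector to a scalar
    RL loss (e.g. a PPO surrogate); beta and lam are assumed hyperparameters.
    """
    rng = rng or random.Random(0)
    deterministic = loss_fn(mu)  # latent mean, no injected noise
    stochastic = loss_fn(sample_latent(mu, log_var, rng))  # noisy latent
    kl = kl_to_standard_normal(mu, log_var)
    return lam * deterministic + (1.0 - lam) * stochastic + beta * kl
```

In this sketch `lam=1.0` recovers the noise-free loss and `lam=0.0` the fully noisy one, so sweeping `lam` and `beta` would be one way to check whether either term changes what the interpretability tooling finds.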

Mar 19 '23 09:03 jbloomAus
