DecisionTransformerInterpretability

SVD Decomp / Explore ways to use dimensionality reduction to quickly understand what heads are doing.

Open • jbloomAus opened this issue 1 year ago • 1 comment

This post is awesome. I think the value of this method comes from three things: understanding the method itself better, understanding our models better, and the editing, which could be cool too.

"Highly interpretable semantic clusters" sounds very cool. "Directly editing SVD representations" sounds very cool too.

Steps:

  • [ ] Make an SVD component in static viz.
  • [ ] Get a top-k tokens per SVD direction plot.
  • [ ] Get a singular values per head plot.
  • [ ] See how useful these are.
  • [ ] Look for high cosine similarity directions or something? Kinda like composition
  • [ ] Look at direct editing of the SVD decomposition.
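
A rough sketch of what the steps above could look like, not the repo's actual implementation. It assumes the SVD is taken over each head's OV matrix (`W_V @ W_O`, row-vector convention) and that "top-k tokens" means projecting each output singular direction through an unembedding matrix `W_U`. All names (`head_ov_svd`, `topk_tokens_per_direction`, `edit_head`) and the random weights are hypothetical placeholders; real weights would come from the trained model.

```python
import numpy as np

def head_ov_svd(W_V, W_O):
    """SVD of one head's OV matrix. W_V: (d_model, d_head), W_O: (d_head, d_model)."""
    d_head = W_V.shape[1]
    W_OV = W_V @ W_O  # (d_model, d_model), rank at most d_head
    U, S, Vh = np.linalg.svd(W_OV, full_matrices=False)
    # Only the first d_head components can be non-zero, so drop the rest.
    return U[:, :d_head], S[:d_head], Vh[:d_head]

def topk_tokens_per_direction(Vh, W_U, k=5):
    """Indices of the k tokens each output direction (row of Vh) pushes hardest.
    Singular vector signs are arbitrary, so the most negative tokens are
    worth inspecting as well."""
    logits = Vh @ W_U  # (n_directions, d_vocab)
    return np.argsort(-logits, axis=1)[:, :k]

def edit_head(U, S, Vh, drop):
    """Rebuild the OV matrix with the chosen singular directions zeroed out."""
    S_edit = S.copy()
    S_edit[list(drop)] = 0.0
    return (U * S_edit) @ Vh

# Placeholder random weights standing in for a real model (assumption).
rng = np.random.default_rng(0)
n_heads, d_model, d_head, d_vocab = 2, 16, 4, 10
W_V = rng.normal(size=(n_heads, d_model, d_head))
W_O = rng.normal(size=(n_heads, d_head, d_model))
W_U = rng.normal(size=(d_model, d_vocab))

# Singular values per head: one row per head, ready to plot as lines or a heatmap.
svals = np.stack([head_ov_svd(W_V[h], W_O[h])[1] for h in range(n_heads)])

# Top-k tokens per SVD direction for head 0.
U, S, Vh = head_ov_svd(W_V[0], W_O[0])
top = topk_tokens_per_direction(Vh, W_U, k=3)

# Direct edit: remove the strongest direction from head 0.
W_edit = edit_head(U, S, Vh, drop=[0])
```

`svals` and `top` are the arrays the two plots would be built from. `W_edit` would still need to be factored back into the model's `W_V` and `W_O` (or patched in via a hook) before checking how behaviour changes.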

jbloomAus, May 12 '23 00:05