awesome-llm-interpretability: Suggested adding PAIR work; and how to add new suggestions?


Open · iislucas opened this issue 1 year ago · 0 comments

The contributing-doc link in your README is broken :), so I'm making this suggestion as an issue instead of a pull request. You might be interested in the PAIR team's work (pair.withgoogle.com and https://github.com/pair-code). In particular, we do a lot of work on interpretability, including:

Interactive Explorable visualizations (https://pair.withgoogle.com/explorables/) that explain important and interesting ML phenomena. Of particular relevance to LLMs are:

Do Machine Learning Models Memorize or Generalize? (won VISxAI Best Submission in 2023, entering it into the VISxAI Hall of Fame)
What Have Language Models Learned? (won VISxAI Best Submission in 2021, entering it into the VISxAI Hall of Fame)

Code/Tools: The Learning Interpretability Tool (LIT), https://pair-code.github.io/lit/, a popular tool, especially within Google, for applying interpretability methods to ML models. It is most often used with language models, but it works with many kinds of models and data.

Some recent papers on interpretability of language models by PAIR:

"Interpretability Illusions in the Generalization of Simplified Models" - Dan Friedman, Andrew Lampinen, Lucas Dixon, Danqi Chen, Asma Ghandeharioun [arxiv]
(EMNLP 2023) "Self-Influence Guided Data Reweighting for Language Model Pre-training" - M Thakkar, T Bolukbasi, S Ganapathy, S Vashishth, S Chandar, P Talukdar [arxiv]
(EMNLP 2023) "Data Similarity is Not Enough to Explain Language Model Performance" - Greg Yauney, Emily Reif, David Mimno [acl]
(NeurIPS 2023) "Post Hoc Explanations of Language Models Can Improve Language Models" - Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, Himabindu Lakkaraju [arxiv]
(NeurIPS 2023 Spotlight) "Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models" - Peter Hase, Mohit Bansal, Been Kim, Asma Ghandeharioun [arXiv, Tweet Summary]

And there's a lot more here: https://pair.withgoogle.com/research/

Dec 26 '23 13:12 iislucas
