Easy-Transformer
[Proposal] Save and Load subgraph as dict
Proposal
Add a structured representation of the subgraph/circuit that can be saved and loaded, similar to ACDC.
Motivation
When using TransformerLens I can find subgraphs through attribution patching. However, there seems to be no easy way to use the result further downstream, because there is no structured representation of it. While I can (might?) piece together all components, it is not straightforward to get the result out for plotting, archiving, or re-loading at a later point.
Pitch
Having a save_to_dict would solve the above 'shortcomings' in working further downstream with results gained from TL. It would enable loading a patched subgraph in a further step.
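A minimal sketch of what this could look like, under loud assumptions: `save_to_dict` and `load_from_dict` are hypothetical helpers (nothing like them exists in TL today), the component names are invented labels, and the scores are placeholder numbers rather than real patching results.

```python
import json


def save_to_dict(scores, threshold):
    """Hypothetical helper: keep components whose |score| >= threshold."""
    nodes = {name: s for name, s in scores.items() if abs(s) >= threshold}
    return {"threshold": threshold, "nodes": nodes}


def load_from_dict(d):
    """Hypothetical inverse of save_to_dict."""
    return d["threshold"], dict(d["nodes"])


# Placeholder attribution scores keyed by made-up component names.
scores = {"L0H1.q": 0.02, "L0H1.v": 0.41, "L1H3.k": -0.35, "L1.mlp_out": 0.07}

subgraph = save_to_dict(scores, threshold=0.1)

# A plain dict serializes directly to JSON for archiving or plotting elsewhere.
text = json.dumps(subgraph, indent=2)

# Re-load at a later point.
threshold, nodes = load_from_dict(json.loads(text))
```

Using a plain dict keeps the result independent of any model object, so it could be stored next to the experiment config and re-loaded without TL installed.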
Alternatives
I have started finding and mapping values from Attribution_Patching_Demo.ipynb, but not everything is clear: compared to ACDC, this example has a finer granularity (k, v, q, ...).
Additional context
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
Can you say more about what this would look like? I'm quite confused by the proposal. TransformerLens doesn't even have innate support for path patching, so I don't see what it would mean to support subgraphs. Could you eg give an example snippet of code for what you imagine using this feature to look like, and the interface?
Sure - the confusion might be more on my side, but I'll try to explain:
When I look at the data in https://github.com/neelnanda-io/TransformerLens/blob/main/demos/Attribution_Patching_Demo.ipynb I have Head Inputs and Head outputs (see Head Path Attribution Patching), I also have MLP outputs. My thinking here is: with a known network architecture this should give me a subgraph/ circuit if I can construct the edges.
What I am trying to achieve is constructing a graph/circuit that describes the patched model. An exemplary interface might be https://github.com/hannamw/EAP-IG/blob/main/eap/graph.py
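To make the "construct the edges from a known architecture" idea concrete, here is a rough sketch. It assumes a GPT-2-style block order (attention first, then MLP, both reading from and writing to the residual stream), it leaves out the embedding and logit nodes, and `build_edges` and all node names are made up for illustration; this is not the EAP-IG or TL API.

```python
def build_edges(n_layers, n_heads):
    """Enumerate candidate edges (source output -> destination input).

    Assumption: within a layer, attention runs before the MLP, so a
    component can only read from components at an earlier position.
    """
    sources, dests = [], []
    for layer in range(n_layers):
        attn_pos, mlp_pos = 2 * layer, 2 * layer + 1
        for head in range(n_heads):
            # Each head writes one output and reads three inputs (q, k, v).
            sources.append((f"a{layer}.h{head}", attn_pos))
            for inp in ("q", "k", "v"):
                dests.append((f"a{layer}.h{head}.{inp}", attn_pos))
        # The MLP is both a reader and a writer of the residual stream.
        sources.append((f"m{layer}", mlp_pos))
        dests.append((f"m{layer}", mlp_pos))
    return [(s, d) for s, s_pos in sources for d, d_pos in dests if s_pos < d_pos]


edges = build_edges(n_layers=2, n_heads=2)

# A JSON-friendly graph description; per-edge scores could be attached later.
graph = {
    "nodes": sorted({name for edge in edges for name in edge}),
    "edges": [list(edge) for edge in edges],
}
```

Attribution scores for head inputs, head outputs and MLP outputs would then be attached to these edges, and edges below some threshold dropped to obtain the circuit.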
I'm in the early days of building infrastructure to create and eval experiments via gRPC/API calls.
CC @UFO-101 who is building a general automated interp library. In my opinion it's better to build a library on top of TL, rather than inside TL. What advantages would there be to adding this into TL? My vibe is that the majority of TL users are not doing automated interp (even if they should be >:) ...)
+1 to Arthur, I think this would be a cool thing for someone to build on top
@ArthurConmy thanks for pointing this out - yeah I don't wanna waste anyone's time, feel free to close this if irrelevant.
In ACDC you are doing what I'm looking for while using TL. Is there any conceptual difference (other than Act. patching vs. Attr. patching) that I don't see?
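For reference, the activation vs. attribution patching distinction in the parenthesis can be shown on a toy scalar example. This is only a sketch: `metric` is a made-up stand-in for "run the rest of the model and compute the metric", and the convention assumed here is patching the clean activation into the corrupted run.

```python
def metric(a):
    # Toy stand-in for the downstream computation; not a real model.
    return a ** 2 + 3 * a


def metric_grad(a):
    # Analytic derivative of the toy metric.
    return 2 * a + 3


a_clean, a_corrupt = 2.0, 1.0

# Activation patching: swap in the clean activation and re-run (exact effect).
act_patch_effect = metric(a_clean) - metric(a_corrupt)

# Attribution patching: first-order estimate using the corrupted run's gradient.
attr_patch_effect = (a_clean - a_corrupt) * metric_grad(a_corrupt)

print(act_patch_effect, attr_patch_effect)
```

The two numbers differ because the toy metric is nonlinear; attribution patching trades that exactness for needing only one forward and one backward pass for all components.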
FYI the library is now released: https://ufo-101.github.io/auto-circuit/ Post explaining how it works: https://www.lesswrong.com/posts/caZ3yR5GnzbZe2yJ3/how-to-do-patching-fast
@UFO-101 - awesome - thanks for sending a note :+1: