Easy-Transformer
[Proposal] Demo and Tutorial on Patchscopes and "Patching + Generation"
Proposal
UPDATE: This proposal is now a demo and tutorial on Patchscopes and "Patching + Generation".
DEPRECATED: The original proposal, a replication of the causal tracing method from the ROME paper, is kept below.
Motivation
The original causal tracing method is not yet supported here, and I think it has some advantages over the current activation patching method. For example, corrupting the input by adding Gaussian noise to the subject token embeddings may preserve more of the original sentence's semantic information than corrupting it by substituting words.
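A minimal sketch of the Gaussian-noise corruption, in plain Python so it runs without TransformerLens. It assumes embeddings are given as a list of per-token vectors and that the subject token positions are already known. Following the ROME paper, the noise standard deviation is set to three times the empirical standard deviation of the embeddings (here computed over a toy embedding table only). The function names are hypothetical, not an existing TransformerLens API.

```python
import random
import statistics
from typing import List, Sequence

Vector = List[float]


def rome_noise_scale(embedding_table: Sequence[Vector], multiplier: float = 3.0) -> float:
    """Noise std for ROME-style corruption: multiplier x empirical std of all embedding entries."""
    values = [x for row in embedding_table for x in row]
    return multiplier * statistics.pstdev(values)


def corrupt_subject(embeddings: Sequence[Vector], subject_positions: Sequence[int],
                    noise_std: float, seed: int = 0) -> List[Vector]:
    """Return a copy of `embeddings` with i.i.d. Gaussian noise added only at the subject positions.

    Every other token embedding is copied unchanged, so the rest of the sentence is preserved.
    """
    rng = random.Random(seed)
    corrupted = [list(vec) for vec in embeddings]
    for pos in subject_positions:
        corrupted[pos] = [x + rng.gauss(0.0, noise_std) for x in corrupted[pos]]
    return corrupted


# Toy example: a 4-token sentence with 3-dimensional embeddings; the subject is tokens 0-1.
embedding_table = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.4, 0.1, 0.2], [-0.3, 0.2, 0.0]]
sentence = [list(row) for row in embedding_table]
noise_std = rome_noise_scale(embedding_table)
noisy = corrupt_subject(sentence, subject_positions=[0, 1], noise_std=noise_std)
```

In a real model, the same noise would be added to the token-embedding activations of the subject tokens, e.g. with a forward hook on the `hook_embed` hook point of a `HookedTransformer`; the non-subject tokens stay exactly as they were.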
Pitch
Replicate the original causal tracing method from the ROME paper (https://arxiv.org/abs/2202.05262).
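To make the pitch concrete, here is a minimal, self-contained sketch of the causal tracing loop on a toy model rather than a real transformer. The toy model (`toy_forward`: a residual stream where each layer mixes earlier positions through a running mean, a crude stand-in for attention) and all function names are hypothetical. Only the procedure is meant to follow ROME: run the clean and corrupted inputs, then for each (layer, position) rerun the corrupted input with that single hidden state restored from the clean run, and record how much of the answer probability comes back (the indirect effect).

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Site = Tuple[int, int]  # (layer, position); layer 0 = input embeddings


def matvec(m: Sequence[Vector], v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def toy_forward(embeddings: Sequence[Vector], weights: Sequence[Sequence[Vector]],
                patches: Optional[Dict[Site, Vector]] = None) -> List[List[Vector]]:
    """Toy causal residual model; returns the residual-stream states after every layer.

    Layer l adds tanh(W_l @ mean(h[0..i])) at position i. `patches` overwrites the state
    at the given (layer, position) sites before later layers read it.
    """
    patches = patches or {}

    def apply_patches(layer: int, h: List[Vector]) -> List[Vector]:
        for (patch_layer, pos), vec in patches.items():
            if patch_layer == layer:
                h[pos] = list(vec)
        return h

    states = [apply_patches(0, [list(v) for v in embeddings])]
    for layer, w in enumerate(weights, start=1):
        prev = states[-1]
        new = []
        for i in range(len(prev)):
            ctx = [sum(col) / (i + 1) for col in zip(*prev[: i + 1])]  # causal mixing
            new.append([a + math.tanh(u) for a, u in zip(prev[i], matvec(w, ctx))])
        states.append(apply_patches(layer, new))
    return states


def answer_prob(states: List[List[Vector]], readout: Vector) -> float:
    """Sigmoid of a linear readout of the last position's final state (stand-in for P(answer))."""
    z = sum(a * b for a, b in zip(states[-1][-1], readout))
    return 1.0 / (1.0 + math.exp(-z))


def causal_trace(clean, corrupted, weights, readout):
    """Indirect effect of restoring each single clean hidden state into the corrupted run."""
    clean_states = toy_forward(clean, weights)
    p_clean = answer_prob(clean_states, readout)
    p_corrupt = answer_prob(toy_forward(corrupted, weights), readout)
    effects: Dict[Site, float] = {}
    for layer in range(len(clean_states)):
        for pos in range(len(clean)):
            restored = toy_forward(corrupted, weights, {(layer, pos): clean_states[layer][pos]})
            effects[(layer, pos)] = answer_prob(restored, readout) - p_corrupt
    return p_clean, p_corrupt, effects


rng = random.Random(0)
dim, n_layers, n_tokens = 4, 3, 5
weights = [[[rng.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)] for _ in range(n_layers)]
readout = [rng.gauss(0, 1) for _ in range(dim)]
clean = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_tokens)]
# Gaussian-noise corruption of the "subject" (positions 0-1); other tokens are unchanged.
corrupted = [[x + rng.gauss(0, 3) for x in v] if i < 2 else list(v) for i, v in enumerate(clean)]
p_clean, p_corrupt, effects = causal_trace(clean, corrupted, weights, readout)
```

Restoring the last position at the final layer recovers the full clean probability by construction, while restoring any other position at the final layer has no effect. The interesting signal is in the intermediate layers. In TransformerLens, the restore step would roughly correspond to a hook on a residual-stream hook point such as `blocks.{l}.hook_resid_post` that overwrites one position; averaging over several noise draws would reduce variance, whereas this sketch uses a single draw.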
Alternatives
I also considered replicating Patchscopes here, which is also mentioned in issue #500. Since Patchscopes can be seen as a more general framework for patching/intervention-based methods like this one, implementing it would also make causal tracing available. I'd like to open a separate issue for the Patchscopes replication later.
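As a rough illustration of why Patchscopes generalizes causal tracing, here is a hypothetical configuration object loosely following the Patchscopes paper's notation: a hidden representation taken from source prompt S at position i and layer ℓ of model M is mapped by f and patched into target prompt T at position i* and layer ℓ* of model M*. The class, its field names (including `target_noise_positions` and `max_new_tokens`), the prompt and the positions are assumptions for illustration, not an existing TransformerLens API. Causal tracing is then the special case with the same prompt, site and model, an identity f, noise on the target's subject embeddings, and no generation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def identity(h: Vector) -> Vector:
    return h


@dataclass(frozen=True)
class PatchscopeConfig:
    """Hypothetical container for a Patchscope: (S, i, M, l) -> (T, i*, f, M*, l*)."""
    source_prompt: str          # S
    source_position: int        # i
    source_layer: int           # l
    target_prompt: str          # T
    target_position: int        # i* (-1 = last token)
    target_layer: int           # l*
    source_model: str = "gpt2"  # M (illustrative model name)
    target_model: Optional[str] = None              # M*; None means "same as M"
    mapping: Callable[[Vector], Vector] = identity  # f
    target_noise_positions: Tuple[int, ...] = ()    # embeddings noised in the target run
    max_new_tokens: int = 0     # 0 = one forward pass; > 0 = "patching + generation"

    def is_causal_tracing(self) -> bool:
        """True when this Patchscope reduces to restoring one clean state into a noised run."""
        return (
            self.source_prompt == self.target_prompt
            and self.source_position == self.target_position
            and self.source_layer == self.target_layer
            and (self.target_model is None or self.target_model == self.source_model)
            and self.mapping is identity
            and len(self.target_noise_positions) > 0
            and self.max_new_tokens == 0
        )


# Positions and layers are illustrative, not real token indices.
prompt = "The Eiffel Tower is located in the city of"
trace_site = PatchscopeConfig(prompt, 4, 6, prompt, 4, 6, target_noise_positions=(0, 1, 2, 3))
inspection = PatchscopeConfig(prompt, 4, 6, "hypothetical target prompt: ?", -1, 2,
                              max_new_tokens=10)
```

Under this framing, a single patching-plus-generation routine could serve both causal tracing (`trace_site`) and inspection-style Patchscopes (`inspection`).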
Additional context
I've implemented a version locally and would like to post some examples here comparing results from my implementation with those from the original implementation.
Checklist
- [x] I have checked that there is no similar issue in the repo (required)