Easy-Transformer [Proposal] Demo and Tutorial on Patchscopes and "Patching + Generation"

Open HenryCai11 opened this issue 7 months ago • 3 comments

Proposal

UPDATE: Demo and Tutorial on Patchscopes and "Patching + Generation"

DEPRECATED: Replication of the original causal tracing from the ROME paper.

Motivation

The original causal tracing method isn't supported here yet, and I think it has some advantages over the current activation patching method. For example, corrupting the input with Gaussian noise might preserve more semantic information from the original sentence than corrupting it by swapping words.

Pitch

Replicate the original causal tracing method from the ROME paper (https://arxiv.org/abs/2202.05262).
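For readers unfamiliar with the method, here is a rough, library-free sketch of the three runs causal tracing relies on: a clean run, a run with Gaussian noise added to the subject-token embeddings, and a corrupted run in which one clean hidden state is restored. Everything in it is an assumption made for illustration: `toy_forward` is a stand-in and not a real transformer, the score is a placeholder for the probability of the correct answer, and the noise scale (3 times the standard deviation of the prompt's own embedding values) only approximates the setting I understand the paper to use. It is not my local implementation and not an existing API in this repo.

```python
import math
import random
import statistics


def corrupt_embeddings(embeds, subject_positions, noise_scale=3.0, seed=0):
    """Return a copy of `embeds` with Gaussian noise added at the subject positions.

    Assumption: sigma is `noise_scale` times the std of the values in `embeds`,
    standing in for a std estimated over a larger sample of embeddings.
    """
    rng = random.Random(seed)
    sigma = noise_scale * statistics.pstdev([x for row in embeds for x in row])
    corrupted = [row[:] for row in embeds]
    for pos in subject_positions:
        corrupted[pos] = [x + rng.gauss(0.0, sigma) for x in corrupted[pos]]
    return corrupted


def toy_forward(embeds, num_layers=3, patch=None):
    """Hypothetical toy residual stack, used only to make the sketch runnable.

    `patch` is an optional (layer, position, vector) triple: after that layer
    runs, the hidden state at that position is overwritten with `vector`.
    Returns (score, hiddens), where hiddens[layer][position] is a vector.
    """
    h = [row[:] for row in embeds]
    hiddens = []
    for layer in range(num_layers):
        new_h = []
        for pos in range(len(h)):
            dim = len(h[pos])
            # Causal mixing: each position sees the mean of positions <= pos.
            ctx = [sum(h[p][d] for p in range(pos + 1)) / (pos + 1) for d in range(dim)]
            new_h.append([h[pos][d] + math.tanh(ctx[d]) for d in range(dim)])
        h = new_h
        if patch is not None and patch[0] == layer:
            h[patch[1]] = list(patch[2])
        hiddens.append([row[:] for row in h])
    score = sum(h[-1])  # placeholder for the probability of the correct answer
    return score, hiddens


def causal_trace(embeds, subject_positions, num_layers=3):
    """Return (clean_score, corrupted_score, effects[layer][position])."""
    clean_score, clean_hiddens = toy_forward(embeds, num_layers)
    corrupted = corrupt_embeddings(embeds, subject_positions)
    corrupted_score, _ = toy_forward(corrupted, num_layers)
    effects = []
    for layer in range(num_layers):
        row = []
        for pos in range(len(embeds)):
            # Restore a single clean hidden state inside the corrupted run.
            restored_score, _ = toy_forward(
                corrupted, num_layers, patch=(layer, pos, clean_hiddens[layer][pos])
            )
            row.append(restored_score - corrupted_score)
        effects.append(row)
    return clean_score, corrupted_score, effects


# Usage: three token positions, the first two treated as the subject.
prompt_embeds = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]]
clean, corrupted, effects = causal_trace(prompt_embeds, subject_positions=[0, 1])
```

Each entry of `effects` is the change in score from restoring one (layer, position) state, which is the quantity a causal tracing heatmap would plot.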

Alternatives

I am also considering replicating Patchscopes here, which is also mentioned in issue #500. Since Patchscopes can be considered a more general framework for this kind of patching/intervention-based method, implementing it here would also make causal tracing available. I'd like to open a separate issue for the Patchscopes replication later.
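To make the "more general framework" point concrete, here is a minimal sketch of the core patching step as I understand it: take a hidden state from a source run at (source layer, source position), optionally transform it, write it into a target run at (target layer, target position), and let the target run continue. The names `forward`, `patchscope`, and `toy_layer` are hypothetical and the layers are toy functions, not a real model; a real version would go on to generate tokens from the patched run.

```python
import math


def toy_layer(h):
    """Hypothetical toy layer: causal mean over positions plus a residual tanh."""
    out = []
    for pos in range(len(h)):
        dim = len(h[pos])
        ctx = [sum(h[p][d] for p in range(pos + 1)) / (pos + 1) for d in range(dim)]
        out.append([h[pos][d] + math.tanh(ctx[d]) for d in range(dim)])
    return out


def forward(embeds, layers, patch=None, record=None):
    """Run `layers` in order and return the final hidden states.

    `patch` is an optional (layer, position, vector) triple applied after that
    layer. If `record` is a list, the hidden states after each layer are
    appended to it.
    """
    h = [row[:] for row in embeds]
    for idx, layer in enumerate(layers):
        h = layer(h)
        if patch is not None and patch[0] == idx:
            h[patch[1]] = list(patch[2])
        if record is not None:
            record.append([row[:] for row in h])
    return h


def patchscope(source_embeds, target_embeds, layers,
               source_layer, source_pos, target_layer, target_pos,
               mapping=lambda v: v):
    """Patch one source hidden state into the target run and finish the run."""
    source_hiddens = []
    forward(source_embeds, layers, record=source_hiddens)
    vector = mapping(source_hiddens[source_layer][source_pos])
    return forward(target_embeds, layers, patch=(target_layer, target_pos, vector))


# Usage: read position 1 after layer 0 of the source run and write it into
# position 2 after layer 1 of the target run.
layers = [toy_layer] * 3
source = [[0.1, -0.2], [0.3, 0.4]]
target = [[0.5, 0.5], [-0.1, 0.2], [0.0, 0.3]]
patched_hidden = patchscope(source, target, layers, 0, 1, 1, 2)
```

In this framing, restoring a clean state inside a corrupted run is the special case where the source is the clean prompt, the target is the corrupted one, and the layer and position match on both sides.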

Additional context

I've implemented a version locally and would like to post some examples here, comparing results from my implementation with those from the original implementation.

Checklist

[x] I have checked that there is no similar issue in the repo (required)

Jul 16 '24 10:07 HenryCai11
