nnsight Add image to image diffusion

This is very similar to Diffusion, but instead of a seed it takes an image of one of the types accepted by the model. For Stable Diffusion, the accepted types can be found here.

I'm happy to use this to kick off conversations on this topic as well.

Apr 22 '25 20:04 Kmoneal

Hi @Kmoneal - thank you very much for creating this PR!

Unfortunately, I've been having some trouble getting this implementation to work. Would you be able to share a minimal reproducible example of how to do the following (e.g., with stabilityai/stable-diffusion-3.5-large):

Store the activations of the residual stream (e.g., the output of the transformer block at index 24) for any chosen timestep or range of timesteps.
Intervene on those activations (e.g., by tripling the activation of a particular dimension) for any chosen timestep or range of timesteps.

Any help would be much appreciated! Thanks again. :)

May 06 '25 23:05 ericluo04

Figured it out! It turns out you can't pass the prompt as a keyword argument (prompt = "..."); it has to be given directly as the first positional argument. See below for extracting the residual stream of the 25th layer (index 24) in stabilityai/stable-diffusion-3.5-large for the first step. Note that init_image is of type PIL.Image.Image.

import nnsight

# assumes `pipe` is the nnsight-wrapped image-to-image pipeline from this PR
# and `init_image` is a PIL.Image.Image

# transformer block layers
layers = pipe.transformer.transformer_blocks

# the prompt ("") must be the first positional argument, not prompt="..."
with pipe.generate("", negative_prompt="", guidance_scale=7.5,
                   image=init_image, width=832, height=1248,
                   strength=0.5, num_inference_steps=4,
                   seed=None):
    # initialize lists to store activations
    res_stream = nnsight.list().save()
    res_stream_text = nnsight.list().save()
    res_stream_image = nnsight.list().save()

    # loop over steps (first step only); can use layers.all() to extract for all steps
    with layers.iter[0:1]:
        # output of the block at index 24 (25th layer): residual stream for text and image streams
        res_stream.append(layers[24].output)
        # input to the block at index 25, text stream (to check if same as above)
        res_stream_text.append(layers[25].norm1_context.input)
        # input to the block at index 25, image stream (to check if same as above)
        res_stream_image.append(layers[25].norm1.input)
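
One thing worth keeping in mind when picking a step range: in diffusers-style image-to-image pipelines, strength shortens the schedule, so fewer than num_inference_steps denoising steps actually run. The helper below is a minimal sketch of that arithmetic. It assumes the wrapped pipeline follows the diffusers img2img convention and a first-order scheduler, and the function name effective_denoising_steps is hypothetical, not part of nnsight or diffusers.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps an img2img call actually runs.

    Assumption: the pipeline skips the first (1 - strength) fraction of the
    schedule, as diffusers img2img pipelines do, because the input image is
    only partially noised.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # portion of the schedule that is kept
    init_timestep = min(num_inference_steps * strength, num_inference_steps)
    # index of the first step that is actually executed
    t_start = int(max(num_inference_steps - init_timestep, 0))
    return num_inference_steps - t_start


# settings used in the snippet above
steps = effective_denoising_steps(num_inference_steps=4, strength=0.5)
print(steps)
```

Under these assumptions, the call above would run 2 steps rather than 4, so any step range passed to layers.iter should probably stay within that count.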

May 07 '25 19:05 ericluo04