nnsight

Add image to image diffusion

Open Kmoneal opened this issue 8 months ago • 2 comments

This is very similar to Diffusion, but instead of a seed it takes an image of one of the types accepted by the model. For Stable Diffusion, the accepted types can be found here.

I'm happy to use this to kick off conversations on this topic as well.

Kmoneal avatar Apr 22 '25 20:04 Kmoneal

Hi @Kmoneal - thank you very much for creating this PR!

Unfortunately, I've been having some trouble getting this implementation to work. Would you be able to share a minimal reproducible example of how to do the following (e.g., with stabilityai/stable-diffusion-3.5-large):

  1. Store the activations of the residual stream (e.g., the output of the transformer block at index 24), for any choice or range of timesteps.
  2. Intervene on those activations (e.g., by tripling the activation of a particular dimension), for any choice or range of timesteps.

Any help would be much appreciated! Thanks again. :)

ericluo04 avatar May 06 '25 23:05 ericluo04

Figured it out! It turns out you can't pass the prompt as a keyword argument (prompt = "..."); it has to be passed directly as the first positional argument. See below for extracting the residual stream of the 25th layer (index 24) in stabilityai/stable-diffusion-3.5-large for the first step. Note that init_image is of type PIL.Image.Image.

import nnsight

# Assumes `pipe` is the nnsight-wrapped image-to-image pipeline and
# `init_image` is a PIL.Image.Image.

# transformer block layers
layers = pipe.transformer.transformer_blocks

with pipe.generate("", negative_prompt="", guidance_scale=7.5,
                   image=init_image, width=832, height=1248,
                   strength=0.5, num_inference_steps=4,
                   seed=None):
    # initialize lists to store activations
    res_stream = nnsight.list().save()
    res_stream_text = nnsight.list().save()
    res_stream_image = nnsight.list().save()

    # restrict to the first step; layers.all() can be used to extract for all steps
    with layers.iter[0:1]:
        # output of block index 24 (25th layer): residual stream for the text and image streams
        res_stream.append(layers[24].output)
        # input to block index 25 (26th layer), text stream (to check it matches the above)
        res_stream_text.append(layers[25].norm1_context.input)
        # input to block index 25 (26th layer), image stream (to check it matches the above)
        res_stream_image.append(layers[25].norm1.input)
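
The snippet above covers storing activations, but not the second request in the thread (tripling one dimension of an activation). As a rough sketch of that operation only, here is the same idea with plain PyTorch forward hooks on a toy stack of linear layers. This is not nnsight and not Stable Diffusion: the toy `blocks`, `run`, `store_and_triple`, `BLOCK_INDEX`, and `DIM` are all hypothetical names made up for illustration, and how this translates to the nnsight tracing context in this PR is not confirmed here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a stack of transformer blocks (hypothetical, not SD3).
blocks = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])

def run(x):
    """Pass x through every block in order, without tracking gradients."""
    with torch.no_grad():
        for block in blocks:
            x = block(x)
    return x

BLOCK_INDEX = 1  # block whose output is stored and edited
DIM = 3          # hidden dimension to scale
stored = []      # original (unedited) activations, one entry per forward pass

def store_and_triple(module, inputs, output):
    # Keep a copy of the original activation.
    stored.append(output.detach().clone())
    # Returning a value from a forward hook replaces the module's output.
    edited = output.clone()
    edited[..., DIM] = edited[..., DIM] * 3
    return edited

x = torch.randn(2, 4, 8)

baseline = run(x)  # no hook installed
handle = blocks[BLOCK_INDEX].register_forward_hook(store_and_triple)
edited_out = run(x)  # hook stores the activation and triples one dimension
handle.remove()      # later calls to run() are unmodified again
```

Cloning before the in-place edit keeps the stored copy untouched, so the original and the intervened activations can be compared afterwards.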

ericluo04 avatar May 07 '25 19:05 ericluo04