Add Metadata/Parameters Imprinting to Images 🚂💨🎵

Open kjerk opened this issue 1 year ago • 8 comments

New Features

  • Adds settings for imprinting metadata onto saved image pixels (works for PNG only). Only the last 8 rows of pixels are modified.
  • Adds decoding capability for the PNG Info tab to decode this information
  • Adds an optional password field that will encrypt the imprinted metadata

Summary

  • This pull request adds options to enable the embedding of image generation parameters directly onto the image pixels (PNG only). This functionality ensures that the image file serves as a standalone, self-contained unit, preserving params and other data that helped create it. This enhancement counteracts the common practice of stripping metadata from image file formats by online hosts and websites. This solution is transparent to the end user, with no increase in file size. As long as the file is a PNG, or converted to other lossless formats, the embedded data will remain intact.
  • This change has 0 perceptual impact and is carefully constrained to a bottom slice of the image.

Examples / UI Changes

Options

[Screenshots: option 1, option 2]

PNG Info Tab

[Screenshot: PNG Info tab]

https://user-images.githubusercontent.com/2738686/231030398-4067b276-93fb-4e47-b2e1-cab9340d29a7.mp4

Discussion topics

  • Figuring it would be highly requested right away, I added the ability to password-protect imprinted information, but I would like to discourage use of the feature. I think a UX that fully allows this for those who seek it out, but defaults users to no password, would be the most favorable middle ground for the community.
  • Possible later option to extend the image down a couple of rows, adding a visible pixel bar that would look like noise but could hold much more densely packed information. This is less aesthetically pleasing but gives a visual indicator that the image is tagged.

Background

Many online platforms, apps (Discord, Slack), and websites remove metadata from image files to address security concerns. While this practice is very sensible and important, it results in the loss of essential generation data which the community is built around. The proposed feature seeks to preserve this data by embedding it into the image itself, so that it becomes resilient to the transmission medium.

Design Considerations

Why only 8 rows?

  • In order to try to leave the image as unaltered as possible, altering only the last 8 rows of pixels is a happy medium that pushes the data toward the end, but still leaves the image mostly intact. The perceptual difference is still zero, but this way we only affect ~1.5% of the pixels in the image at absolute maximum.
  • With the gzip compression in place, 3.6KB of text can compress down to 928 bytes, comfortably fitting in the 1.5KB space limit. Only the absolute largest absurd prompts have any chance of running over this.
  • This keeps slightly in alignment with jpeg compression (which uses 8x8 blocks) for future endeavors, possibly working across that block size with a jpeg resilient method at a later time.
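
As a rough check of the figures above, the arithmetic can be sketched as follows. This rests on two assumptions that the description does not state outright: a 512-pixel-wide (and 512-pixel-tall) RGB image, and one payload bit stored per colour channel byte. The function names are illustrative only and are not taken from the PR.

```python
def imprint_capacity_bytes(width: int, rows: int = 8, channels: int = 3,
                           bits_per_channel: int = 1) -> int:
    """Bytes of payload that fit in the last `rows` rows (assumed layout)."""
    total_bits = width * rows * channels * bits_per_channel
    return total_bits // 8


def affected_fraction(height: int, rows: int = 8) -> float:
    """Upper bound on the share of pixels that can be touched."""
    return rows / height


print(imprint_capacity_bytes(512))  # 1536 bytes, i.e. 1.5 KB
print(affected_fraction(512))       # 0.015625, i.e. ~1.5%
```

Under those assumptions both numbers line up with the 1.5KB limit and the ~1.5% ceiling quoted above; taller images lower the affected fraction, and wider ones raise the capacity.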

Why 0x0CAB005E?

  • It's cute. Needed to detect the end of the envelope. 🚂💨🎵

Why gz?

  • This is the clear winner for small strings, even when tested against experimental compression like bz3.
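
For anyone who wants to repeat this kind of comparison, a minimal sketch using only the Python standard library is below. bz3 is not in the standard library, so it is left out, and the sample parameter string is made up for illustration. Results will vary with the input, so no sizes are claimed here.

```python
import bz2
import gzip
import lzma

# Made-up parameter string, only for illustration.
sample = (
    "a photo of a train, highly detailed\n"
    "Negative prompt: blurry, lowres\n"
    "Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 12345, Size: 512x512"
).encode("utf-8")

sizes = {
    "raw": len(sample),
    "gzip": len(gzip.compress(sample)),
    "bz2": len(bz2.compress(sample)),
    "lzma": len(lzma.compress(sample)),
}

for name, size in sizes.items():
    print(f"{name:>4}: {size} bytes")
```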

New Dependencies

  • pycryptodome for AES encryption/decryption
  • This dependency is wrapped in a try/except block, so if it's not installed, the feature will not be available. If the user has a password set and imprint enabled, it will throw a descriptive error.
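
The optional-import pattern described above might look roughly like this. It is a sketch, not the PR's code: `check_imprint_encryption` is a hypothetical helper name and the error wording is invented. The only fact relied on is that pycryptodome exposes AES as `Crypto.Cipher.AES`.

```python
try:
    from Crypto.Cipher import AES  # provided by pycryptodome
except ImportError:
    AES = None  # the feature is simply unavailable without the package


def check_imprint_encryption(password):
    """Hypothetical guard: fail loudly only when encryption is requested."""
    if not password:
        return False  # nothing to encrypt, so the dependency is not needed
    if AES is None:
        raise RuntimeError(
            "An imprint password is set, but pycryptodome is not installed. "
            "Install it with `pip install pycryptodome` or clear the password."
        )
    return True
```

With this shape, users who never set a password are unaffected by the missing package.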

Changelist

This pull request introduces the following key changes:

  1. Imprinting Functionality: The new feature integrates the ability to encode image generation data into the image pixels themselves, preserving the metadata throughout the file's transport and storage.
  2. Robustness: This method ensures that even if an image file is stripped of its metadata, the embedded information will still be present within the image, even surviving conversion to other lossless image formats (but not jpeg).

Algo

Encoding

  1. The output image is converted to an array of bytes.
  2. The offset to the beginning of the last 8 rows of pixels is calculated and then the array is sliced to include only those rows.
  3. The image parameters are encoded into a byte string, and then compressed using gz, which is efficient for small text strings.
  4. IF there is a password/key set in the user options, the compressed string is encrypted using the sha-256 hashed value of the user's password in AES-256-CBC, with padding.
  5. 0x0CAB005E is appended as a suffix.
  6. For efficiency, rather than loop over pixels, the operation is done in bulk, splitting the data to bits and then blitting the data over top of the existing image data after truncating the pixel view to the right size. So only as many pixels as it takes to fit the information are affected.
  7. The image is reconstituted in place from the byte array without allocating a new image.
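
The encoding steps above can be sketched in plain Python. This is not the PR's `image_imprint.py`: it works on a raw `bytearray` of RGB channel bytes instead of an image object, skips the optional encryption step, and assumes the payload goes into the least significant bit of each channel byte, which the description does not spell out. The function name `imprint` is hypothetical.

```python
import gzip

MARKER = bytes.fromhex("0CAB005E")  # end-of-envelope marker from the description
ROWS = 8                            # only the last 8 rows are touched


def imprint(pixels: bytearray, width: int, height: int, text: str,
            channels: int = 3) -> None:
    """Write `text` into the low bit of channel bytes in the last ROWS rows, in place."""
    row_bytes = width * channels
    start = (height - ROWS) * row_bytes  # offset of the last 8 rows
    envelope = gzip.compress(text.encode("utf-8")) + MARKER
    # Split the envelope into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in envelope for shift in range(7, -1, -1)]
    if len(bits) > ROWS * row_bytes:
        raise ValueError("payload does not fit in the last 8 rows")
    # One slice assignment; only as many channel bytes as there are bits change.
    region = pixels[start:start + len(bits)]
    pixels[start:start + len(bits)] = bytes(
        (p & 0xFE) | b for p, b in zip(region, bits)
    )
```

Because only the low bit changes, each touched channel value moves by at most 1, and everything above the last 8 rows is left byte-for-byte identical.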

Decoding

Steps 1, 2 of Encoding, then

  1. Extract the bits off the last 8 rows of data and re-compact them.
  2. 0x0CAB005E is sought as a suffix. If it is not found, this was not an imprinted image. If it is found, the array is sliced to that marker.
  3. IF there is a password/key set in the user options, the compressed string is decrypted using this password.
  4. We check for the magic bytes that tell us whether the data is gzip compressed; if it is, we decompress it.
  5. The byte string is decoded into a utf-8 string and returned.
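
A matching sketch of the decoding steps, under the same assumptions as before (raw RGB channel bytes, one payload bit in the least significant bit of each byte, no encryption). `read_imprint` is a hypothetical name, and taking the first occurrence of the marker is a simplification chosen for this sketch.

```python
import gzip

MARKER = bytes.fromhex("0CAB005E")  # end-of-envelope marker
ROWS = 8                            # the last 8 rows carry the data
GZIP_MAGIC = b"\x1f\x8b"            # first two bytes of any gzip stream


def read_imprint(pixels: bytes, width: int, height: int, channels: int = 3):
    """Return the imprinted text, or None if no marker is found."""
    row_bytes = width * channels
    region = pixels[(height - ROWS) * row_bytes:]
    # Re-compact the low bits, eight at a time, most significant bit first.
    packed = bytearray()
    for i in range(0, len(region) - 7, 8):
        byte = 0
        for p in region[i:i + 8]:
            byte = (byte << 1) | (p & 1)
        packed.append(byte)
    end = packed.find(MARKER)  # first occurrence (a simplification)
    if end == -1:
        return None  # not an imprinted image
    payload = bytes(packed[:end])
    if payload.startswith(GZIP_MAGIC):
        payload = gzip.decompress(payload)
    return payload.decode("utf-8")
```

An image whose low bits never spell out the marker simply yields `None`, which is how a reader can tell imprinted and plain images apart without any side channel.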

Code Changes

  • Added image_imprint.py module with util methods. A fork of my work on ArtiFactual.
  • Plumb imprint_override_password through run_pnginfo() so it can be filled on the spot from the UI.
  • Add imprinting to save_image(), reusing the info string.
  • Add imprint reading to read_info_from_image(), trying to pull both the metadata and the imprinted data; if they are the same, only the metadata is returned.
  • Add enable_imprinting and imprinting_password options to shared.py
  • Add new imprint_override_password textbox to the UI. Hidden if enable_imprinting is turned off.

Example Image

This image from the above example video is imprinted and passworded with cool pass bro. You can download it and see that there is no metadata in the image. Running this branch, you can decode it as in the example video.

Tested on

  • Windows 10
  • Chrome 103
  • Python 3.10.8 / Torch 2.0.0+cu118 / CUDA 11.8 / cuDNN 8.8.1 / RTX 4090

kjerk avatar Apr 11 '23 01:04 kjerk

There was an idea of encoding the info into the PNG alpha layer; see https://github.com/ashen-sensored/sd_webui_stealth_pnginfo

w-e-w avatar Apr 11 '23 06:04 w-e-w

That is an interesting idea and would be similarly transparent (no pun intended) (edit: and provide more space, the data hoarder in me likes that 👀 ), but I feel like it's just as vulnerable as the metadata to alpha stripping, data cleaning, 'lossless optimization', and so on.

Unfortunately, if you search this project and many other Python repositories, you will see that image.convert('RGB') is extremely prevalent, and it immediately destroys the alpha channel. That's why I felt targeting the RGB channels was the only robust way.

kjerk avatar Apr 12 '23 00:04 kjerk

I like this approach the best, personally. Would have the maximum possible compatibility with all sites since it's literally just pixel data (and if that's not being preserved, we would have no hope at all of maintaining metadata).

MrCheeze avatar Apr 12 '23 02:04 MrCheeze

As far as I understand, this is for all intents and purposes an invisible watermark, and some people really don't like this sort of thing.

w-e-w avatar Apr 12 '23 03:04 w-e-w

Looking forward to this getting merged, great work 👍

Kilvoctu avatar Apr 12 '23 18:04 Kilvoctu

@kjerk

That's really cool; however, the fixed size and fixed starting point seem like they could be a bit limiting for future changes, no?

Have you considered a slight change: putting the 0x0CAB005E at the end, with the size before it, so it would look like [data][length of data as 3 bytes][0x0CAB005E][END OF PIXEL DATA], all still encoded with your algorithm into the last, smallest bits of the image, of course.

You could still just as easily check whether an image is imprinted by checking whether the last couple of pixels end with your magic bits. If so, read the length from just before the marker, and then read that many bytes of data from before the length. This way you could even let users choose the maximum amount of data they are okay with adding, or change it later, without breaking anything or being locked into the fixed starting point.

ulysso avatar Apr 12 '23 21:04 ulysso

So thinking about this, I like this approach as a hybrid of the constraints 👍 (affect as little as possible, know the bounds easily). The data is still kept toward the end of the image and grows backward in reverse reading order, which allows it to expand effectively ad infinitum, though I still think 99.8% of use cases are going to wind up as 2 rows of pixels.

One issue the fixed seek point was trying to overcome is truly being able to know the bounds of the data, start to end, naively and simply. That makes re-implementation pretty easy to understand, and you can point to where the data is on an image (if there is any). The suffix approach is more of an RLE implementation with the run length just being backwards, but I think the math works out to make this pleasingly simple.

I'll take a shot at making those changes and update later, thanks for chiming in!

As far as I understand, this is for all intents and purposes an invisible watermark, and some people really don't like this sort of thing.

Effectively, but this is done out in the open and users have full control, (optional) fingerprinting is already baked into the majority of these projects just for the AI aspect, and if this is your first encounter with steganography wait until you find out about the yellow dots that HP printers output 'randomly' 😄.

Any hangups like this are unfortunately a bottomless rabbit-hole because either you trust the implementer or not, and if not, then a person shouldn't be running the software anyway, and not even only this repo, anything using Pillow, or libjpeg-turbo, your phone, and so on. You gotta pick your battles on topics like this and this is done with an open implementation, in good faith, in public vision and under public review, and ultimately under user's discretion and control, with really cool benefits out the other side.

I'm also up for discussion on defaults, where something like this has the maximum benefit by a huge margin by being on by default, but power users who are the only ones that would care anyway can disable the options or set passwords. I can also see being turned on by default being seen as an overstep.

kjerk avatar Apr 13 '23 01:04 kjerk

This enhancement counteracts the common practice of stripping metadata from image file formats by online hosts and websites

I'm all for having this merged, but I do not think this should be enabled by default, which is the case in this PR's current state. Many sites forbid you from doing this, particularly 4chan, which is where this repo originated and where it is still widely used.

It would not surprise me if, were this merged, other sites took notice and began to crack down on it. The entire reason the rule was added to 4chan only in the last couple of years is that users started to intentionally distribute illegal material via a userscript that made embedding or imprinting data incredibly easy. Although the tech had existed for quite some time, it wasn't given much notice until a user-friendly UI was made. The same will likely apply here as it did to that userscript, especially if it's the default.

I'm not trying to cause a scare, but it is a very real issue that can and will be exploited, and it could possibly destroy SD communities on other websites.

catboxanon avatar Apr 13 '23 21:04 catboxanon

Dangling for too long because of movement on other projects. Will update and reopen if updated to meet the aforementioned specs.

kjerk avatar Sep 05 '23 18:09 kjerk