
Textual Inversion Improvements

d8ahazard opened this issue · 5 comments

This adds two new improvements to TI and/or Preprocessing:

  1. Allow creation of TI embeddings while using the --medvram flag, by loading the ckpt to shared.device and unloading it again once the embedding is created (see the first sketch after this list).

  2. Allow optional use of a customized version of the "cropimage" Python library from https://github.com/haofanwang/cropimage

"Customized" meaning it won't break SD if any issues arise from bad input images: I've fixed the package to include the missing XML files, and modified the main Cropper() class so that it accepts a PIL image as input rather than just a file name (see the second sketch below).
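A minimal sketch of the first idea, assuming a standard PyTorch module; the context-manager shape and names here are mine, not necessarily how the PR wires it in:

```python
import contextlib
import torch

@contextlib.contextmanager
def model_on_device(model: torch.nn.Module, device):
    """Move the checkpoint to `device` (e.g. shared.device), restore it on exit."""
    model.to(device)                      # load the ckpt onto the GPU
    try:
        yield model
    finally:
        model.to("cpu")                   # unload again once the embedding is created
        if torch.cuda.is_available():
            torch.cuda.empty_cache()      # hand the freed VRAM back
```

Under --medvram the model normally sits in CPU RAM, so embedding creation would wrap its checkpoint access in `with model_on_device(sd_model, shared.device): ...` and leave VRAM untouched the rest of the time.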
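And a sketch of the fail-safe input handling described for the customized Cropper(); treating `cropper.crop()` as accepting a PIL image is an assumption about the patched API:

```python
from pathlib import Path
from PIL import Image, ImageOps

def _to_pil(image):
    # Accept either a file name or an already-loaded PIL image,
    # mirroring the Cropper() change described above.
    if isinstance(image, (str, Path)):
        return Image.open(image)
    if isinstance(image, Image.Image):
        return image
    raise TypeError(f"expected a path or PIL.Image, got {type(image)!r}")

def safe_smart_crop(image, size=512, cropper=None):
    """Smart-crop when a cropper is supplied; never let a bad image break SD."""
    pil = _to_pil(image).convert("RGB")
    if cropper is not None:
        try:
            return cropper.crop(pil)          # assumed API of the customized Cropper
        except Exception:
            pass                              # bad input: fall through to the dumb path
    return ImageOps.fit(pil, (size, size))    # plain centre crop + resize fallback
```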

d8ahazard · Oct 25 '22 18:10

Sample comparison of "smart crop" versus regular. (Source images have been downscaled to fit)

[image: sample comparison of "smart crop" versus regular crop]

d8ahazard · Oct 25 '22 19:10

This is good stuff. Do you think an anti-crop function could be implemented that resizes the input image to a 1:1 aspect but adds borders to the sides or top if it doesn't meet 1:1? Through my personal experimentation I learned that training on images cropped in this format comes out way better than with the usual crops. You do get image generations with borders on the bottom or top, BUT those images are way more cohesive and well structured than with the current way of cropping. A small price to pay to fix so many issues regarding extra limbs, heads, etc. Also, the holy grail of cropping would be to crop the way I describe, auto-select the resulting borders, and fill them with inpainting; this would result in a 1:1 cropped image that is cohesive and has no borders.

DoctorDerp · Oct 25 '22 21:10

> This is good stuff. Do you think an anti-crop function could be implemented that resizes the input image to a 1:1 aspect but adds borders to the sides or top if it doesn't meet 1:1? [...]

That would be a separate PR for sure, but also probably totally doable. I'd envision two separate preprocess options - one called "resize and fill" which scales the image by the largest side to the desired dimensions, then pads it from the center with black.

The other, slightly more complicated version, would be something like "smart fill", which, as you said, would resize the largest side to the desired dimensions, and then outpaints the rest to reach the desired dimensions. It'd be slower for sure, but almost certainly better for training.
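A minimal sketch of the simpler "resize and fill" variant using PIL (the function name and defaults are mine, not the PR's):

```python
from PIL import Image

def resize_and_fill(im: Image.Image, size: int = 512, fill=(0, 0, 0)) -> Image.Image:
    """Scale by the longest side, then pad the short side from the centre.

    Pass fill=(255, 255, 255) for the white borders described above.
    """
    scale = size / max(im.size)
    im = im.resize((round(im.width * scale), round(im.height * scale)), Image.LANCZOS)
    canvas = Image.new("RGB", (size, size), fill)   # padded square canvas
    canvas.paste(im, ((size - im.width) // 2, (size - im.height) // 2))
    return canvas
```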

Good idea, I'll see what I can do.

d8ahazard · Oct 25 '22 21:10

Hmmm...maybe don't merge this.

Looking at using something like this:

https://github.com/Vishnunkumar/clipcrop/blob/main/clipcrop/clipcrop.py

Interrogate image - find subject - crop?

Yass.
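For reference, a rough sketch of that interrogate → find subject → crop flow using off-the-shelf Hugging Face pipelines; the model choices here are illustrative and not what clipcrop actually ships:

```python
from PIL import Image
from transformers import pipeline

# Rough sketch only: caption the image, then look for that caption as a subject.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
detector = pipeline("zero-shot-object-detection", model="google/owlvit-base-patch32")

def subject_crop(path: str) -> Image.Image:
    im = Image.open(path).convert("RGB")
    caption = captioner(im)[0]["generated_text"]      # interrogate image
    hits = detector(im, candidate_labels=[caption])   # find subject
    if not hits:
        return im                                     # nothing found: keep as-is
    box = max(hits, key=lambda h: h["score"])["box"]
    return im.crop((box["xmin"], box["ymin"], box["xmax"], box["ymax"]))  # crop
```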

d8ahazard · Oct 25 '22 22:10

Do we want to be putting more operations of variable 'correctness' into preprocess (I'd have cropped for the sphinx in the centre of the frame!), when there's a world of image slicing, dicing, and content-detection tools out there that can be run independently and have their outputs validated before throwing them into preprocess BLIP captioning?

Particularly when received wisdom is that a handful of images work well for everything but aesthetic embedding - although I do find myself throwing loads in other trainings anyway.

dfaker · Oct 26 '22 00:10

> Do we want to be putting more operations of variable 'correctness' into preprocess when there's a world of image slicing, dicing, and content-detection tools out there? [...]

LOL, see my new pull request. You'll either love me or hate me.

d8ahazard · Oct 27 '22 02:10

Closed due to re-submitting as #3762.

d8ahazard · Oct 27 '22 02:10

@d8ahazard, I had a chance to play with Dreambooth model training recently, and I tested my upcrop method on 170 uncropped images mixed in with another 100 anti-cropped images (white borders on the sides). I am happy to report that the generated images showed very few white borders on the sides; I am talking maybe 1 out of 1k images or more. The reason I think that happens is that the model simply blends the white borders with the non-cropped training images, so none are visible. If you are able to implement this anti-crop white-border solution, it would make preprocessing images a lot better and result in no mutated three-headed freaks that are missing limbs or have extra arms. Currently I have to use a Photoshop action to do this, so it doesn't impact me, but for the general public it should be made the default crop method IMO. Thanks.

DoctorDerp · Oct 27 '22 22:10