
Integrating ControlNet and IP-Adapter into StableDiffusionControlNetImg2ImgPipeline

Open Eduard6315 opened this issue 10 months ago • 5 comments

The inputs are as follows: take this base model (https://civitai.com/models/4384?modelVersionId=128713), this ControlNet (https://huggingface.co/lllyasviel/sd-controlnet-depth), and the IP-Adapter "ip-adapter-plus-face_sd15" (https://huggingface.co/h94/IP-Adapter/tree/main/models). A photo of the user's face should be fed to the model, which should then process it in img2img mode with the prompt "masterpiece, portrait of a person, anime style, high quality, RAW photo, 8k uhd". Here is the code:

import requests
import torch
import torchvision.transforms as transforms
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline, StableDiffusionPipeline
from settings import ip_adapter_path, control_net_model, user_photo_url, prompt, ip_adapter_dir, base_model

# We indicate that we use the CPU to load the model
device = torch.device("cpu")

# Downloading ControlNet
control_net = ControlNetModel.from_pretrained(control_net_model)

# Downloading the IP-Adapter.
# It is assumed that ip_adapter_path is a file with pre-trained weights (in .bin format)
ip_adapter_weights = torch.load(ip_adapter_path, map_location=device)

# Loading the model directly from the .safetensors file
pipeline = StableDiffusionPipeline.from_single_file(base_model)

# We use the appropriate pipeline to load the model from the .safetensors file
pipeline = StableDiffusionControlNetImg2ImgPipeline.from_single_file(
    base_model,                 # Path to the model file
    controlnet=control_net,
    revision=None,              # No revision is used for a local file
    torch_dtype=torch.float32,  # Data type, if necessary
).to(device)

# Installing the IP-Adapter weights in the pipeline
pipeline.load_ip_adapter(
    pretrained_model_name_or_path_or_dict=ip_adapter_dir,
    subfolder="",
    weight_name="ip-adapter-plus-face_sd15",
    image_encoder_folder=None,  # None so that the image encoder is not loaded
)

# Now that we have a pipeline, we can use it to transform the user's photo.
# init_image is the tensor of the user's image that we want to transform.

# Check that the image is actually downloaded, rather than passing the link to the site
response = requests.get(user_photo_url, verify=False)
if response.status_code == 200:
    with open('image/test_image.jpg', 'wb') as f:
        f.write(response.content)

    # Load the image from disk and convert it to the format accepted by the pipeline
    user_photo = Image.open('image/test_image.jpg').convert("RGB")
    transform = transforms.Compose([
        transforms.Resize((512, 512)),  # Change the size according to your needs
        transforms.ToTensor(),          # Conversion to a PyTorch tensor
    ])
    init_image = transform(user_photo).unsqueeze(0)  # Add a batch dimension
    if init_image is None:
        raise ValueError("The image conversion was not performed.")

    print(type(init_image))        # should be <class 'torch.Tensor'>
    print(init_image is not None)  # should be True

    # Image generation in img2img mode
    generated_image = pipeline(
        prompt=prompt,                   # Text description to generate
        image=init_image,                # The user's start image
        num_inference_steps=50,          # Number of steps
        guidance_scale=7.5,              # Strength of the text guidance
        generator=torch.manual_seed(0),  # Seed for reproducibility
    )

    # Saving the result
    generated_image.images[0].save("result.png")
else:
    print(f"Error loading the image: {response.status_code}")

But this code does not work: I have not been able to integrate ControlNet and the IP-Adapter into StableDiffusionControlNetImg2ImgPipeline.from_single_file so that our input data is processed correctly. Thanks for the help.

Eduard6315 avatar Apr 19 '24 18:04 Eduard6315

You should translate your question to English, and even with google translate I don't understand the issue you have.

Integrating ControlNet and IP-Adapter into StableDiffusionControlNetImg2ImgPipeline. The input data is as follows: take this base model (https://civitai.com/models/4384?modelVersionId=128713), this ControlNet (https://huggingface.co/lllyasviel/sd-controlnet-depth), and the IP-Adapter "ip-adapter-plus-face_sd15" (https://huggingface.co/h94/IP-Adapter/tree/main/models). A photo of the user's face should be fed to the model's input, after which the model in img2img mode should process it with the prompt "masterpiece, portrait of a person, anime style, high quality, RAW photo, 8k uhd".

asomoza avatar Apr 19 '24 19:04 asomoza

Good afternoon. Thanks for the response. I have translated the text into English. Here is the point: you may not need all the background, but these are the components involved:

  • The base model is the main AI model that has already been trained to generate images. We use it as the basis for further generation.
  • ControlNet is a tool that helps control what the output will be. It's like a director on set telling the actors how to move.
  • IPAdapter is an addition to the base model that helps the model understand what the user looks like.
  • Img2Img processing is a method in which an original image is fed into the model and the model creates a new image from it.
  • The prompt is a description of what you want to generate; in effect, a text instruction for the model. (See the sketch right after this list for how these pieces map to diffusers classes.)
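
Roughly, the intended assembly looks like this. This is only a sketch of the mapping between the concepts above and diffusers building blocks: the local checkpoint path is a placeholder, and the exact IP-Adapter weight file name is an assumption based on the linked repositories.

from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# ControlNet: a separate conditioning model (depth in this case)
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth")

# Base model: the fine-tuned SD 1.5 checkpoint, loaded from a single .safetensors file
pipe = StableDiffusionControlNetImg2ImgPipeline.from_single_file(
    "path/to/base_model.safetensors",  # placeholder path to the civitai checkpoint
    controlnet=controlnet,
)

# IP-Adapter: extra weights loaded on top of the pipeline (the face variant here)
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter-plus-face_sd15.bin"
)

# Img2Img: at call time the pipeline takes the prompt, a start image,
# a control image for the ControlNet, and an ip_adapter_image for the face.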

We need to take this base model (https://civitai.com/models/4384?modelVersionId=128713), this ControlNet (https://huggingface.co/lllyasviel/sd-controlnet-depth), and the IP-Adapter "ip-adapter-plus-face_sd15" (https://huggingface.co/h94/IP-Adapter/tree/main/models). A photo of the user's face should be fed to the model's input, after which the model in img2img mode should process it with the "masterpiece, portrait of a person, anime style, high quality, RAW photo, 8k uhd" prompt.

The pipeline itself should be written with the diffusers library from Hugging Face (see https://huggingface.co/docs/diffusers/index). But I cannot meet these requirements: if I work with the base model alone, the image is transformed and saved, but as soon as I add ControlNet, the IP-Adapter and img2img processing through the StableDiffusionControlNetImg2ImgPipeline class, nothing comes out. Maybe I'm not creating the pipeline correctly? If you have any suggestions, please advise. Thank you.

Eduard6315 avatar Apr 20 '24 06:04 Eduard6315

Hi @Eduard6315, sorry, I totally forgot about this issue. Do you still need help, or have you already found a solution?

asomoza avatar May 08 '24 08:05 asomoza

Good afternoon. Yes, if you don't mind, what can you recommend?

Eduard6315 avatar May 08 '24 12:05 Eduard6315

OK, I did some tests. When you say nothing comes out, do you mean that the generated images are black?

If that's the problem, it's because fine-tuned models generate a lot more images that get flagged as NSFW and filtered by the safety checker, so you'll need to set it to None.

As an example, I'll use the Hugging Face Hub version of the model:

# safety_checker=None disables the NSFW filter that was producing the black images
pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16, variant="fp16", safety_checker=None
).to("cuda")

Other than that, I didn't have any issues:

[Images: source image "woman" and generated result "20240510030425", shown side by side]
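
For completeness, here is a rough sketch of how the full setup from the issue (single-file base model, depth ControlNet, plus-face IP-Adapter, img2img) can be wired together. Treat it as a sketch rather than a verified recipe: the checkpoint path is a placeholder, and the weight file name, the subfolder, the depth preprocessing via a transformers depth-estimation pipeline, and the strength/scale values are assumptions to adjust for your setup.

import torch
from PIL import Image
from transformers import pipeline as hf_pipeline
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Depth ControlNet from the issue
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)

# Base model loaded from a local .safetensors checkpoint
pipe = StableDiffusionControlNetImg2ImgPipeline.from_single_file(
    "path/to/base_model.safetensors",   # placeholder: the civitai checkpoint from the issue
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")                            # on CPU, use torch_dtype=torch.float32 and .to("cpu")
pipe.safety_checker = None              # avoid black images from the NSFW filter, as noted above

# IP-Adapter face weights; loading from the repo also pulls in the matching image encoder
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter-plus-face_sd15.bin"
)
pipe.set_ip_adapter_scale(0.7)          # assumed value; how strongly the face reference is applied

# The user photo, passed as a PIL image so the pipeline handles resizing and normalization
face_image = Image.open("image/test_image.jpg").convert("RGB").resize((512, 512))

# Depth map for the depth ControlNet (assumed preprocessing via transformers)
depth_estimator = hf_pipeline("depth-estimation")
control_image = depth_estimator(face_image)["depth"].convert("RGB").resize((512, 512))

result = pipe(
    prompt="masterpiece, portrait of a person, anime style, high quality, RAW photo, 8k uhd",
    image=face_image,                   # img2img start image
    control_image=control_image,        # depth conditioning for the ControlNet
    ip_adapter_image=face_image,        # face reference for the IP-Adapter
    strength=0.6,                       # assumed img2img strength
    num_inference_steps=50,
    guidance_scale=7.5,
    generator=torch.manual_seed(0),
)
result.images[0].save("result.png")

Compared with the code in the issue, this sketch passes separate image, control_image, and ip_adapter_image inputs at call time, and points load_ip_adapter at the repository and weight file so the image encoder that the plus-face adapter needs can be loaded alongside it.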

asomoza avatar May 10 '24 07:05 asomoza

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Sep 14 '24 15:09 github-actions[bot]