Integration of ControlNet and IPAdapter into the StableDiffusionControlNetImg2ImgPipeline pipe
The inputs are as follows: take this base model (https://civitai.com/models/4384?modelVersionId=128713), this ControlNet (https://huggingface.co/lllyasviel/sd-controlnet-depth), and the IP adapter "ip-adapter-plus-face_sd15" (https://huggingface.co/h94/IP-Adapter/tree/main/models). The model should take a photo of the user's face and process it in img2img mode with the prompt "masterpiece, portrait of a person, anime style, high quality, RAW photo, 8k uhd". Here is the code:

```python
import requests
import torch
import torchvision.transforms as transforms
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline, StableDiffusionPipeline

from settings import ip_adapter_path, control_net_model, user_photo_url, prompt, ip_adapter_dir, base_model

# We use the CPU to load the model
device = torch.device("cpu")

# Download ControlNet
control_net = ControlNetModel.from_pretrained(control_net_model)

# Download the IP-Adapter.
# It is assumed that ip_adapter_path is a file with pre-trained weights (.bin format)
ip_adapter_weights = torch.load(ip_adapter_path, map_location=device)

# Load the model directly from the .safetensors file
pipeline = StableDiffusionPipeline.from_single_file(base_model)

# Use the appropriate pipeline to load the model from the .safetensors file
pipeline = StableDiffusionControlNetImg2ImgPipeline.from_single_file(
    base_model,                 # path to the model file
    controlnet=control_net,
    revision=None,              # no revision is used for a local file
    torch_dtype=torch.float32,  # data type, if necessary
).to(device)

# Install the IP-Adapter weights in the pipeline
pipeline.load_ip_adapter(
    pretrained_model_name_or_path_or_dict=ip_adapter_dir,
    subfolder="",
    weight_name="ip-adapter-plus-face_sd15",
    image_encoder_folder=None,  # None so that the image encoder is not loaded
)

# Now that the pipeline exists, use it to transform the user's photo.
# init_image is the tensor of the user's image that we want to transform.

# Check that the image actually downloads rather than remaining just a link to the site
response = requests.get(user_photo_url, verify=False)
if response.status_code == 200:
    with open('image/test_image.jpg', 'wb') as f:
        f.write(response.content)

    # Load the image from disk and convert it to the format accepted by the pipeline
    user_photo = Image.open('image/test_image.jpg').convert("RGB")
    transform = transforms.Compose([
        transforms.Resize((512, 512)),  # change the size according to your needs
        transforms.ToTensor(),          # conversion to a PyTorch tensor
    ])
    init_image = transform(user_photo).unsqueeze(0)  # add a batch dimension

    if init_image is None:
        raise ValueError("The image conversion was not performed.")

    print(type(init_image))        # should be <class 'torch.Tensor'>
    print(init_image is not None)  # should be True

    # Image generation in img2img mode
    generated_image = pipeline(
        prompt=prompt,                   # a text description to generate
        image=init_image,                # the user's start image
        num_inference_steps=50,          # number of steps
        guidance_scale=7.5,              # the influence of the text description
        generator=torch.manual_seed(0),  # seed for reproducibility
    )

    # Save the result
    generated_image.images[0].save("result.png")
else:
    print(f"Error loading the image: {response.status_code}")
```

But this code does not work: I am not able to integrate ControlNet and the IPAdapter into StableDiffusionControlNetImg2ImgPipeline.from_single_file so that our input data is processed correctly. Thanks for the help.
You should translate your question into English; even with Google Translate I don't understand the issue you have.
Integration of ControlNet and IPAdapter into the StableDiffusionControlNetImg2ImgPipeline pipe. The input data is as follows: take this base model (https://civitai.com/models/4384?modelVersionId=128713), this controlnet (https://huggingface.co/lllyasviel/sd-controlnet-depth), and the ipadapter "ip-adapter-plus-face_sd15" (https://huggingface.co/h94/IP-Adapter/tree/main/models). A photo of the user's face should be submitted to the model's input, after which the model in img2img mode should process it with the prompt "masterpiece, portrait of a person, anime style, high quality, RAW photo, 8k uhd".
Good afternoon, and thanks for the response. I have rewritten the text in English. Here is the point, in case the terms are unclear:
- The base model is the main AI model that has already been trained to generate images. We use it as the basis for further generation.
- ControlNet is a tool that helps control what the output will look like. It's like a director on set telling the actors how to move.
- IPAdapter is an addition to the base model that helps the model understand what the user looks like.
- Img2img processing is a method in which the original image is fed into the model and a new image is created from it.
- The prompt is a description of what you want to generate; in effect, a text instruction for the model. (A rough sketch of how I am trying to wire these pieces together follows right after this list.)
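To make it concrete, here is roughly how I understand these pieces should fit together in diffusers. This is only a sketch: the local .safetensors path, the photo and depth-map file names are placeholders, and the exact call arguments are precisely the part I am unsure about.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

# Depth ControlNet that will condition the generation
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth")

# Base model: the CivitAI checkpoint downloaded as a .safetensors file (placeholder path)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_single_file(
    "path/to/base_model.safetensors",
    controlnet=controlnet,
    torch_dtype=torch.float32,
).to("cpu")

# IP-Adapter weights from the h94/IP-Adapter repository
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter-plus-face_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # how strongly the face reference influences the result

face = load_image("user_photo.jpg")           # placeholder: the user's photo
depth_map = load_image("user_depth_map.png")  # placeholder: a depth map of the same photo

result = pipe(
    prompt="masterpiece, portrait of a person, anime style, high quality, RAW photo, 8k uhd",
    image=face,               # img2img start image
    control_image=depth_map,  # conditioning image for the depth ControlNet
    ip_adapter_image=face,    # identity reference for the IP-Adapter
    strength=0.75,            # how far to move away from the start image
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
result.save("result.png")
```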
We need to take this base model (https://civitai.com/models/4384?modelVersionId=128713), this controlnet (https://huggingface.co/lllyasviel/sd-controlnet-depth), and the ipadapter "ip-adapter-plus-face_sd15" (https://huggingface.co/h94/IP-Adapter/tree/main/models). A photo of the user's face should be submitted to the model's input, after which the model in img2img mode should process it with the "masterpiece, portrait of a person, anime style, high quality, RAW photo, 8k uhd" prompt. The pipeline itself should be written with the diffusers library from Hugging Face (see https://huggingface.co/docs/diffusers/index). But I cannot fulfill these requirements: if I only work with the base model, the image is transformed and saved, but as soon as I use ControlNet, the IPAdapter, and img2img processing through the StableDiffusionControlNetImg2ImgPipeline class, nothing comes out. Maybe I'm not creating the pipeline correctly? If you have any suggestions, please advise. Thank you.
Hi @Eduard6315, sorry, I totally forgot about this issue. Do you still need help, or have you already found a solution?
Good afternoon. Yes, if you don't mind, what can you recommend?
OK, I did some tests. When you say nothing comes out, do you mean that the generated images are black?
If that's the problem, it's because fine-tuned models generate a lot more NSFW-flagged images that get filtered by the safety checker, so you'll need to set it to None.
As an example, I'll use the Hugging Face Hub one:
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "Lykon/dreamshaper-8", torch_dtype=torch.float16, variant="fp16", safety_checker=None
).to("cuda")
```
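If you keep loading the checkpoint from a single .safetensors file with the ControlNet img2img pipeline, the same idea should apply. Here is a minimal sketch, assuming a placeholder path for the checkpoint; depending on your diffusers version you may instead be able to pass the safety-checker option directly to from_single_file:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth")

pipe = StableDiffusionControlNetImg2ImgPipeline.from_single_file(
    "path/to/base_model.safetensors",  # placeholder path to the CivitAI checkpoint
    controlnet=controlnet,
    torch_dtype=torch.float32,
)

# Disable the safety checker after loading; heavily fine-tuned checkpoints often
# trigger false positives and come back as black images.
pipe.safety_checker = None
```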
Other than that, I didn't have any issues:
| source image | generated |
| --- | --- |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.