Chinese-CLIP
How can this model be used as the text encoder of Stable Diffusion for fine-tuning?
My code calls the text encoder roughly as follows. Is anything wrong with the call? I am using transformers 4.25.1.
```python
from transformers import BertTokenizer, ChineseCLIPTextModel

text_cn_encoder = ChineseCLIPTextModel.from_pretrained(args.pretrained_text_model_path)
tokenizer_cn = BertTokenizer.from_pretrained(args.pretrained_text_model_path)
# pad to a fixed length so a batch of captions stacks into one tensor
inputs_cn = tokenizer_cn(text=captions, padding="max_length", max_length=77, truncation=True, return_tensors="pt")
encoder_hidden_states = text_cn_encoder(inputs_cn.input_ids, attention_mask=inputs_cn.attention_mask).last_hidden_state
```
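
For context, in a typical diffusers fine-tuning loop these hidden states become the UNet's cross-attention conditioning. Below is a minimal sketch of one training step, assuming `vae`, `unet`, and `noise_scheduler` (an `AutoencoderKL`, `UNet2DConditionModel`, and `DDPMScheduler`) plus `pixel_values` from a dataloader already exist; all of these names are illustrative, and the UNet's `cross_attention_dim` has to match the text encoder's hidden size:

```python
import torch
import torch.nn.functional as F

# encode images to latents and add noise at random timesteps
latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
noise = torch.randn_like(latents)
timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=latents.device)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

# the Chinese-CLIP hidden states stand in for the usual CLIP text embeddings
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states=encoder_hidden_states).sample
loss = F.mse_loss(noise_pred.float(), noise.float())
loss.backward()
```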
```python
from transformers import ChineseCLIPModel, ChineseCLIPProcessor
from diffusers import StableDiffusionXLPipeline

device = "cuda"
sdxl_model_path = "stable-diffusion-xl-base-1.0"
clip_model_root = "chinese-clip-vit-large-patch14"

clip_text_model = ChineseCLIPModel.from_pretrained(clip_model_root)
processor = ChineseCLIPProcessor.from_pretrained(clip_model_root)
clip_tokenizer = processor.tokenizer
# StableDiffusionXLPipeline tokenizes prompts with padding="max_length",
# so model_max_length must be set explicitly to 77, matching CLIP
clip_tokenizer.model_max_length = 77

sdxl_pipe = StableDiffusionXLPipeline.from_pretrained(sdxl_model_path)
sdxl_pipe.text_encoder = clip_text_model
sdxl_pipe.tokenizer = clip_tokenizer
sdxl_pipe.to(device)

prompt = ""
image = sdxl_pipe(prompt, num_inference_steps=50).images[0]
image.save("img1.jpg")
```
This is how I wrote it. It has flaws, but it works.
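
One likely source of those flaws: SDXL carries two text encoders (`text_encoder`/`tokenizer` and `text_encoder_2`/`tokenizer_2`), and the snippet above only swaps the first pair, so the untouched second encoder still processes the Chinese prompt. A quick sanity check against the pipeline built above (illustrative only):

```python
# hidden size of the swapped-in Chinese-CLIP text tower
print(sdxl_pipe.text_encoder.config.text_config.hidden_size)
# hidden size of the untouched second encoder, and what the UNet expects
print(sdxl_pipe.text_encoder_2.config.hidden_size)
print(sdxl_pipe.unet.config.cross_attention_dim)
```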
I have a question: why did you use ChineseCLIPModel rather than ChineseCLIPTextModel? @HiddenMarkovModel
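
For reference on the difference between the two classes (a minimal sketch, worth verifying against your transformers version): `ChineseCLIPModel` is the dual-tower wrapper holding both the text and the vision model, while `ChineseCLIPTextModel` is the BERT-style text tower on its own, which is all a diffusion pipeline needs. Loading just the text tower from the same checkpoint looks like this (transformers will warn that the vision weights go unused):

```python
from transformers import BertTokenizer, ChineseCLIPTextModel

# load only the text tower from the full Chinese-CLIP checkpoint
text_model = ChineseCLIPTextModel.from_pretrained("chinese-clip-vit-large-patch14")
tokenizer = BertTokenizer.from_pretrained("chinese-clip-vit-large-patch14")

inputs = tokenizer("一只奔跑的狗", padding="max_length", max_length=77,
                   truncation=True, return_tensors="pt")
hidden = text_model(inputs.input_ids, attention_mask=inputs.attention_mask).last_hidden_state
print(hidden.shape)  # (1, 77, hidden_size)
```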