Incorrect results during SCEdit inference with my own weights
Hello author, after training SCEdit I get wrong results when running inference with my own weights (using segmentation guidance):
1. The inferred image is completely different from the images produced during training.
2. Even when the input image is different (with the seed unchanged), the result is exactly the same.
The inference command is as follows:

CUDA_VISIBLE_DEVICES=1 python /data/twinkle/app/scepter/scepter/tools/run_inference.py \
    --cfg /data/twinkle/app/scepter/scepter/methods/scedit/ctr/sd15_512_sce_ctr_segmentation_Accusyn_Blur.yaml \
    --num_samples 1 \
    --prompt 'Convert to a segmentation map based on the prompt: disk and cup segmentation map' \
    --save_folder /data/twinkle/app/scepter/Newpaper_Accusyn_Blur_1 \
    --image_size 512 \
    --pretrained_model /data/twinkle/app/scepter/cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur/checkpoints/ldm_step-100000.pth \
    --image /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test/target/335L1.png \
    --control_mode segmentation \
    --task control \
    --seed 2023
The inferred images: the first is the segmentation guidance image, and the second is the inference result.
By contrast, the inference images stored in eval_probe during training:
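As a quick way to verify point 2 above, I compare results from two runs with different --image inputs for byte-level equality. This is only a sketch: the output file names below are placeholders for whatever run_inference.py actually writes into the save folder.

# Check whether two results produced from different input images are
# byte-for-byte identical (file names are placeholders).
import numpy as np
from PIL import Image

a = np.array(Image.open('/data/twinkle/app/scepter/Newpaper_Accusyn_Blur_1/output_0.png'))
b = np.array(Image.open('/data/twinkle/app/scepter/Newpaper_Accusyn_Blur_2/output_0.png'))
print('identical:', a.shape == b.shape and np.array_equal(a, b))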
The YAML file is as follows:

ENV:
  BACKEND: nccl
SOLVER:
  NAME: LatentDiffusionSolver
  RESUME_FROM:
  LOAD_MODEL_ONLY: True
  USE_FSDP: False
  SHARDING_STRATEGY:
  USE_AMP: True
  DTYPE: float16
  CHANNELS_LAST: True
  MAX_STEPS: 100000
  MAX_EPOCHS: -1
  NUM_FOLDS: 1
  ACCU_STEP: 1
  EVAL_INTERVAL: 1000
  RESCALE_LR: False
  WORK_DIR: ./cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur
  LOG_FILE: std_log.txt
  FILE_SYSTEM:
    NAME: "ModelscopeFs"
    TEMP_DIR: "./cache/cache_data"
  FREEZE:
    FREEZE_PART: [ "first_stage_model", "cond_stage_model", "model" ]
    TRAIN_PART: [ "control_blocks" ]
  MODEL:
    NAME: LatentDiffusionSCEControl
    PARAMETERIZATION: eps
    TIMESTEPS: 1000
    MIN_SNR_GAMMA:
    ZERO_TERMINAL_SNR: False
    PRETRAINED_MODEL: ms://AI-ModelScope/[email protected]
    IGNORE_KEYS: [ ]
    SCALE_FACTOR: 0.18215
    SIZE_FACTOR: 8
    DEFAULT_N_PROMPT:
    SCHEDULE_ARGS:
      "NAME": "scaled_linear"
      "BETA_MIN": 0.00085
      "BETA_MAX": 0.012
    USE_EMA: False
    #
    DIFFUSION_MODEL:
      NAME: DiffusionUNet
      IN_CHANNELS: 4
      OUT_CHANNELS: 4
      MODEL_CHANNELS: 320
      NUM_HEADS: 8
      NUM_RES_BLOCKS: 2
      ATTENTION_RESOLUTIONS: [ 4, 2, 1 ]
      CHANNEL_MULT: [ 1, 2, 4, 4 ]
      CONV_RESAMPLE: True
      DIMS: 2
      USE_CHECKPOINT: False
      USE_SCALE_SHIFT_NORM: False
      RESBLOCK_UPDOWN: False
      USE_SPATIAL_TRANSFORMER: True
      TRANSFORMER_DEPTH: 1
      CONTEXT_DIM: 768
      DISABLE_MIDDLE_SELF_ATTN: False
      USE_LINEAR_IN_TRANSFORMER: False
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
    #
    FIRST_STAGE_MODEL:
      NAME: AutoencoderKL
      EMBED_DIM: 4
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
      BATCH_SIZE: 4
      #
      ENCODER:
        NAME: Encoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DOUBLE_Z: True
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
      #
      DECODER:
        NAME: Decoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
        GIVE_PRE_END: False
        TANH_OUT: False
    #
    TOKENIZER:
      NAME: ClipTokenizer
      PRETRAINED_PATH: ms://AI-ModelScope/clip-vit-large-patch14
      LENGTH: 77
      CLEAN: True
    #
    COND_STAGE_MODEL:
      NAME: FrozenCLIPEmbedder
      FREEZE: True
      LAYER: last
      PRETRAINED_MODEL: ms://AI-ModelScope/clip-vit-large-patch14
    #
    LOSS:
      NAME: ReconstructLoss
      LOSS_TYPE: l2
    #
    CONTROL_MODEL:
      NAME: CSCTuners
      PRE_HINT_IN_CHANNELS: 3
      PRE_HINT_OUT_CHANNELS: 256
      DENSE_HINT_KERNAL: 3
      SCALE: 1.0
      SC_TUNER_CFG:
        NAME: SCTuner
        TUNER_NAME: SCEAdapter
        DOWN_RATIO: 1.0
    CONTROL_ANNO:
      NAME: SegmentationAnnotator
      UNET_WEIGHT: /data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth
      SEGMENTATION_PATH: /data/twinkle/app/scepter/Accusyn_segmentation/Blur/
  SAMPLE_ARGS:
    SAMPLER: ddim
    SAMPLE_STEPS: 50
    SEED: 2023
    GUIDE_SCALE: 7.5
    GUIDE_RESCALE: 0.5
    DISCRETIZATION: trailing
    IMAGE_SIZE: [ 512, 512 ]
    RUN_TRAIN_N: False
  OPTIMIZER:
    NAME: AdamW
    LEARNING_RATE: 0.0001
    BETAS: [ 0.9, 0.999 ]
    EPS: 1e-8
    WEIGHT_DECAY: 1e-2
    AMSGRAD: False
  TRAIN_DATA:
    NAME: ImageTextPairMSDataset
    MODE: train
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    PROMPT_PREFIX: ""
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 4
    NUM_WORKERS: 4
    SAMPLER:
      NAME: LoopSampler
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5, 0.5, 0.5 ]
        STD: [ 0.5, 0.5, 0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]
  EVAL_DATA:
    NAME: ImageTextPairMSDataset
    MODE: eval
    # MS_DATASET_NAME: style_custom_dataset
    # MS_DATASET_NAMESPACE: damo
    # MS_DATASET_SUBNAME: 3D
    # PROMPT_PREFIX: ""
    # MS_DATASET_SPLIT: train_short
    # MS_REMAP_KEYS: { 'Image:FILE': 'Target:FILE' }
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    PROMPT_PREFIX: ""
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 10
    NUM_WORKERS: 4
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5, 0.5, 0.5 ]
        STD: [ 0.5, 0.5, 0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]
  TRAIN_HOOKS:
    - NAME: BackwardHook
      PRIORITY: 0
    - NAME: LogHook
      LOG_INTERVAL: 50
    - NAME: CheckpointHook
      INTERVAL: 1000
    - NAME: ProbeDataHook
      PROB_INTERVAL: 1000
  EVAL_HOOKS:
    - NAME: ProbeDataHook
      PROB_INTERVAL: 1000
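As a sanity check that this config parses with the nesting shown above and that the CONTROL_ANNO paths actually exist on disk, a minimal sketch (it assumes the SOLVER/MODEL/CONTROL_ANNO nesting as written, and reads the file passed via --cfg):

# Minimal sanity check for the config above (nesting as shown is assumed).
import os
import yaml

cfg_file = '/data/twinkle/app/scepter/scepter/methods/scedit/ctr/sd15_512_sce_ctr_segmentation_Accusyn_Blur.yaml'
with open(cfg_file) as f:
    cfg = yaml.safe_load(f)

anno = cfg['SOLVER']['MODEL']['CONTROL_ANNO']
print('annotator:', anno['NAME'])
for key in ('UNET_WEIGHT', 'SEGMENTATION_PATH'):
    print(key, '->', anno[key], 'exists:', os.path.exists(anno[key]))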
The segmentation annotator code I implemented myself:

import os
import sys
import time
import concurrent.futures

import cv2
import numpy as np
import torch
from PIL import Image
from skimage import measure
from mmdet.apis import inference_detector, init_detector
from mmengine import Config

from scepter.modules.annotator.registry import ANNOTATORS
from scepter.modules.annotator.base_annotator import BaseAnnotator
from scepter.modules.utils.config import dict_to_yaml

sys.path.append('/data/twinkle/anaconda3/envs/scepter/lib/python3.8/site-packages/scepter/modules/annotator/unet_model')
from unet_model import UNet

i = 0  # global call counter (for debugging)


# Register SegmentationAnnotator into ANNOTATORS
@ANNOTATORS.register_class()
class SegmentationAnnotator(BaseAnnotator):
    para_dict = {}

    def __init__(self, cfg, logger=None):
        super().__init__(cfg, logger=logger)
        self.unet_weight = cfg.get('UNET_WEIGHT', '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth')
        self.segmentation_path = cfg.get('SEGMENTATION_PATH', '/data/twinkle/app/scepter/Accusyn_segmentation/test/')

    def forward(self, image):
        global i
        i += 1
        # Make sure the image is a numpy array
        if isinstance(image, Image.Image):
            image = np.array(image)
        elif isinstance(image, torch.Tensor):
            image = image.detach().cpu().numpy()
        elif isinstance(image, np.ndarray):
            image = image.copy()
        else:
            raise ValueError(f'Unsupported data type {type(image)}, only supports np.ndarray, torch.Tensor, Pillow Image.')

        # Load and initialize the UNet model for segmentation
        with torch.no_grad():
            device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
            unet = UNet(n_channels=3, n_classes=3).to(device)
            unet.load_state_dict(torch.load(self.unet_weight, map_location=device))
            unet.eval()

            # Save the input image to disk, then read it back with OpenCV
            input_image = Image.fromarray(image.astype(np.uint8))
            basepath = self.segmentation_path
            if not os.path.exists(os.path.join(basepath, 'input')):
                os.makedirs(os.path.join(basepath, 'input'))
            input_image.save(os.path.join(basepath, 'input/input_image.png'))
            img = cv2.imread(os.path.join(basepath, 'input/input_image.png'))

            # Convert to a tensor on the target device, normalized to [-1, 1]
            img_tensor = torch.from_numpy(img)
            img_tensor = img_tensor.to(device, dtype=torch.float32)
            img_tensor = (img_tensor / 127.5) - 1.0

            # Predict: add a batch dimension and permute HWC -> NCHW
            img_tensor = img_tensor.unsqueeze(0)
            img_tensor = img_tensor.permute(0, 3, 1, 2)
            pred_unet = unet(img_tensor)
            # Per-pixel class indices
            pred = torch.argmax(pred_unet, dim=1).squeeze(0).cpu().numpy()
            # Cast to uint8 before resizing (cv2.resize does not accept int64)
            pred_resized = cv2.resize(pred.astype(np.uint8), (512, 512), interpolation=cv2.INTER_NEAREST)
            # Map class indices {0, 1, 2} to {0, 127, 255}; currently unused below
            pred_resized = (pred_resized.astype(np.float32) * 255 / 2).astype(np.uint8)

            # Color map for the three classes
            color_map = {
                0: [0, 0, 0],      # background - black
                2: [255, 0, 0],    # cup - red
                1: [0, 0, 255],    # disk - blue
            }
            image_array = pred
            # Build a colored image by mapping each class index to its color
            colored_image = np.zeros((512, 512, 3), dtype=np.uint8)
            for label, color in color_map.items():
                colored_image[image_array == label] = color

            colored_image_save = Image.fromarray(colored_image)
            if not os.path.exists(os.path.join(basepath, 'output')):
                os.makedirs(os.path.join(basepath, 'output'))
            # Save the colored map (note: fixed file name, overwritten on every call)
            colored_image_save.save(os.path.join(basepath, 'output/output_image.png'))
        return colored_image

    def save_result(self, result, save_path):
        # Make sure result is in (H, W, C) format
        if result.shape != (512, 512, 3):
            raise ValueError(f'Expected result shape (512, 512, 3), but got {result.shape}')
        # Save the result as an image with PIL
        image = Image.fromarray(result.astype(np.uint8))
        image.save(save_path)
        print(f'Result saved to {save_path}')

    @staticmethod
    def get_config_template():
        return dict_to_yaml('ANNOTATORS', __class__.__name__, SegmentationAnnotator.para_dict, set_name=True)
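To rule out the annotator itself, a smoke-test sketch: instantiate the class directly and feed it two different test images; if the returned hints are identical, the problem lies in the control-hint pipeline rather than in the diffusion model. This assumes scepter's Config can be built from an in-memory dict via cfg_dict, and the second image path is a placeholder.

# Smoke test sketch: do two different inputs yield different control hints?
import numpy as np
from PIL import Image
from scepter.modules.utils.config import Config  # cfg_dict usage is assumed

cfg = Config(load=False, cfg_dict={
    'NAME': 'SegmentationAnnotator',
    'UNET_WEIGHT': '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth',
    'SEGMENTATION_PATH': '/tmp/seg_smoke/',
})
anno = SegmentationAnnotator(cfg)

img_a = np.array(Image.open('/data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test/target/335L1.png').convert('RGB'))
img_b = np.array(Image.open('/path/to/another_test_image.png').convert('RGB'))  # placeholder

hint_a = anno.forward(img_a)
hint_b = anno.forward(img_b)
print('hints identical:', np.array_equal(hint_a, hint_b))  # expected: False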
Modifications in utils: