Incorrect results during SCEdit inference with my own weights
Hello author, after training SCEdit I get wrong results when running inference with my own weights (using segmentation guidance):
1. The inferred image is completely different from the images produced during training.
2. Even when the input image is different (with the seed unchanged), the result is exactly the same.
The inference command is as follows:

CUDA_VISIBLE_DEVICES=1 python /data/twinkle/app/scepter/scepter/tools/run_inference.py \
    --cfg /data/twinkle/app/scepter/scepter/methods/scedit/ctr/sd15_512_sce_ctr_segmentation_Accusyn_Blur.yaml \
    --num_samples 1 \
    --prompt 'Convert to a segmentation map based on the prompt: disk and cup segmentation map' \
    --save_folder /data/twinkle/app/scepter/Newpaper_Accusyn_Blur_1 \
    --image_size 512 \
    --pretrained_model /data/twinkle/app/scepter/cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur/checkpoints/ldm_step-100000.pth \
    --image /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test/target/335L1.png \
    --control_mode segmentation \
    --task control \
    --seed 2023
The inferred images: the first is the segmentation guidance image, and the second is the inference result.
By contrast, the inference images stored in eval_probe during training:
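As a quick way to verify point 2 above, I compare results from two runs with different --image inputs for byte-level equality. This is only a sketch: the output file names below are placeholders for whatever run_inference.py actually writes into the save folder.

# Check whether two results produced from different input images are
# byte-for-byte identical (file names are placeholders).
import numpy as np
from PIL import Image

a = np.array(Image.open('/data/twinkle/app/scepter/Newpaper_Accusyn_Blur_1/output_0.png'))
b = np.array(Image.open('/data/twinkle/app/scepter/Newpaper_Accusyn_Blur_2/output_0.png'))
print('identical:', a.shape == b.shape and np.array_equal(a, b))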
The YAML file is as follows:

ENV:
  BACKEND: nccl
SOLVER:
  NAME: LatentDiffusionSolver
  RESUME_FROM:
  LOAD_MODEL_ONLY: True
  USE_FSDP: False
  SHARDING_STRATEGY:
  USE_AMP: True
  DTYPE: float16
  CHANNELS_LAST: True
  MAX_STEPS: 100000
  MAX_EPOCHS: -1
  NUM_FOLDS: 1
  ACCU_STEP: 1
  EVAL_INTERVAL: 1000
  RESCALE_LR: False
  WORK_DIR: ./cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur
  LOG_FILE: std_log.txt
  FILE_SYSTEM:
    NAME: "ModelscopeFs"
    TEMP_DIR: "./cache/cache_data"
  FREEZE:
    FREEZE_PART: [ "first_stage_model", "cond_stage_model", "model" ]
    TRAIN_PART: [ "control_blocks" ]
  MODEL:
    NAME: LatentDiffusionSCEControl
    PARAMETERIZATION: eps
    TIMESTEPS: 1000
    MIN_SNR_GAMMA:
    ZERO_TERMINAL_SNR: False
    PRETRAINED_MODEL: ms://AI-ModelScope/[email protected]
    IGNORE_KEYS: [ ]
    SCALE_FACTOR: 0.18215
    SIZE_FACTOR: 8
    DEFAULT_N_PROMPT:
    SCHEDULE_ARGS:
      "NAME": "scaled_linear"
      "BETA_MIN": 0.00085
      "BETA_MAX": 0.012
    USE_EMA: False
    #
    DIFFUSION_MODEL:
      NAME: DiffusionUNet
      IN_CHANNELS: 4
      OUT_CHANNELS: 4
      MODEL_CHANNELS: 320
      NUM_HEADS: 8
      NUM_RES_BLOCKS: 2
      ATTENTION_RESOLUTIONS: [ 4, 2, 1 ]
      CHANNEL_MULT: [ 1, 2, 4, 4 ]
      CONV_RESAMPLE: True
      DIMS: 2
      USE_CHECKPOINT: False
      USE_SCALE_SHIFT_NORM: False
      RESBLOCK_UPDOWN: False
      USE_SPATIAL_TRANSFORMER: True
      TRANSFORMER_DEPTH: 1
      CONTEXT_DIM: 768
      DISABLE_MIDDLE_SELF_ATTN: False
      USE_LINEAR_IN_TRANSFORMER: False
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
    #
    FIRST_STAGE_MODEL:
      NAME: AutoencoderKL
      EMBED_DIM: 4
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
      BATCH_SIZE: 4
      #
      ENCODER:
        NAME: Encoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DOUBLE_Z: True
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
      #
      DECODER:
        NAME: Decoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
        GIVE_PRE_END: False
        TANH_OUT: False
    #
    TOKENIZER:
      NAME: ClipTokenizer
      PRETRAINED_PATH: ms://AI-ModelScope/clip-vit-large-patch14
      LENGTH: 77
      CLEAN: True
    #
    COND_STAGE_MODEL:
      NAME: FrozenCLIPEmbedder
      FREEZE: True
      LAYER: last
      PRETRAINED_MODEL: ms://AI-ModelScope/clip-vit-large-patch14
    #
    LOSS:
      NAME: ReconstructLoss
      LOSS_TYPE: l2
    #
    CONTROL_MODEL:
      NAME: CSCTuners
      PRE_HINT_IN_CHANNELS: 3
      PRE_HINT_OUT_CHANNELS: 256
      DENSE_HINT_KERNAL: 3
      SCALE: 1.0
      SC_TUNER_CFG:
        NAME: SCTuner
        TUNER_NAME: SCEAdapter
        DOWN_RATIO: 1.0
    CONTROL_ANNO:
      NAME: SegmentationAnnotator
      UNET_WEIGHT: /data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth
      SEGMENTATION_PATH: /data/twinkle/app/scepter/Accusyn_segmentation/Blur/
  SAMPLE_ARGS:
    SAMPLER: ddim
    SAMPLE_STEPS: 50
    SEED: 2023
    GUIDE_SCALE: 7.5
    GUIDE_RESCALE: 0.5
    DISCRETIZATION: trailing
    IMAGE_SIZE: [ 512, 512 ]
    RUN_TRAIN_N: False
  OPTIMIZER:
    NAME: AdamW
    LEARNING_RATE: 0.0001
    BETAS: [ 0.9, 0.999 ]
    EPS: 1e-8
    WEIGHT_DECAY: 1e-2
    AMSGRAD: False
  TRAIN_DATA:
    NAME: ImageTextPairMSDataset
    MODE: train
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    PROMPT_PREFIX: ""
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 4
    NUM_WORKERS: 4
    SAMPLER:
      NAME: LoopSampler
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5, 0.5, 0.5 ]
        STD: [ 0.5, 0.5, 0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]
  EVAL_DATA:
    NAME: ImageTextPairMSDataset
    MODE: eval
    # MS_DATASET_NAME: style_custom_dataset
    # MS_DATASET_NAMESPACE: damo
    # MS_DATASET_SUBNAME: 3D
    # PROMPT_PREFIX: ""
    # MS_DATASET_SPLIT: train_short
    # MS_REMAP_KEYS: { 'Image:FILE': 'Target:FILE' }
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    PROMPT_PREFIX: ""
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 10
    NUM_WORKERS: 4
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5, 0.5, 0.5 ]
        STD: [ 0.5, 0.5, 0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]
  TRAIN_HOOKS:
    - NAME: BackwardHook
      PRIORITY: 0
    - NAME: LogHook
      LOG_INTERVAL: 50
    - NAME: CheckpointHook
      INTERVAL: 1000
    - NAME: ProbeDataHook
      PROB_INTERVAL: 1000
  EVAL_HOOKS:
    - NAME: ProbeDataHook
      PROB_INTERVAL: 1000
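As a sanity check that this config parses with the nesting shown above and that the CONTROL_ANNO paths actually exist on disk, a minimal sketch (it assumes the SOLVER/MODEL/CONTROL_ANNO nesting as written, and reads the file passed via --cfg):

# Minimal sanity check for the config above (nesting as shown is assumed).
import os
import yaml

cfg_file = '/data/twinkle/app/scepter/scepter/methods/scedit/ctr/sd15_512_sce_ctr_segmentation_Accusyn_Blur.yaml'
with open(cfg_file) as f:
    cfg = yaml.safe_load(f)

anno = cfg['SOLVER']['MODEL']['CONTROL_ANNO']
print('annotator:', anno['NAME'])
for key in ('UNET_WEIGHT', 'SEGMENTATION_PATH'):
    print(key, '->', anno[key], 'exists:', os.path.exists(anno[key]))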
The segmentation annotator code I implemented myself:

import os
import sys
import time
import concurrent.futures

import cv2
import numpy as np
import torch
from PIL import Image
from skimage import measure
from mmdet.apis import inference_detector, init_detector
from mmengine import Config

from scepter.modules.annotator.registry import ANNOTATORS
from scepter.modules.annotator.base_annotator import BaseAnnotator
from scepter.modules.utils.config import dict_to_yaml

sys.path.append('/data/twinkle/anaconda3/envs/scepter/lib/python3.8/site-packages/scepter/modules/annotator/unet_model')
from unet_model import UNet

i = 0  # global call counter (for debugging)


# Register SegmentationAnnotator into ANNOTATORS
@ANNOTATORS.register_class()
class SegmentationAnnotator(BaseAnnotator):
    para_dict = {}

    def __init__(self, cfg, logger=None):
        super().__init__(cfg, logger=logger)
        self.unet_weight = cfg.get('UNET_WEIGHT', '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth')
        self.segmentation_path = cfg.get('SEGMENTATION_PATH', '/data/twinkle/app/scepter/Accusyn_segmentation/test/')

    def forward(self, image):
        global i
        i += 1
        # Make sure the image is a numpy array
        if isinstance(image, Image.Image):
            image = np.array(image)
        elif isinstance(image, torch.Tensor):
            image = image.detach().cpu().numpy()
        elif isinstance(image, np.ndarray):
            image = image.copy()
        else:
            raise ValueError(f'Unsupported data type {type(image)}, only supports np.ndarray, torch.Tensor, Pillow Image.')

        # Load and initialize the UNet model for segmentation
        with torch.no_grad():
            device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
            unet = UNet(n_channels=3, n_classes=3).to(device)
            unet.load_state_dict(torch.load(self.unet_weight, map_location=device))
            unet.eval()

            # Save the input image to disk, then read it back with OpenCV
            input_image = Image.fromarray(image.astype(np.uint8))
            basepath = self.segmentation_path
            if not os.path.exists(os.path.join(basepath, 'input')):
                os.makedirs(os.path.join(basepath, 'input'))
            input_image.save(os.path.join(basepath, 'input/input_image.png'))
            img = cv2.imread(os.path.join(basepath, 'input/input_image.png'))

            # Convert to a tensor on the target device, normalized to [-1, 1]
            img_tensor = torch.from_numpy(img)
            img_tensor = img_tensor.to(device, dtype=torch.float32)
            img_tensor = (img_tensor / 127.5) - 1.0

            # Predict: add a batch dimension and permute HWC -> NCHW
            img_tensor = img_tensor.unsqueeze(0)
            img_tensor = img_tensor.permute(0, 3, 1, 2)
            pred_unet = unet(img_tensor)
            # Per-pixel class indices
            pred = torch.argmax(pred_unet, dim=1).squeeze(0).cpu().numpy()
            # Cast to uint8 before resizing (cv2.resize does not accept int64)
            pred_resized = cv2.resize(pred.astype(np.uint8), (512, 512), interpolation=cv2.INTER_NEAREST)
            # Map class indices {0, 1, 2} to {0, 127, 255}; currently unused below
            pred_resized = (pred_resized.astype(np.float32) * 255 / 2).astype(np.uint8)

            # Color map for the three classes
            color_map = {
                0: [0, 0, 0],      # background - black
                2: [255, 0, 0],    # cup - red
                1: [0, 0, 255],    # disk - blue
            }
            image_array = pred
            # Build a colored image by mapping each class index to its color
            colored_image = np.zeros((512, 512, 3), dtype=np.uint8)
            for label, color in color_map.items():
                colored_image[image_array == label] = color

            colored_image_save = Image.fromarray(colored_image)
            if not os.path.exists(os.path.join(basepath, 'output')):
                os.makedirs(os.path.join(basepath, 'output'))
            # Save the colored map (note: fixed file name, overwritten on every call)
            colored_image_save.save(os.path.join(basepath, 'output/output_image.png'))
        return colored_image

    def save_result(self, result, save_path):
        # Make sure result is in (H, W, C) format
        if result.shape != (512, 512, 3):
            raise ValueError(f'Expected result shape (512, 512, 3), but got {result.shape}')
        # Save the result as an image with PIL
        image = Image.fromarray(result.astype(np.uint8))
        image.save(save_path)
        print(f'Result saved to {save_path}')

    @staticmethod
    def get_config_template():
        return dict_to_yaml('ANNOTATORS', __class__.__name__, SegmentationAnnotator.para_dict, set_name=True)
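To rule out the annotator itself, a smoke-test sketch: instantiate the class directly and feed it two different test images; if the returned hints are identical, the problem lies in the control-hint pipeline rather than in the diffusion model. This assumes scepter's Config can be built from an in-memory dict via cfg_dict, and the second image path is a placeholder.

# Smoke test sketch: do two different inputs yield different control hints?
import numpy as np
from PIL import Image
from scepter.modules.utils.config import Config  # cfg_dict usage is assumed

cfg = Config(load=False, cfg_dict={
    'NAME': 'SegmentationAnnotator',
    'UNET_WEIGHT': '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth',
    'SEGMENTATION_PATH': '/tmp/seg_smoke/',
})
anno = SegmentationAnnotator(cfg)

img_a = np.array(Image.open('/data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test/target/335L1.png').convert('RGB'))
img_b = np.array(Image.open('/path/to/another_test_image.png').convert('RGB'))  # placeholder

hint_a = anno.forward(img_a)
hint_b = anno.forward(img_b)
print('hints identical:', np.array_equal(hint_a, hint_b))  # expected: False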
Modifications in utils: