DiffPortrait3D
I am curious about how the Reference Net works.
Thank you for sharing your amazing code.
I have a question and would like to leave it here.
I was curious about how the Appearance Ref works.
So I traced how the "image_control" entry of the condition dictionary (the variable c in the code) is used.
# inference.py
for i in range(conditions.shape[0] // nSample):
    print("Generate Image {} in {} images".format(nSample * i, conditions.shape[0]))
    inpaint = None
    if args.denoise_from_fea_map:
        fea_map_enc = infer_model.get_first_stage_encoding(infer_model.encode_first_stage(fea_condtion[i*nSample: i*nSample+nSample]))
        c = {"c_concat": [conditions[i*nSample: i*nSample+nSample]], "c_crossattn": [c_cross], "image_control": cond_img_cat, 'feature_control': fea_map_enc}
    if args.control_mode == "controlnet_important":
        uc = {"c_concat": [conditions[i*nSample: i*nSample+nSample]], "c_crossattn": [uc_cross]}
    else:
        uc = {"c_concat": [conditions[i*nSample: i*nSample+nSample]], "c_crossattn": [uc_cross], "image_control": cond_img_cat}
    c['wonoise'] = True
    uc['wonoise'] = True
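Next, I found that in the function p_sample_ddim of the class DDIMSampler_ReferenceOnly, the tensors in c['image_control'] are concatenated along dimension 1 into cond_image_start. This then becomes reference_image_noisy: unchanged when c['wonoise'] is set, or noised to timestep t with q_sample otherwise.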
def p_sample_ddim(
    ...
    if 'image_control' in c and c['image_control'] is not None:
        cond_image_start = torch.cat(c['image_control'], 1)
        # cond_image_start = self.model.get_first_stage_encoding(self.model.encode_first_stage(cond_image_hint))
        if c['wonoise']:
            reference_image_noisy = cond_image_start
        else:
            reference_image_noisy = self.model.q_sample(cond_image_start, t)
    ...
    model_uncond = self.model.apply_model(x_in, t_in, c_in, None, uc=True)
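To check my understanding of the two branches above, here is a minimal sketch of what I assume they do. It is not the repository's code: `make_alphas_cumprod`, `make_reference`, and the linear beta schedule are hypothetical stand-ins, and plain Python lists replace tensors. The `q_sample` here is the standard DDPM forward-noising formula, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, which I assume is what `self.model.q_sample` computes.

```python
import math
import random


def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    alphas_cumprod, running = [], 1.0
    for step in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * step / (num_steps - 1)
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return alphas_cumprod


def q_sample(x_start, t, alphas_cumprod, noise=None):
    """Standard forward diffusion: scale the clean input and add Gaussian noise."""
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x_start]
    signal_scale = math.sqrt(alphas_cumprod[t])
    noise_scale = math.sqrt(1.0 - alphas_cumprod[t])
    return [signal_scale * x + noise_scale * n for x, n in zip(x_start, noise)]


def make_reference(cond_image_start, t, alphas_cumprod, wonoise):
    """Hypothetical mirror of the branch on c['wonoise'] in p_sample_ddim."""
    if wonoise:
        # The reference is passed through untouched, whatever the timestep.
        return cond_image_start
    return q_sample(cond_image_start, t, alphas_cumprod)


alphas_cumprod = make_alphas_cumprod()
reference = [0.5, -1.0, 2.0]
clean_reference = make_reference(reference, 500, alphas_cumprod, wonoise=True)
noisy_reference = make_reference(reference, 500, alphas_cumprod, wonoise=False)
```

Since inference.py sets c['wonoise'] = True, only the first branch should run at inference time, so the reference would stay clean at every step.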
reference_image_noisy is also a parameter of the function apply_model of the class LatentDiffusionReferenceOnly.
However, looking at the code, the function body never uses it.
def apply_model(self, x_noisy, t, cond, reference_image_noisy=None, return_ids=False):
    if isinstance(cond, dict):
        # hybrid case, cond is expected to be a dict
        pass
    else:
        if not isinstance(cond, list):
            cond = [cond]
        key = 'c_concat' if self.model.conditioning_key == 'concat' else 'c_crossattn'
        cond = {key: cond}
    x_recon = self.model(x_noisy, t, **cond)
    if isinstance(x_recon, tuple) and not return_ids:
        return x_recon[0]
    else:
        return x_recon
I am curious about how reference_image_noisy serves the role of an appearance reference.
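To make the observation concrete, here is a minimal sketch with hypothetical stand-in classes (`RecordingModel` and `ToyWrapper` are not from the repository). It copies the control flow of apply_model and records what the inner model receives. The only point is that the reference_image_noisy argument is dropped, while every key of the cond dictionary, including "image_control", is forwarded through `**cond`. Whether the real inner model reads "image_control" is not visible in the snippets above, so that part is my assumption.

```python
class RecordingModel:
    """Hypothetical stand-in for self.model; records the keyword arguments it gets."""

    def __init__(self):
        self.received = None

    def __call__(self, x_noisy, t, **cond):
        self.received = dict(cond)
        return x_noisy


class ToyWrapper:
    """Hypothetical wrapper that follows the same control flow as apply_model."""

    def __init__(self, model):
        self.model = model

    def apply_model(self, x_noisy, t, cond, reference_image_noisy=None, return_ids=False):
        if not isinstance(cond, dict):
            if not isinstance(cond, list):
                cond = [cond]
            cond = {"c_crossattn": cond}
        # reference_image_noisy is never read; only cond reaches the inner model.
        x_recon = self.model(x_noisy, t, **cond)
        if isinstance(x_recon, tuple) and not return_ids:
            return x_recon[0]
        return x_recon


inner = RecordingModel()
wrapper = ToyWrapper(inner)
cond = {
    "c_concat": ["pose"],
    "c_crossattn": ["text"],
    "image_control": ["ref_latent"],
    "wonoise": True,
}
wrapper.apply_model("x", 0, cond, reference_image_noisy="IGNORED")
print(sorted(inner.received))
```

If this reading is right, the appearance reference would enter the network through cond["image_control"] rather than through the reference_image_noisy parameter. Could you confirm whether that is the intended path?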