Rerender_A_Video

DDIM sampling does not use x0_strength to skip steps and always iterates over all ddim_steps=20 steps, which is time-consuming.

Open • Sutongtong233 opened this issue 8 months ago • 1 comment

At https://github.com/williamyang1991/Rerender_A_Video/blob/dfaf9d8825f226a2f0a0b731ab2adc84a3f2ebd2/src/ddim_v_hacked.py#L300, when x0_strength is small, the sampler spends a long time denoising before it reaches https://github.com/williamyang1991/Rerender_A_Video/blob/dfaf9d8825f226a2f0a0b731ab2adc84a3f2ebd2/src/ddim_v_hacked.py#L306. The previously blended img is useless, because it is overwritten at https://github.com/williamyang1991/Rerender_A_Video/blob/dfaf9d8825f226a2f0a0b731ab2adc84a3f2ebd2/src/ddim_v_hacked.py#L308, so all the earlier denoising work is wasted. Maybe we can instead directly do:

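```python
# Find the DDIM step at which x0 is blended in, and noise x0 directly to
# that timestep instead of first denoising from pure noise.
start = 0
for i, step in enumerate(time_range):
    if strength >= 0 and i == int(total_steps * strength) and x0 is not None:
        ts = torch.full((b, ), step, device=device, dtype=torch.long)
        img = self.model.q_sample(x0, ts)  # x_t sampled from q(x_t | x0)
        start = i
        break
# Then denoise over time_range[start:] only, keeping index = total_steps - i - 1.
```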

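to get x_t, and then denoise only from this timestep onward. For the controller, we would then always fetch the most recently stored entries of step_store['first_ada'] (the scale at [-1] and the shift at [-2]):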

```python
x0 = F.instance_norm(x0) * self.step_store['first_ada'][-1] + self.step_store['first_ada'][-2]
```
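To make the saving concrete, here is a toy, pure-Python sketch comparing the current control flow with the proposed one. Everything in it is hypothetical and not the repo's code: scalar latents stand in for tensors, ALPHAS_CUMPROD is a made-up linear schedule, CountingDenoiser.step stands in for one DDIM update, and q_sample follows the standard forward-diffusion formula x_t = sqrt(ᾱ_t)·x0 + sqrt(1 − ᾱ_t)·ε. Both samplers receive the same noise, so they return identical results; the real q_sample presumably draws fresh noise, in which case the two paths would only agree in distribution. `strength` here is the value compared inside the sampler loop; how the caller maps the user-facing x0_strength onto it is not assumed.

```python
import math

# Hypothetical toy setup: scalar "latents", a linear alpha-bar schedule over
# 1000 training timesteps, and a deterministic denoiser stub that counts calls.
NUM_TRAIN_STEPS = 1000
ALPHAS_CUMPROD = [1.0 - 0.99 * (t + 1) / NUM_TRAIN_STEPS for t in range(NUM_TRAIN_STEPS)]


def q_sample(x0, t, noise):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    abar = ALPHAS_CUMPROD[t]
    return math.sqrt(abar) * x0 + math.sqrt(1.0 - abar) * noise


class CountingDenoiser:
    """Stands in for one DDIM update and counts how often the model is called."""

    def __init__(self):
        self.calls = 0

    def step(self, x, t):
        self.calls += 1
        return 0.9 * x + 1e-4 * t  # arbitrary deterministic update


def sample_original(x0, time_range, strength, noise, model):
    """Current flow: denoise from noise, then overwrite img at i == int(T * strength)."""
    total_steps = len(time_range)
    img = noise
    for i, step in enumerate(time_range):
        if strength >= 0 and i == int(total_steps * strength) and x0 is not None:
            img = q_sample(x0, step, noise)  # all earlier denoising is discarded here
        img = model.step(img, step)
    return img


def sample_skip(x0, time_range, strength, noise, model):
    """Proposed flow: noise x0 straight to the start timestep, then denoise from there."""
    total_steps = len(time_range)
    start = int(total_steps * strength) if strength >= 0 else total_steps
    if x0 is None or start >= total_steps:
        start, img = 0, noise  # no x0 blending: full sampling from noise, as before
    else:
        img = q_sample(x0, time_range[start], noise)
    for step in time_range[start:]:
        img = model.step(img, step)
    return img


if __name__ == "__main__":
    # 20 timesteps from noisiest to cleanest (assumed uniform DDIM-style spacing).
    time_range = list(range(951, 0, -50))
    slow, fast = CountingDenoiser(), CountingDenoiser()
    a = sample_original(0.7, time_range, 0.75, -1.3, slow)
    b = sample_skip(0.7, time_range, 0.75, -1.3, fast)
    print(slow.calls, fast.calls, a == b)
```

Run directly, this prints `20 5 True`: with strength 0.75 the current loop makes 20 denoiser calls and throws away the first 15, while the skipping version makes only the 5 that matter. When x0 is missing or strength is out of range, sample_skip falls back to full sampling from noise, matching the guard in the original condition.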

Sutongtong233, Jun 06 '24 10:06