audio-diffusion-pytorch
Add support to clip predicted samples to the desired range.
In diffusion it is common to clip samples to a desired range such as [-1, 1]. I believe previous versions of this package supported this, but the current implementation does not. It would be useful to support clipping predicted samples to a desired range.
VSampler
def forward(..., clip_denoised: bool = False, dynamic_threshold: float = 0.0) -> Tensor:
    ...
    x_pred = alphas[i] * x_noisy - betas[i] * v_pred
    # Add clipping support here
    if clip_denoised:
        x_pred = clip(x_pred, dynamic_threshold=dynamic_threshold)
    ...
I am happy to open a PR if this is acceptable.
Hey Kinyugo! Looks good to me. The only thing is that dynamic thresholding is usually applied inside the sampling loop, not only at the end, so a simple x_pred.clamp(-1, 1) is probably enough. I didn't carry dynamic thresholding over to v-diffusion since I'm not sure it would play well inside the sampling loop, as we're not only predicting the ground truth as with normal diffusion or k-diffusion.
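For illustration, the simpler per-step clamp could look something like this sketch. It is not the library's actual code: NumPy stands in for torch (np.clip plays the role of Tensor.clamp), and the function name, alpha/beta values, and inputs are made up:

```python
import numpy as np

def v_sampling_step(x_noisy, v_pred, alpha, beta, clip_denoised=True):
    # Predict the denoised sample from the velocity parameterization
    x_pred = alpha * x_noisy - beta * v_pred
    # Simple per-step clamp to the expected data range [-1, 1],
    # applied inside the sampling loop rather than only at the end
    if clip_denoised:
        x_pred = np.clip(x_pred, -1.0, 1.0)
    return x_pred

# Hypothetical values, purely for illustration
x_noisy = np.array([0.5, -1.8, 2.2])
v_pred = np.array([0.1, 0.2, -0.3])
out = v_sampling_step(x_noisy, v_pred, alpha=1.0, beta=0.5)
# Values outside [-1, 1] are clamped; in-range values pass through unchanged
```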
Hello Flavio. It makes sense not to have dynamic thresholding. Have you experimented with the effect of clipping on final sample quality?