Kandinsky-3
MoVQ implementation question
I have some questions regarding the implementation of MoVQ and would appreciate your clarification.
The original MoVQ paper states that multi-channel vector quantization (VQ) is adopted.
However, the Kandinsky 3 implementation does not appear to perform any vector quantization operation:
class MoVQ(nn.Module):

    def __init__(self, generator_params):
        super().__init__()
        z_channels = generator_params["z_channels"]
        self.encoder = Encoder(**generator_params)
        self.quant_conv = torch.nn.Conv2d(z_channels, z_channels, 1)
        self.post_quant_conv = torch.nn.Conv2d(z_channels, z_channels, 1)
        self.decoder = Decoder(zq_ch=z_channels, **generator_params)

    @torch.no_grad()
    def encode(self, x):
        h = self.encoder(x)
        h = self.quant_conv(h)
        return h

    @torch.no_grad()
    def decode(self, quant):
        decoder_input = self.post_quant_conv(quant)
        decoded = self.decoder(decoder_input, quant)
        return decoded
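For comparison, this is roughly the kind of step I expected to find between quant_conv and post_quant_conv. It is only a minimal sketch of a plain single-codebook nearest-neighbor lookup in the VQ-VAE style, not the multi-channel variant described in the paper and not code from the Kandinsky 3 repository. The class name NearestCodebookQuantizer and the num_codes and code_dim parameters are hypothetical and chosen just for illustration.

```python
import torch
from torch import nn


class NearestCodebookQuantizer(nn.Module):
    """Hypothetical VQ layer for illustration; not part of Kandinsky 3."""

    def __init__(self, num_codes: int, code_dim: int):
        super().__init__()
        # Learnable codebook: one row per discrete code.
        self.codebook = nn.Embedding(num_codes, code_dim)

    @torch.no_grad()
    def forward(self, h: torch.Tensor):
        # h: (B, C, H, W) continuous latents, e.g. the output of quant_conv.
        b, c, height, width = h.shape
        flat = h.permute(0, 2, 3, 1).reshape(-1, c)  # (B*H*W, C)
        # Euclidean distance from each latent vector to each codebook entry.
        distances = torch.cdist(flat, self.codebook.weight)  # (B*H*W, num_codes)
        indices = distances.argmin(dim=1)  # index of the nearest code
        # Replace every latent vector with its nearest codebook entry.
        quant = self.codebook(indices).reshape(b, height, width, c)
        quant = quant.permute(0, 3, 1, 2).contiguous()  # back to (B, C, H, W)
        return quant, indices.reshape(b, height, width)


torch.manual_seed(0)
quantizer = NearestCodebookQuantizer(num_codes=16, code_dim=4)
h = torch.randn(2, 4, 8, 8)  # stand-in for encoder output
quant, indices = quantizer(h)
print(quant.shape, indices.shape)
```

In this sketch the output has the same shape as the input, but every spatial position holds one of the 16 codebook vectors instead of an arbitrary continuous value. The encode method quoted above returns the continuous output of quant_conv directly, with no such lookup.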
Am I misunderstanding MoVQ, or has Kandinsky modified its implementation of MoVQ so that the quantization step is omitted?