Kandinsky-3 icon indicating copy to clipboard operation
Kandinsky-3 copied to clipboard

MoVQ implementation question

Open kebijuelun opened this issue 1 year ago • 2 comments

I have some questions regarding the implementation of MoVQ and would appreciate your clarification.

From the original MoVQ paper, it is mentioned that a multi-channle VQ is adopted. image

However, the implementation of kandinsky3 does not involve any vector quantization operation:

class MoVQ(nn.Module):
    
    def __init__(self, generator_params):
        super().__init__()
        z_channels = generator_params["z_channels"]
        self.encoder = Encoder(**generator_params)
        self.quant_conv = torch.nn.Conv2d(z_channels, z_channels, 1)
        self.post_quant_conv = torch.nn.Conv2d(z_channels, z_channels, 1)
        self.decoder = Decoder(zq_ch=z_channels, **generator_params)
        
    @torch.no_grad()
    def encode(self, x):
        h = self.encoder(x)
        h = self.quant_conv(h)
        return h

    @torch.no_grad()
    def decode(self, quant):
        decoder_input = self.post_quant_conv(quant)
        decoded = self.decoder(decoder_input, quant)
        return decoded

May I ask if it is a misunderstanding on my part regarding MoVQ, or if Kandinsky has made some modifications to the implementation of MoVQ?

kebijuelun avatar Dec 15 '23 04:12 kebijuelun