MegaPortraits
About shape of latent expression descriptors and global descriptor
Hi, thanks for your awesome work.
I am replicating the structure of this research paper, but I am uncertain about the shapes of the latent expression descriptors and the global descriptor. Can you provide them?
Thanks.
I have the same question. In my view, the latent expression descriptor and the global descriptor are both 1D vectors, but then the shape of the appearance feature vs and the shape of the warp ws don't match. In Fig. 9, the appearance encoder downsamples 3 times, so for a 512x512 input you get a 4D tensor of shape 96x16x64x64. But the warping generator only upsamples 4 times in height and width, so if the latent expression is a 1D vector, the shape of the warp ws would have to be 3x16x16x16. The heights and widths don't match. I'm very confused about this; do you have any suggestions?
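For what it's worth, the mismatch can be written out as plain shape arithmetic. This is just a sketch of the sizes quoted above (512x512 input, 3 downsamples, 4 upsamples, 16 depth planes); none of it is confirmed against the authors' code:

```python
# Shape arithmetic for the mismatch described above (assumed sizes, not the paper's code).
input_hw = 512

# Appearance encoder: 3 stride-2 downsamples -> 512 / 2**3 = 64.
vs_hw = input_hw // 2**3
vs_shape = (96, 16, vs_hw, vs_hw)   # (C, D, H, W) = (96, 16, 64, 64)

# Warp generator from a 1x1 map (i.e. a plain 1D latent): 4 upsamples -> 16.
ws_hw_from_1x1 = 1 * 2**4           # 16, does NOT match vs at 64x64

# Warp generator from a 4x4 map (tiling the latent, as suggested below): 4 upsamples -> 64.
ws_hw_from_4x4 = 4 * 2**4           # 64, matches vs

print(vs_shape, ws_hw_from_1x1, ws_hw_from_4x4)
```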
does this code by @Kevinfringe help? https://github.com/Kevinfringe/MegaPortrait
https://github.com/Kevinfringe/MegaPortrait/blob/85ca3692a0abc3e906e91ed924dc311b4cad538b/model.py#L148
Side note: I'm attempting to recreate the VASA-1 paper here - https://github.com/johndpope/vasa-1-hack
@xuzheyuan624 / @TuesdayT - I rebuilt the MegaPortrait repo from Kevin's using Claude Opus.
https://github.com/johndpope/MegaPortrait-hack/ From my implementation, it looks like the expression net should output a 50-dimensional vector, which aligns with the ResNet-18 backbone.
https://github.com/johndpope/MegaPortrait-hack/issues/11
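In case it's useful, here is a minimal sketch of what a ResNet-18 expression encoder with a 50-dim head could look like. The 50-dim size comes from my findings above; using torchvision's resnet18 with a swapped fc layer is my assumption, not something confirmed against the paper:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ExpressionEncoder(nn.Module):
    """Hypothetical zs/zd encoder: ResNet-18 trunk with a 50-dim head (an assumption)."""
    def __init__(self, dim: int = 50):
        super().__init__()
        self.backbone = resnet18(weights=None)
        # Replace the 1000-class head with a 50-dim descriptor head.
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)  # (B, 50) 1D descriptor per image

z = ExpressionEncoder()(torch.randn(1, 3, 256, 256))
print(z.shape)  # torch.Size([1, 50])
```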
Update: I have some progress, but without the dimensions I haven't been able to get it to work.
Found the same question here. I think zs/zd are 1D vectors; maybe the authors repeat them spatially so they have a shape like (B, 512, 4, 4) before feeding them to the W* generator.
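A minimal sketch of that tiling, assuming a 512-dim latent and a 4x4 starting grid (the names here are hypothetical):

```python
import torch

B, latent_dim, grid = 1, 512, 4
z = torch.randn(B, latent_dim)  # 1D descriptor per sample

# Repeat the vector over a 4x4 spatial grid before the W* generator.
z_map = z.view(B, latent_dim, 1, 1).expand(-1, -1, grid, grid)
print(z_map.shape)  # torch.Size([1, 512, 4, 4])
```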
UPDATE: thanks @flyingshan - I think it has to be 4x4, but I'm not sure. That gives you 16 pixels to work with instead of 1.
https://github.com/johndpope/MegaPortrait-hack/blob/main/model.py#L15 When my code runs through the warp generator it upscales to end up at 64x64, so that is neat.
But then I can't add the two warps together:
w_s2c = w_rt_s2c + w_em_s2c
w_em_s2c: torch.Size([1, 3, 16, 64, 64])  # the depth here is 16; should it be 64?
w_rt_s2c: torch.Size([1, 3, 64, 64, 64])
This is the output of my warp generator / warp field code:
WarpField > zs sum.shape: torch.Size([1, 512, 4, 4])
conv1x1 > x.shape: torch.Size([1, 2048, 4, 4])
reshape_layer > x.shape: torch.Size([1, 512, 4, 4, 4])
ResBlock3D x.shape: torch.Size([1, 512, 4, 4, 4])
conv1 > out.shape: torch.Size([1, 256, 4, 4, 4])
norm1 > out.shape: torch.Size([1, 256, 4, 4, 4])
F.relu(out) > out.shape: torch.Size([1, 256, 4, 4, 4])
conv2 > out.shape: torch.Size([1, 256, 4, 4, 4])
norm2 > out.shape: torch.Size([1, 256, 4, 4, 4])
residual > residual.shape: torch.Size([1, 256, 4, 4, 4])
upsample1 > x.shape: torch.Size([1, 256, 8, 8, 8])
ResBlock3D x.shape: torch.Size([1, 256, 8, 8, 8])
conv1 > out.shape: torch.Size([1, 128, 8, 8, 8])
norm1 > out.shape: torch.Size([1, 128, 8, 8, 8])
F.relu(out) > out.shape: torch.Size([1, 128, 8, 8, 8])
conv2 > out.shape: torch.Size([1, 128, 8, 8, 8])
norm2 > out.shape: torch.Size([1, 128, 8, 8, 8])
residual > residual.shape: torch.Size([1, 128, 8, 8, 8])
upsample2 > x.shape: torch.Size([1, 128, 16, 16, 16])
ResBlock3D x.shape: torch.Size([1, 128, 16, 16, 16])
conv1 > out.shape: torch.Size([1, 64, 16, 16, 16])
norm1 > out.shape: torch.Size([1, 64, 16, 16, 16])
F.relu(out) > out.shape: torch.Size([1, 64, 16, 16, 16])
conv2 > out.shape: torch.Size([1, 64, 16, 16, 16])
norm2 > out.shape: torch.Size([1, 64, 16, 16, 16])
residual > residual.shape: torch.Size([1, 64, 16, 16, 16])
upsample3 > x.shape: torch.Size([1, 64, 16, 32, 32])
ResBlock3D x.shape: torch.Size([1, 64, 16, 32, 32])
conv1 > out.shape: torch.Size([1, 32, 16, 32, 32])
norm1 > out.shape: torch.Size([1, 32, 16, 32, 32])
F.relu(out) > out.shape: torch.Size([1, 32, 16, 32, 32])
conv2 > out.shape: torch.Size([1, 32, 16, 32, 32])
norm2 > out.shape: torch.Size([1, 32, 16, 32, 32])
residual > residual.shape: torch.Size([1, 32, 16, 32, 32])
upsample4 > x.shape: torch.Size([1, 32, 16, 64, 64])
conv3x3x3 > x.shape: torch.Size([1, 3, 16, 64, 64])
gn > x.shape: torch.Size([1, 3, 16, 64, 64])
F.relu > x.shape: torch.Size([1, 3, 16, 64, 64])
tanh > x.shape: torch.Size([1, 3, 16, 64, 64])
w_em_s2c: torch.Size([1, 3, 16, 64, 64])
w_rt_s2c: torch.Size([1, 3, 64, 64, 64])
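One way to make the addition w_s2c = w_rt_s2c + w_em_s2c typecheck, purely as a guess, is to trilinearly upsample the depth axis of w_em_s2c from 16 to 64 before summing. Whether the paper intends 16 or 64 depth planes is exactly the open question above:

```python
import torch
import torch.nn.functional as F

w_em_s2c = torch.randn(1, 3, 16, 64, 64)
w_rt_s2c = torch.randn(1, 3, 64, 64, 64)

# Guess: interpolate the 16 depth planes of w_em_s2c up to 64 so the warps can be summed.
w_em_up = F.interpolate(w_em_s2c, size=w_rt_s2c.shape[2:],
                        mode="trilinear", align_corners=False)
w_s2c = w_rt_s2c + w_em_up
print(w_s2c.shape)  # torch.Size([1, 3, 64, 64, 64])
```

The alternative would be to change the warp generator so w_em_s2c comes out with 64 depth planes in the first place.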