lgs00
Thanks for your great work. On line 633 of `gaussian_diffusion.py`, `terms["nll"]` is calculated but never used. Is this a mistake, or was it left out because it didn't work? `terms["nll"] = self._token_discrete_loss(model_out_x_start,...`
Hello, thank you for your contribution. I have a question about line 66 of `models/spi_llava.py`: `image_forward_outs = vision_tower(images, output_hidden_states=True)`. What is the structure of this `vision_tower`?
When I use the provided model at 1024 resolution, it generates a pure black video, while 512 resolution works well. What could be the problem? https://github.com/user-attachments/assets/11f70a00-f7b7-4f44-b845-176e5435d8c4
Thanks for sharing the code. When I changed the resolution of the training video to [832,608], the line `h = torch.cat([h, hs.pop()], dim=1)` raised an error: RuntimeError: Sizes of tensors...
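One plausible cause of this kind of skip-connection error is that the latent size stops halving evenly in the encoder, so the upsampled decoder feature no longer matches the saved skip tensor. The sketch below only does the shape arithmetic, under loudly stated assumptions: a Stable-Diffusion-style UNet with a VAE downsampling factor of 8, three stride-2 convolutions (kernel 3, padding 1) and 2x upsampling in the decoder. The helper names `conv_stride2_out` and `first_skip_mismatch` are hypothetical and not part of this repository; the real model's depth and factors may differ.

```python
from typing import List, Optional, Tuple

Size = Tuple[int, int]


def conv_stride2_out(size: int, kernel: int = 3, padding: int = 1) -> int:
    # Standard output-size formula for a stride-2 convolution (floor division).
    return (size + 2 * padding - kernel) // 2 + 1


def first_skip_mismatch(height: int, width: int, num_downsamples: int = 3,
                        vae_factor: int = 8) -> Optional[Tuple[int, Size, Size]]:
    """Return (level, skip_size, upsampled_size) for the first skip connection
    whose spatial size would not match, or None if every level matches.

    Assumed architecture (hypothetical): VAE factor 8, then `num_downsamples`
    stride-2 convs in the encoder and 2x upsampling in the decoder.
    """
    h, w = height // vae_factor, width // vae_factor  # latent resolution
    skips: List[Size] = []
    for _ in range(num_downsamples):
        skips.append((h, w))  # feature map saved for the skip connection
        h, w = conv_stride2_out(h), conv_stride2_out(w)
    for level in reversed(range(num_downsamples)):
        h, w = h * 2, w * 2  # 2x upsampling in the decoder
        if (h, w) != skips[level]:
            # This is where a torch.cat(..., dim=1) on the two tensors would fail.
            return level, skips[level], (h, w)
    return None


if __name__ == "__main__":
    for size in [(512, 512), (1024, 1024), (832, 608)]:
        print(size, first_skip_mismatch(*size))
```

Under these assumptions, 832x608 gives a latent of 104x76; 76 halves to 38 and then 19, the odd 19 is rounded up to 10 by the strided convolution, and upsampling 10 back gives 20, which cannot be concatenated with the saved 19. Sizes that are multiples of `vae_factor * 2**num_downsamples` (64 here) halve cleanly at every level; 608 is not a multiple of 64.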
Thank you for your code. How do you obtain the agnostic mask? I think it is an important factor for the result.
When I use the pretrained model `ip_adapter/ip-adapter-plus_sdxl_vit-h.bin`, which model should we use?
Why is the `agnostic_mask` computed differently for the DressCode dataset? It requires keypoints and `label_maps` as input.