CMG
The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)
Hello, while reading your code I think I found a problem at line 790 of `code/src/model/main_model_2.py`:

```python
for i in unactivated_indices:
    self.embedding[i] = activated_quantized[random.randint(0,len(activated_indices)-1)] + torch.Tensor(256).uniform_(-1/1024, -1/1024).cuda()
```

I think the range here should be (-1/1024, 1/1024) rather than (-1/1024, -1/1024). The same problem also appears at lines 977 and 1152. I hope this helps with your work :D
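A minimal sketch of why the bounds matter. It uses Python's standard `random.uniform` as a stand-in for `torch.Tensor.uniform_` (an assumption, so the example runs without PyTorch or a GPU), and the helper name `jitter` is hypothetical. When both bounds are equal, the interval has zero width, so every sample is the same constant and each re-initialized codebook entry receives an identical offset rather than random noise.

```python
import random

LOW = -1 / 1024
DIM = 256  # embedding width taken from the quoted line


def jitter(low, high, dim=DIM, seed=0):
    """Draw `dim` samples from U(low, high).

    Stand-in for torch.Tensor(dim).uniform_(low, high); not the repo's code.
    """
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(dim)]


degenerate = jitter(LOW, LOW)   # bounds as written at line 790
symmetric = jitter(LOW, -LOW)   # bounds proposed in this issue

# Count distinct values: a zero-width interval can only produce one.
print("distinct values, (-1/1024, -1/1024):", len(set(degenerate)))
print("distinct values, (-1/1024,  1/1024):", len(set(symmetric)))
```

With the degenerate bounds, the "noise" term reduces to a constant shift of -1/1024 on every dimension, so dead codes copied from the same activated vector would remain exact duplicates of each other.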
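Hello, I am trying to follow your work and transfer it to another domain, but I have run into the following problems during training:

1. lld_loss does not converge, so the estimated upper bound on the mutual information is inaccurate, which affects training.
2. After enabling mi_loss, NaN values appear in the model parameters.
3. mi_loss keeps growing as training proceeds.

I have tried adjusting the number of layers in mi_net, the learning rate, and so on, but the problems persist. I would like to ask for more details about how the model is trained:

1. During your training, does lld_loss gradually converge, or does it stay within a stable range?
2. During backpropagation of mi_loss, are the parameters of mi_net updated?
3. What does the mi_loss curve roughly look like during training, and does it converge?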
If I'd like to use CMG on my own dataset (for video and audio), how should I prepare the data? I've got video-audio pairs; should I extract their features?...
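The call at line 599 of pretrain.py does not match the forward function in model/CPC.py: it is inconsistent with the parameters at line 40 and with the return values at line 98.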
https://github.com/haihuangcode/CMG/blob/2cbdad8f68d6000657ddf45ace97c855c022334d/code/src/model/main_model_2.py#L507C1-L515C60 Hi sir! Thanks for your great work! I have some questions I would like to ask you. I don't know if it's right to understand it this way: self.audio_semantic_decoder...
If audio_feat, video_feat, and text_feat have different sequence lengths, the forward pass through `self.Cross_quantizer = Cross_VQEmbeddingEMA_AVT(n_embeddings, self.hidden_dim)` in AVT_VQVAE_Encoder fails:

`v_ph = torch.reshape(v_ph, ((B, T, M))) # [BxT, M] -> [B, T, M]`

`RuntimeError: shape '[16, 99, 400]' is invalid for input of size 236800`

How should the Cross_VQEmbeddingEMA_AVT code be modified? I want to pass audio_feat, video_feat, and text_feat directly through AVT_VQVAE_Encoder to obtain the quantized, semantically aligned representations audio_vq, video_vq, and text_vq for downstream tasks.
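The numbers in the error message can be checked with a few lines of arithmetic. This is only a diagnostic sketch, not a fix for `Cross_VQEmbeddingEMA_AVT`; the helper name `diagnose_reshape` is hypothetical, and it assumes the batch size `B` and feature width `M` in the target shape are correct for the tensor being reshaped.

```python
def diagnose_reshape(numel, target_shape):
    """Compare a tensor's element count with a [B, T, M] reshape target.

    Returns (expected_numel, inferred_T), where inferred_T is the sequence
    length the tensor would have if B and M are right, or None if the
    element count is not divisible by B * M.
    """
    B, T, M = target_shape
    expected_numel = B * T * M
    inferred_T = numel // (B * M) if numel % (B * M) == 0 else None
    return expected_numel, inferred_T


# Values copied from the RuntimeError above.
expected_numel, inferred_T = diagnose_reshape(236800, (16, 99, 400))
print("elements required by [16, 99, 400]:", expected_numel)
print("sequence length implied by 236800 elements:", inferred_T)
```

Under that assumption, the tensor holds far fewer time steps than the `T = 99` used in the reshape, which is consistent with `T` being read from one modality and applied to another with a shorter sequence.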
In Cross_VQEmbeddingEMA in main_model_2.py, self.embedding is updated three times (`self.embedding = self.ema_weight / self.ema_count.unsqueeze(-1)`), but does only the last assignment take effect?
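A scalar toy model can help frame this question. The sketch below is not the repository's code: the decay value, the starting buffers, and the update amounts are all made up, and it assumes the EMA buffers (`ema_count`, `ema_weight`) are updated cumulatively before each assignment and that nothing reads the embedding between assignments.

```python
DECAY = 0.99  # hypothetical decay value


def run(updates, assign_every_step, decay=DECAY):
    """Apply EMA updates to scalar buffers and return the final embedding."""
    ema_count, ema_weight = 1.0, 0.5  # hypothetical starting buffers
    embedding = ema_weight / ema_count
    for count, weight_sum in updates:
        # The buffers accumulate every update, whether or not we assign.
        ema_count = decay * ema_count + (1 - decay) * count
        ema_weight = decay * ema_weight + (1 - decay) * weight_sum
        if assign_every_step:
            embedding = ema_weight / ema_count  # intermediate assignment
    embedding = ema_weight / ema_count  # final assignment
    return embedding


# Three hypothetical (count, weight_sum) updates, one per assignment.
updates = [(4.0, 2.0), (3.0, 3.0), (5.0, 1.0)]

three_assignments = run(updates, assign_every_step=True)
one_assignment = run(updates, assign_every_step=False)
last_update_only = run(updates[-1:], assign_every_step=False)

print(three_assignments == one_assignment)
print(three_assignments == last_update_only)
```

In this toy setting the intermediate assignments are overwritten, yet the final value still reflects all three updates because the buffers carry them forward. If the real code reads `self.embedding` between assignments (for example, to compute distances for the next modality), the intermediate assignments would matter and this simplification would not apply.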
Hi, I've been trying to set up the project using the provided requirements.txt file, but I'm encountering multiple dependency conflicts during the installation process. Specifically, issues with package versions that...