Hi Syed, 1. **Evaluation (Table 5):** we used the same text prompt for evaluation since there is ground truth to evaluate the different text prompts. 2. **Generation (Fig. 7):** we generate...
Hi, You can ignore that part. I was just experimenting with separate codebooks for the upper and lower body, but I ended up not using it in the paper. Everything under [is_upperlower](https://github.com/exitudio/BAMM/blob/e6910b3c3d38b2e1ee131f96a74c7714a8419219/models/vq/model.py#L38)...
Hi, You can find the training logs for the 1st and 2nd stages, as well as the evaluation, in the output folder if you have already downloaded them from [2.3. Pre-trained models]. Alternatively, you...
Hi, We don't support other datasets, but you can modify the code here: [1](https://github.com/exitudio/MMM/blob/main/dataset/dataset_VQ.py#L17) [2](https://github.com/exitudio/MMM/blob/main/train_vq.py#L48) [3](https://github.com/exitudio/MMM/blob/main/train_t2m_trans.py#L56C69-L56C78).
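For reference, here is a minimal sketch of the kind of switch those linked lines implement. Everything below is illustrative, not the repo's exact code: the directory paths are assumptions, and the `mydata` branch (with its joint count and pose-feature dimension) is a hypothetical placeholder for your own data; the `t2m`/`kit` values follow HumanML3D/KIT-ML.

```python
# Illustrative sketch of a dataset switch in the spirit of the linked lines.
def get_dataset_config(dataname: str):
    """Return (data_root, nb_joints, dim_pose) for a dataset name."""
    if dataname == 't2m':        # HumanML3D: 22 joints, 263-dim pose features
        return './dataset/HumanML3D', 22, 263
    if dataname == 'kit':        # KIT-ML: 21 joints, 251-dim pose features
        return './dataset/KIT-ML', 21, 251
    if dataname == 'mydata':     # placeholder: point to your data and its feature dims
        return './dataset/MyData', 24, 300
    raise ValueError(f'Unsupported dataset: {dataname}')
```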
Hi, Here is the log before the code cleanup. I am re-training to verify again.
```
2023-10-12 10:11:15,494 INFO {
    "batch_size": 512,
    "block_size": 51,
    "clip_dim": 512,
    "code_dim": 32,
    "dataname": "t2m",
    ...
```
1. No. It's saved from the last epoch. 2. If you intend to train the Transformer, load only the pretrained vqvae.pth model. Perhaps it has been overtrained.
Can you try training longer? It seems like you're using a batch size of 128 with 30,000 epochs. I use a batch size of 512 with 75,000 epochs. I also...
Hi, We reset the codebook during the quantization process [here](https://github.com/exitudio/MMM/blob/main/models/quantize_cnn.py#L69-L71).
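The idea behind those lines can be sketched roughly as follows (a simplified sketch, not the repo's exact code: codes that receive no encoder features in a batch are re-initialized from randomly sampled encoder outputs; the function and variable names here are illustrative).

```python
import torch

def reset_dead_codes(codebook, code_count, encoder_feats):
    """Sketch of codebook reset: replace unused codes with random encoder features.

    codebook:      (nb_code, code_dim) current codebook
    code_count:    (nb_code,) how often each code was selected in this batch
    encoder_feats: (N, code_dim) flattened encoder outputs from the same batch
    """
    nb_code = codebook.size(0)
    # Randomly pick one candidate encoder feature per codebook entry.
    idx = torch.randint(0, encoder_feats.size(0), (nb_code,), device=encoder_feats.device)
    code_rand = encoder_feats[idx]
    # Keep codes that were used at least once; re-initialize the dead ones.
    usage = (code_count >= 1.0).float().unsqueeze(1)     # (nb_code, 1)
    return usage * codebook + (1.0 - usage) * code_rand
```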
Hi, we use [MASK] tokens for generation via iterative decoding and [PAD] tokens to fill up the shorter-length samples. [PAD] tokens in the CLIP model can be used in a similar manner....
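To make the roles of the two tokens concrete, here is a rough sketch of confidence-based iterative decoding over a [MASK]/[PAD] canvas. This is not the repo's exact code: `model` is assumed to map token ids to per-position logits, and the cosine schedule is the usual MaskGIT-style choice.

```python
import math
import torch

def iterative_decode(model, lengths, max_len, mask_id, pad_id, steps=10):
    """Positions beyond each sample's length stay [PAD]; the rest start as
    [MASK] and are filled with the most confident predictions over `steps`."""
    B = lengths.size(0)
    tokens = torch.full((B, max_len), pad_id, dtype=torch.long)
    valid = torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)   # (B, max_len)
    tokens[valid] = mask_id                            # only these positions are generated

    for t in range(steps):
        logits = model(tokens)                         # assumed shape: (B, max_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)        # per-position confidence / argmax
        still_masked = tokens == mask_id
        conf = conf.masked_fill(~still_masked, -1.0)   # never re-fill decided or [PAD] slots

        keep_ratio = math.cos(math.pi / 2 * (t + 1) / steps)   # fraction left masked
        for b in range(B):
            n_keep = int(int(lengths[b]) * keep_ratio)
            n_fill = int(still_masked[b].sum()) - n_keep
            if n_fill > 0:
                top = conf[b].topk(n_fill).indices     # most confident masked positions
                tokens[b, top] = pred[b, top]
    return tokens
```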