Curya
> Thanks for your reply. Would you please provide the `wandb output.log` file of your training process?
Sorry for another question about the training settings reported in your paper:
> We train our model with MLE objective for 15 epochs and further train with different rewards for 25...
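As background for this thread, "further train with different rewards" usually means REINFORCE-style (self-critical) fine-tuning on top of the MLE model. A generic sketch of such a loss, where the tensors and the reward interface are placeholder assumptions rather than the paper's exact method:

```python
# Generic sketch of reward-based fine-tuning after MLE pretraining
# (self-critical-style REINFORCE); reward inputs are placeholders,
# not the paper's exact setup.
import torch

def rl_loss(sample_logprobs, sample_reward, baseline_reward):
    """sample_logprobs: (batch, seq_len) log-probs of sampled caption tokens.
    sample_reward / baseline_reward: (batch,) sentence-level rewards,
    e.g. the score of a sampled caption vs. a greedy-decoded baseline."""
    advantage = (sample_reward - baseline_reward).unsqueeze(1)  # (batch, 1)
    # REINFORCE with a baseline: raise log-probs of samples that beat it.
    return -(advantage * sample_logprobs).mean()
```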
> I just remembered that I actually ran the original CLIP-ViL training script to train the MLE model. Could you please run with the same batch size of 10 for 25 epochs...
> Yes OK, I will try soon. Thank you again.
> For multi-GPU training, I guess you could get similar performance with fewer warmup steps, such as 1000.

Yes, I have tried warmup steps of 1250, and the...
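For reference, warmup of this kind is typically a linear LR ramp over the first N optimizer steps. A minimal PyTorch sketch; the model, base LR, and loop below are placeholders, not this repo's actual training configuration:

```python
# Minimal sketch of linear LR warmup via PyTorch's LambdaLR.
import torch

model = torch.nn.Linear(512, 512)  # stand-in for the captioning model
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

warmup_steps = 1250  # the value discussed above; fewer (e.g. 1000) may suffice

def warmup_lambda(step):
    # Ramp the LR linearly from 0 to its base value, then hold it
    # (a real schedule would usually decay it afterwards).
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)

for step in range(2500):   # training-loop placeholder; backward pass omitted
    optimizer.step()
    scheduler.step()
```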
Sorry for the confusion: we actually have not used scheduled sampling. Our code is based on the JDAI-CV/image-captioning repo. You can find more detail about how scheduled sampling works...
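For anyone curious, scheduled sampling boils down to occasionally feeding the decoder its own previous prediction instead of the ground-truth token during training. A generic sketch, where the `decoder` and `embed` interfaces are hypothetical and not that repo's actual API:

```python
# Generic sketch of scheduled sampling for an autoregressive decoder;
# see JDAI-CV/image-captioning for a real implementation.
import torch

def decode_with_scheduled_sampling(decoder, embed, targets, state, ss_prob=0.25):
    """targets: (batch, seq_len) ground-truth token ids, starting with <bos>.
    `decoder` and `embed` are hypothetical interfaces."""
    batch, seq_len = targets.shape
    logits_per_step = []
    prev_tokens = targets[:, 0]                       # always start from <bos>
    for t in range(seq_len - 1):
        step_logits, state = decoder(embed(prev_tokens), state)
        logits_per_step.append(step_logits)
        # With probability ss_prob, feed the model's own prediction at the
        # next step instead of the ground-truth token.
        use_sample = torch.rand(batch, device=targets.device) < ss_prob
        sampled = step_logits.argmax(dim=-1)
        prev_tokens = torch.where(use_sample, sampled, targets[:, t + 1])
    return torch.stack(logits_per_step, dim=1), state
```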
I am sorry that we have not organized our visualization code. But I can provide you with a core demo:

```python
import cv2
import matplotlib.pyplot as plt
import skimage.transform
...
```
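The truncated demo presumably continues along these lines; here is a self-contained sketch of upsampling an attention map and overlaying it on the image. The file paths, the 7x7 attention grid, and the overlay choices are my placeholders, not the actual demo:

```python
# Sketch: overlay a coarse attention map on an image as a heat map.
import cv2
import matplotlib.pyplot as plt
import numpy as np
import skimage.transform

image = cv2.imread("example.jpg")                  # placeholder path
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)     # OpenCV loads BGR

att = np.random.rand(7, 7)                         # stand-in for a real attention map
att = att / att.sum()                              # normalize the weights

# Upsample the coarse attention grid to the image resolution.
att_up = skimage.transform.resize(att, image.shape[:2], order=3)

plt.imshow(image)
plt.imshow(att_up, alpha=0.5, cmap="jet")          # semi-transparent heat map
plt.axis("off")
plt.savefig("attention_overlay.png", bbox_inches="tight")
```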
Answers to 1 and 2: yes, but not perfectly. `--load_epoch` and `--resume` are both used to reload weights. Specifically:

+ `--resume` is the epoch number of the trained model that you...
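A hypothetical sketch of how such flags are often wired up; the exact semantics and checkpoint naming of `--resume` and `--load_epoch` in this repo may differ:

```python
# Hypothetical argparse wiring; semantics are assumptions, not this repo's code.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--resume", type=int, default=-1,
                    help="epoch number of a trained checkpoint to resume from")
parser.add_argument("--load_epoch", type=int, default=-1,
                    help="epoch number of a checkpoint whose weights to load")
args = parser.parse_args([])  # empty list just to demo the defaults

if args.resume >= 0:
    ckpt = f"snapshot/model_epoch_{args.resume}.pth"  # placeholder naming scheme
    print(f"would resume training (weights + optimizer state) from {ckpt}")
elif args.load_epoch >= 0:
    print(f"would load weights only from epoch {args.load_epoch}")
```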
> Dear author, I have a question.
>
> What is the difference between "-1" in "TARGET_SENT" and "0" in "INPUT_SENT"? As in the first and second images? "-1"...
Sorry, I am not sure why. Loading a pre-trained backbone aims to extract high-quality visual features; if you want to fine-tune the backbone weights, maybe set a smaller lr for the backbone...
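In PyTorch, a smaller backbone learning rate is usually done with optimizer parameter groups. A minimal sketch; the modules and LR values below are placeholders, not this repo's configuration:

```python
# Sketch: give the visual backbone a smaller LR via parameter groups.
import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)  # stand-in for the visual backbone
head = torch.nn.Linear(1000, 512)                      # stand-in for the captioning head

optimizer = torch.optim.Adam([
    {"params": backbone.parameters(), "lr": 1e-5},  # smaller LR: gentle fine-tuning
    {"params": head.parameters(), "lr": 5e-4},      # larger LR for the new layers
])
```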