GoogleConceptualCaptioning
beam_size == 1 for self-critical decoding?
The paper 'Self-critical Sequence Training for Image Captioning' states that the baseline is greedy argmax decoding, the same technique used at inference time.
If that's the case, shouldn't the beam size for inference always be 1? If we just pick the argmax at each step, there is only one possible way to form the sentence, right?
There is no necessary connection between the training baseline and the inference-time decoding method; they can be different.
SCST happens to use greedy decoding at inference time as well, simply because beam search doesn't boost performance much.
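To make the question concrete: greedy (argmax) decoding is deterministic, so it yields exactly one caption per input, which is the same output as beam search with `beam_size == 1`. A minimal sketch, using a hypothetical `step_logits_fn` and a toy deterministic model (not the actual SCST codebase):

```python
import numpy as np

def greedy_decode(step_logits_fn, start_token, eos_token, max_len=20):
    """Greedy (argmax) decoding: at each step pick the single most
    probable token. Only one sequence can ever be produced, which is
    equivalent to beam search with beam_size == 1."""
    tokens = [start_token]
    for _ in range(max_len):
        logits = step_logits_fn(tokens)       # scores over the vocabulary
        next_tok = int(np.argmax(logits))     # argmax = the greedy choice
        tokens.append(next_tok)
        if next_tok == eos_token:
            break
    return tokens

# Toy "model": logits depend only on the current prefix length,
# just to demonstrate that greedy decoding is deterministic.
VOCAB = 5
def toy_step(tokens):
    rng = np.random.default_rng(len(tokens))  # seeded by prefix length
    return rng.standard_normal(VOCAB)

out1 = greedy_decode(toy_step, start_token=0, eos_token=4)
out2 = greedy_decode(toy_step, start_token=0, eos_token=4)
assert out1 == out2  # same input always gives the same caption
```

This illustrates why, for the SCST *baseline*, there is only one possible sentence: the argmax at each step is fixed given the prefix. During training, SCST contrasts this deterministic baseline against *sampled* captions, which is where the variability needed for the reward signal comes from.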