fromage

retrieval only mode

Open · oferidan1 opened this issue 9 months ago · 1 comment

Hi, thanks for sharing your great paper and code! I'm wondering about a retrieval-only use case (no dialogue or question answering). Does training the image-captioning objective benefit the retrieval model? For example, when using images as context: if it does, why is that better than using the retrieval model's visual embeddings of the context images? Also, the retrieval model is trained with a cross-entropy loss against the input caption. Is this loss beneficial in retrieval-only mode? Thanks, Ofer

oferidan1 · May 05 '24 05:05

They're mostly independent; see Table 3 in the appendix of the paper for an ablation. We find that the captioning loss doesn't affect retrieval performance much.

kohjingyu · May 06 '24 20:05
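To make the two objectives in this thread concrete, here is a minimal sketch, not the fromage implementation. It assumes the retrieval loss is a CLIP-style symmetric InfoNCE (a cross-entropy over in-batch text/image similarities, with matching pairs on the diagonal) and that the captioning loss is the mean negative log-likelihood of caption tokens. The names `caption_weight` and `temperature` and their defaults are hypothetical. Setting `caption_weight=0.0` corresponds to training for retrieval only.

```python
import math
from typing import List

Vector = List[float]


def log_softmax(row: List[float]) -> List[float]:
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def contrastive_retrieval_loss(text_emb: List[Vector], img_emb: List[Vector],
                               temperature: float = 0.07) -> float:
    # Assumed symmetric InfoNCE: row i of text_emb is paired with row i of img_emb.
    n = len(text_emb)
    logits = [[cosine(t, v) / temperature for v in img_emb] for t in text_emb]
    # Text-to-image direction: cross-entropy of each text row against its image.
    t2i = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Image-to-text direction: same cross-entropy over the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    i2t = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (t2i + i2t) / 2


def captioning_loss(token_logprobs: List[float]) -> float:
    # Mean negative log-likelihood of the caption tokens (log-probs assumed given).
    return -sum(token_logprobs) / len(token_logprobs)


def total_loss(caption_nll: float, retrieval: float,
               caption_weight: float = 1.0) -> float:
    # Hypothetical weighting; caption_weight=0.0 drops the captioning term entirely.
    return caption_weight * caption_nll + retrieval


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(contrastive_retrieval_loss(texts, aligned))  # near zero
    print(contrastive_retrieval_loss(texts, swapped))  # large
```

The two terms only share parameters through the model, so with separate loss terms like these it is easy to ablate one (e.g. `caption_weight=0.0`) and measure retrieval on its own, which is the kind of comparison the ablation mentioned above refers to.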