LAVIS
What are the differences between the `base_coco` and `large_coco` model types for `blip_caption`?
Hi @gschurck, thanks for your interest.
`base_coco` is BLIP_base finetuned on COCO; `large_coco` is BLIP_large finetuned on COCO.
BLIP_base uses ViT_base as its image encoder, and BLIP_large uses ViT_large.
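
For illustration, here is a minimal sketch of how either variant could be selected when loading the captioner. `load_model_and_preprocess` is the LAVIS loader; the `BLIP_CAPTION_TYPES` table and the `load_blip_caption` wrapper are hypothetical names introduced only for this example, and the `"cpu"` default is an assumption.

```python
# The two model types discussed here, mapped to the vision backbone each uses.
# This table is illustrative; it is not something LAVIS exposes.
BLIP_CAPTION_TYPES = {
    "base_coco": "ViT_base",
    "large_coco": "ViT_large",
}


def load_blip_caption(model_type="base_coco", device="cpu"):
    """Hypothetical helper: load a COCO-finetuned BLIP captioner by model type."""
    if model_type not in BLIP_CAPTION_TYPES:
        raise ValueError(f"unknown model_type: {model_type!r}")
    # Imported lazily so this snippet can be read and imported without LAVIS installed.
    from lavis.models import load_model_and_preprocess

    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_caption", model_type=model_type, is_eval=True, device=device
    )
    return model, vis_processors
```

Switching between the two checkpoints is then only a matter of passing `model_type="large_coco"` instead of `"base_coco"`; the rest of the captioning code stays the same.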
Thanks.
Okay, are they directly related to the beam search or nucleus sampling algorithms?
Hi @gschurck,
No, `base_coco` and `large_coco` refer to model size, not to the decoding algorithm. Both support beam search and nucleus sampling.
In practice, we found that `large_coco` achieves better captioning metrics (higher-quality captions).
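
To make the distinction concrete, here is a rough sketch of choosing the decoding strategy at generation time, independently of which checkpoint was loaded. It assumes `model` is a `blip_caption` model and `image` is an already preprocessed image tensor; `caption_both_ways` is a hypothetical helper name, while `use_nucleus_sampling` and `num_captions` are arguments of the model's `generate` method.

```python
def caption_both_ways(model, image, num_samples=3):
    """Hypothetical helper: caption one preprocessed image with both strategies."""
    samples = {"image": image}
    # Beam search is what generate() uses when no sampling flag is passed.
    beam = model.generate(samples)
    # Nucleus sampling is enabled with a flag; num_captions requests several samples.
    nucleus = model.generate(
        samples, use_nucleus_sampling=True, num_captions=num_samples
    )
    return {"beam_search": beam, "nucleus_sampling": nucleus}
```

The same function works unchanged for a `base_coco` or a `large_coco` model, since the decoding choice is an argument to `generate` and not a property of the checkpoint.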
Thanks.
Ok thanks.