What are the differences between the `base_coco` and `large_coco` model types for `blip_caption`?

Open gschurck opened this issue 1 year ago • 1 comments

gschurck, Oct 16 '22 13:10

Hi @gschurck, thanks for your interest.

`base_coco` is BLIP_base finetuned on COCO; `large_coco` is BLIP_large finetuned on COCO.

BLIP_base uses ViT_base as its image encoder; BLIP_large uses ViT_large.

Thanks.

dxli94, Oct 17 '22 00:10

Okay, are they directly related to the beam search or nucleus sampling algorithms?

gschurck, Oct 30 '22 13:10

> Okay, are they directly related to Beam search or Nucleus Sampling algorithms ?

Hi @gschurck,

No, base_coco and large_coco are related to model size. Both base_coco and large_coco support beam search and nucleus sampling.

In practice, we found that large_coco achieves better captioning metrics (higher-quality captions).
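To illustrate that model size and decoding strategy are independent choices, here is a minimal sketch. It assumes the `salesforce-lavis` package, PyTorch, and Pillow are installed. `build_generate_kwargs` and `caption_image` are hypothetical helper names introduced only for this example and are not part of LAVIS; the image path is a placeholder.

```python
# Sketch only: the two helpers below are illustrative and not part of LAVIS.

def build_generate_kwargs(strategy="beam", num_captions=1):
    """Map a decoding strategy name to keyword arguments for model.generate()."""
    if strategy == "beam":
        # Beam search is used when nucleus sampling is turned off.
        return {"use_nucleus_sampling": False}
    if strategy == "nucleus":
        # Nucleus sampling is stochastic, so several distinct captions can be requested.
        return {"use_nucleus_sampling": True, "num_captions": num_captions}
    raise ValueError(f"unknown strategy: {strategy!r}")


def caption_image(image_path, model_type="base_coco", strategy="beam", num_captions=1):
    """Caption one image with blip_caption; model_type is 'base_coco' or 'large_coco'."""
    # Heavy imports stay inside the function so the helper above works without them.
    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # model_type only selects the checkpoint size; it does not affect decoding.
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_caption", model_type=model_type, is_eval=True, device=device
    )
    raw_image = Image.open(image_path).convert("RGB")
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
    return model.generate({"image": image}, **build_generate_kwargs(strategy, num_captions))


# Example usage (downloads pretrained weights on first call; "photo.jpg" is a placeholder):
# print(caption_image("photo.jpg", model_type="base_coco", strategy="beam"))
# print(caption_image("photo.jpg", model_type="large_coco", strategy="nucleus", num_captions=3))
```

Any of the four combinations of model type and strategy should work, since the decoding option is passed at generation time rather than fixed by the checkpoint.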

Thanks.

dxli94, Oct 31 '22 03:10

Ok thanks.

gschurck, Oct 31 '22 06:10