LAVIS: What are the differences between the `base_coco` and `large_coco` model types for `blip_caption`?

Open gschurck opened this issue 2 years ago • 1 comments

Oct 16 '22 13:10 gschurck

Hi @gschurck, thanks for your interest.

`base_coco` is BLIP_base fine-tuned on COCO; `large_coco` is BLIP_large fine-tuned on COCO.

BLIP_base uses ViT_base as its image encoder; BLIP_large uses ViT_large.
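For readers who want to try both sizes, here is a minimal sketch of how the model type might be selected. It assumes LAVIS is installed and relies on `load_model_and_preprocess` from `lavis.models`; the helper `load_blip_captioner` and the constant `BLIP_CAPTION_TYPES` are hypothetical names introduced only for this illustration.

```python
# Model types discussed in this thread; both belong to the `blip_caption` architecture.
BLIP_CAPTION_TYPES = ("base_coco", "large_coco")


def load_blip_captioner(model_type="base_coco", device="cpu"):
    """Hypothetical helper: load a BLIP captioning model of the given size via LAVIS."""
    if model_type not in BLIP_CAPTION_TYPES:
        raise ValueError(f"unknown model_type: {model_type!r}")
    # Imported lazily so that this module can be imported without LAVIS installed.
    from lavis.models import load_model_and_preprocess

    # The call returns the model plus visual and text processors;
    # captioning only needs the visual ones.
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_caption", model_type=model_type, is_eval=True, device=device
    )
    return model, vis_processors
```

Under these assumptions, switching between the two sizes only requires changing the `model_type` string, for example `load_blip_captioner("large_coco")`.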

Thanks.

Oct 17 '22 00:10 dxli94

Okay, are they directly related to the beam search or nucleus sampling algorithms?

Oct 30 '22 13:10 gschurck

Hi @gschurck,

No, `base_coco` and `large_coco` differ in model size, not in decoding strategy. Both support beam search and nucleus sampling.

In practice, we found that `large_coco` achieves better captioning metrics (higher-quality captions).
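To illustrate that the decoding strategy is chosen at generation time, independently of the model size, here is a small sketch. It assumes the model's `generate` method accepts the `use_nucleus_sampling` and `num_captions` keyword arguments, as in the LAVIS README examples; `decoding_kwargs` and `caption` are hypothetical helpers written for this note.

```python
def decoding_kwargs(strategy="beam", num_captions=1):
    """Hypothetical helper: map a strategy name to keyword arguments for `generate`."""
    if strategy == "beam":
        return {"use_nucleus_sampling": False, "num_captions": num_captions}
    if strategy == "nucleus":
        return {"use_nucleus_sampling": True, "num_captions": num_captions}
    raise ValueError(f"unknown decoding strategy: {strategy!r}")


def caption(model, image, strategy="beam", num_captions=1):
    """Caption a preprocessed image tensor with either decoding strategy.

    `model` may be a `base_coco` or a `large_coco` captioner; the call is the same.
    `image` is assumed to be already preprocessed and batched, e.g.
    vis_processors["eval"](raw_image).unsqueeze(0).to(device).
    """
    return model.generate({"image": image}, **decoding_kwargs(strategy, num_captions))
```

With this sketch, `caption(model, image, "nucleus", num_captions=3)` would request three sampled captions, while `caption(model, image)` would use beam search.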

Thanks.

Oct 31 '22 03:10 dxli94

Ok thanks.

Oct 31 '22 06:10 gschurck