BLIP
Weird caption for a picture of a flower
I got a weird caption for a picture of a flower and don't know why :( Hoping for some advice.
Model: model_base_capfilt_large.pth sha256: 8f5187458d4d47bb87876faf3038d5947eff17475edf52cf47b62e84da0b235f
Some of the core code:
import time

import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms import InterpolationMode

from models.blip import blip_decoder  # from the BLIP repo (models/blip.py)

device = torch.device("cpu")
image_size = 224
image_path = "xxx"  # say we read the image by path

model = blip_decoder("checkpoints/model_base_capfilt_large.pth", image_size=image_size, vit="base")
model.eval()
model = model.to(device)

raw_image = Image.open(image_path).convert("RGB")
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
])
image = transform(raw_image).unsqueeze(0).to(device)

with torch.no_grad():
    # beam search
    t0 = time.time()
    caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)[0]
    cost = time.time() - t0

print(caption)
Output: dai dai dai dai dai dai dai dai dai dai dai dai dai dai dai dai
Here's the input image
Update: with the same code and model, I tested two other pictures of flowers, similar to the picture above.
Pic1: a bunch of dai dai dai dai dai dai dai dai dai dai dai dai dai
Pic2: a bunch of yellow flowers
According to Pic1's caption, the model just thinks "dai" is a kind of flower, and that these flowers are also called "daisy" :)
Thanks for posting this interesting behavior from the model, this is new to me :)
So, any advice for improvement?
Maybe tuning the params in caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)[0] would help.
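For example, a quick experiment over those knobs might look like the sketch below. The values are just starting points to try, and repetition_penalty is assumed to be a keyword of generate (the upstream models/blip.py exposes it for the beam-search path):

with torch.no_grad():
    # lower min_length so beam search isn't forced to pad out the caption,
    # and penalize repeated tokens (assumed values, just something to try)
    caption = model.generate(
        image,
        sample=False,
        num_beams=5,
        max_length=20,
        min_length=3,
        repetition_penalty=1.5,
    )[0]
print(caption)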
You may want to try the image captioning model finetuned on COCO.
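A minimal sketch of swapping in that checkpoint. The URL and the 384 image size are what I believe the BLIP captioning demo uses; please double-check them against the repo's README:

# sketch: load the COCO-finetuned captioning checkpoint instead of the
# pretrained one; URL and image size assumed from the BLIP captioning demo
image_size = 384
model_url = "https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_caption_capfilt_large.pth"
model = blip_decoder(pretrained=model_url, image_size=image_size, vit="base")
model.eval()
model = model.to(device)

Note that the image transform would need to be rebuilt with the new image_size before calling generate again.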
Nucleus sampling also doesn't show this behaviour. The way I see it, beam search tries to fill the min length but gets stuck repeating the same token when the picture is simple and there is not much else to say.
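For reference, the nucleus-sampling path is the sample=True branch of the same generate method (top_p=0.9 is the value the demo uses; captions will vary from run to run):

with torch.no_grad():
    # nucleus sampling instead of beam search
    caption = model.generate(image, sample=True, top_p=0.9, max_length=20, min_length=5)[0]
print(caption)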