Weird caption for a picture of flower

Open phelogges opened this issue 2 years ago • 5 comments

I got a weird caption for a picture of a flower and don't know why :( Any advice would be appreciated.

Model: model_base_capfilt_large.pth sha256: 8f5187458d4d47bb87876faf3038d5947eff17475edf52cf47b62e84da0b235f

Core code:

import time

import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode

from models.blip import blip_decoder  # run from the BLIP repository root

device = torch.device("cpu")

image_size = 224
image_path = "xxx"  # say we read image by path
model = blip_decoder("checkpoints/model_base_capfilt_large.pth", image_size=image_size, vit="base")
model.eval()
model = model.to(device)

# Resize, convert to a tensor, and normalize the image
raw_image = Image.open(image_path).convert("RGB")
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711))
])
image = transform(raw_image).unsqueeze(0).to(device)  # add a batch dimension

with torch.no_grad():
    # beam search
    t0 = time.time()
    caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)[0]
    cost = time.time() - t0
print(caption)

Output: dai dai dai dai dai dai dai dai dai dai dai dai dai dai dai dai

Here's the input image

image

phelogges • Sep 22 '22 02:09

Update: with the same code and model, I tested two other flower pictures similar to the one above.

Pic1: a bunch of dai dai dai dai dai dai dai dai dai dai dai dai dai 3

Pic2: a bunch of yellow flowers 2

Judging by the Pic1 caption, the model seems to think "dai" is a kind of flower, and these flowers are indeed also called daisies :)

phelogges • Sep 22 '22 02:09

Thanks for posting this interesting behavior from the model; this is new to me :)

LiJunnan1992 • Sep 23 '22 01:09

So, any advice for improvement? Maybe changing the params in caption = model.generate(image, sample=False, num_beams=3, max_length=20, min_length=5)[0] will help.

phelogges • Sep 24 '22 04:09

You may want to try the image captioning model finetuned on COCO.

LiJunnan1992 • Oct 03 '22 01:10

Nucleus sampling also avoids this behaviour. The way I see it, beam search tries to reach the minimum length but gets stuck repeating the same token when the picture is simple and there is not much else to say.

saffie91 • Dec 15 '22 15:12
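
To illustrate the explanation above, here is a toy sketch in plain Python. It is not BLIP's decoder: the next-token table NEXT is made up for illustration, and greedy decoding (beam search with a single beam) stands in for the 3-beam search used in the issue. The point is only to show how blocking the end-of-sequence token until min_length is reached can push a deterministic decoder into repeating its runner-up token, while nucleus (top-p) sampling can escape the loop.

```python
import random

EOS = "<eos>"

# Hypothetical next-token probabilities, invented for this sketch.
# After "dai" the toy model would prefer to stop; "dai" again is the runner-up.
NEXT = {
    "<bos>": {"dai": 0.6, "a": 0.4},
    "a": {"flower": 0.7, "dai": 0.3},
    "flower": {EOS: 0.8, "dai": 0.2},
    "dai": {EOS: 0.5, "dai": 0.3, "flower": 0.2},
}


def allowed(dist, length, min_length):
    """Drop EOS while the caption is shorter than min_length, then renormalize."""
    kept = {t: p for t, p in dist.items() if not (t == EOS and length < min_length)}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}


def greedy(min_length=5, max_length=20):
    """Always pick the most likely allowed token (beam search with one beam)."""
    tokens, prev = [], "<bos>"
    while len(tokens) < max_length:
        dist = allowed(NEXT[prev], len(tokens), min_length)
        prev = max(dist, key=dist.get)
        if prev == EOS:
            break
        tokens.append(prev)
    return " ".join(tokens)


def nucleus(rng, top_p=0.9, min_length=5, max_length=20):
    """Sample from the smallest set of top tokens whose mass reaches top_p."""
    tokens, prev = [], "<bos>"
    while len(tokens) < max_length:
        dist = allowed(NEXT[prev], len(tokens), min_length)
        ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
        kept, mass = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            mass += p
            if mass >= top_p:
                break
        prev = rng.choices([t for t, _ in kept], weights=[p for _, p in kept])[0]
        if prev == EOS:
            break
        tokens.append(prev)
    return " ".join(tokens)


if __name__ == "__main__":
    print(greedy())  # dai dai dai dai dai
    for seed in range(3):
        print(nucleus(random.Random(seed)))  # varies with the seed
```

With the real model, the corresponding change would be to call model.generate(image, sample=True, top_p=0.9, max_length=20, min_length=5) instead of the beam search call; lowering min_length is another thing to try, though I have not verified either on the image from this issue.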