
add generate w/ beam search

Open Soonhwan-Kwon opened this issue 1 year ago • 10 comments

Proposal to add beam search generation for CoCa.

  • [x] beam search w/o past key values
  • [ ] beam search w/ past key values
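For readers unfamiliar with the technique, here is a minimal sketch of beam search without past key values. It is not the code from this PR: `beam_search`, `toy_model`, and the token ids are hypothetical names made up for illustration, and `next_log_probs` stands in for a full decoder forward pass over the whole prefix on every step.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def beam_search(
    next_log_probs: Callable[[Sequence[int]], Dict[int, float]],
    sot: int,
    eot: int,
    num_beams: int = 3,
    max_len: int = 10,
) -> List[int]:
    """Return the highest-scoring sequence found by beam search.

    `next_log_probs` (hypothetical stand-in for the model) receives the full
    prefix each step, i.e. nothing is cached between steps.
    """
    beams: List[Tuple[float, List[int]]] = [(0.0, [sot])]
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        # Expand every live beam by every candidate next token.
        candidates = []
        for score, tokens in beams:
            for tok, log_p in next_log_probs(tokens).items():
                candidates.append((score + log_p, tokens + [tok]))
        # Keep only the num_beams best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:num_beams]:
            if tokens[-1] == eot:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished beams if max_len was reached
    return max(finished, key=lambda c: c[0])[1]


def toy_model(tokens: Sequence[int]) -> Dict[int, float]:
    """Toy distribution: 0 = start, 1 = end, 2 and 3 are ordinary tokens."""
    table = {
        (0,): {2: 0.6, 3: 0.4},
        (0, 2): {1: 0.55, 3: 0.45},
        (0, 3): {1: 0.9, 2: 0.1},
    }
    probs = table.get(tuple(tokens), {1: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


greedy = beam_search(toy_model, sot=0, eot=1, num_beams=1)  # [0, 2, 1]
beam = beam_search(toy_model, sot=0, eot=1, num_beams=2)    # [0, 3, 1]
```

In this toy example greedy decoding commits to token 2 first and ends with probability 0.33, while two beams recover the sequence through token 3 with probability 0.36.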

Soonhwan-Kwon avatar Dec 24 '22 02:12 Soonhwan-Kwon

[Image: panda]

test result '<start_of_text>giant panda , chengdu , china <end_of_text>'

Soonhwan-Kwon avatar Dec 24 '22 02:12 Soonhwan-Kwon

Nice! Can you share a few more example results?

rom1504 avatar Dec 24 '22 04:12 rom1504

[Images: 1, 2, 3, 4]

Soonhwan-Kwon avatar Dec 24 '22 05:12 Soonhwan-Kwon

Some are good but some are bad; the model needs to be fine-tuned on the COCO dataset, as in the CoCa paper, for better results. I'm evaluating the scores on the COCO dataset now.

Soonhwan-Kwon avatar Dec 24 '22 05:12 Soonhwan-Kwon

This implementation is much slower because it runs w/o past_key_values, but I expect it to be much faster w/ past_key_values. I wanted to move step by step, because a wrong implementation can degrade overall generation performance.
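A rough way to see why caching should help is the idealized count below. It is a cost model under simplifying assumptions, not a measurement of this PR: the hypothetical `tokens_processed` function only counts how many token positions are fed through the decoder, and ignores that attention over cached keys still grows with the sequence length.

```python
def tokens_processed(seq_len: int, use_cache: bool) -> int:
    """Count token positions fed through the decoder to generate seq_len tokens."""
    total = 0
    for step in range(1, seq_len + 1):
        # Without a cache the whole prefix (step tokens) is re-encoded each step;
        # with cached keys/values only the newest token is processed.
        total += 1 if use_cache else step
    return total


without_cache = tokens_processed(10, use_cache=False)  # 1 + 2 + ... + 10
with_cache = tokens_processed(10, use_cache=True)      # one token per step
```

Under this model the uncached count grows quadratically with the caption length and the cached count grows linearly.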

Soonhwan-Kwon avatar Dec 24 '22 05:12 Soonhwan-Kwon

> Some are good but some are bad; the model needs to be fine-tuned on the COCO dataset, as in the CoCa paper, for better results. I'm evaluating the scores on the COCO dataset now.

I think fine-tuning on MS COCO, VizWiz and Localized Narratives would be a good idea. Maybe we should filter these datasets with CLIP H similarity; some of the texts are not sooo good :D

christophschuhmann avatar Dec 24 '22 14:12 christophschuhmann

It would be really cool if you could add a fine-tuning mode that plugs image embeddings into the CoCa text decoder and trains only the decoder :)

christophschuhmann avatar Dec 24 '22 15:12 christophschuhmann

@Soonhwan-Kwon I am working on adding COCO as a dataset so we can run evaluation automatically. I should make the PR in a couple of days, unless you are doing it already.

gpucce avatar Dec 24 '22 15:12 gpucce

> It would be really cool if you could add a fine-tuning mode that plugs image embeddings into the CoCa text decoder and trains only the decoder :)

It sounds very interesting! I'll definitely make a PR for it.

Soonhwan-Kwon avatar Dec 24 '22 15:12 Soonhwan-Kwon

> @Soonhwan-Kwon I am working on adding COCO as a dataset so we can run evaluation automatically. I should make the PR in a couple of days, unless you are doing it already.

Sure, I'm not coding any fine-tuning on COCO right now, feel free to make the PR.

Soonhwan-Kwon avatar Dec 24 '22 15:12 Soonhwan-Kwon

Fix: image_latents and image_tokens are now calculated once instead of at every seq_len step. generate time reduced: 2.5967 -> 1.8194. generate_beamsearch time (5 images) reduced: 53.1091 -> 30.3831.
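The general shape of this kind of fix can be sketched as follows. This is a toy illustration, not the PR's code: `ToyCaptioner`, `encode_image`, `next_token`, `generate_naive`, and `generate_hoisted` are hypothetical names, and the arithmetic merely stands in for the image encoder and the text decoder.

```python
from typing import List, Sequence


class ToyCaptioner:
    """Hypothetical model that counts how often the image is encoded."""

    def __init__(self) -> None:
        self.encode_calls = 0

    def encode_image(self, image: Sequence[int]) -> int:
        self.encode_calls += 1
        return sum(image)  # stand-in for image_latents / image_tokens

    def next_token(self, image_feats: int, tokens: Sequence[int]) -> int:
        return (image_feats + len(tokens)) % 5  # stand-in for the text decoder


def generate_naive(model: ToyCaptioner, image: Sequence[int], seq_len: int) -> List[int]:
    tokens = [0]
    for _ in range(seq_len):
        feats = model.encode_image(image)  # recomputed on every step
        tokens.append(model.next_token(feats, tokens))
    return tokens


def generate_hoisted(model: ToyCaptioner, image: Sequence[int], seq_len: int) -> List[int]:
    feats = model.encode_image(image)  # computed once, reused by every step
    tokens = [0]
    for _ in range(seq_len):
        tokens.append(model.next_token(feats, tokens))
    return tokens


naive_model, hoisted_model = ToyCaptioner(), ToyCaptioner()
naive_out = generate_naive(naive_model, [1, 2, 3], seq_len=8)
hoisted_out = generate_hoisted(hoisted_model, [1, 2, 3], seq_len=8)
```

Both functions produce the same tokens because the image features do not depend on the text generated so far; only the number of encoder calls differs (8 versus 1 here).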

Soonhwan-Kwon avatar Dec 25 '22 02:12 Soonhwan-Kwon

Hi @Soonhwan-Kwon, can you rebase on coca head and fix the tests?

rom1504 avatar Jan 06 '23 00:01 rom1504