open_clip
add generate w/ beam search
Proposal to add beam search for CoCa.
- [x] beam search w/o past key values
- [ ] beam search w/ past key values
Test result: '<start_of_text>giant panda , chengdu , china <end_of_text>'
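For readers unfamiliar with the approach, here is a minimal sketch of beam search without any cache. It is not the code in this PR: `TRANSITIONS`, `next_log_probs`, and `beam_search` are hypothetical names, and the toy table stands in for a real model that would score the whole prefix on every step. No length normalization is applied.

```python
import math

# Hypothetical toy "model": next-token probabilities keyed by the last token.
TRANSITIONS = {
    "<start>": {"giant": 0.6, "panda": 0.3, "china": 0.1},
    "giant": {"panda": 0.9, "<end>": 0.1},
    "panda": {"<end>": 0.7, "china": 0.3},
    "china": {"<end>": 1.0},
}


def next_log_probs(seq):
    """Score the next token from the full prefix (no cached state is reused)."""
    return {tok: math.log(p) for tok, p in TRANSITIONS[seq[-1]].items()}


def beam_search(num_beams=2, max_len=5):
    beams = [(0.0, ["<start>"])]  # (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_len):
        # Expand every live beam by every possible next token.
        candidates = [
            (score + lp, seq + [tok])
            for score, seq in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        # Keep the top num_beams; move completed sequences aside.
        beams = []
        for score, seq in candidates[:num_beams]:
            (finished if seq[-1] == "<end>" else beams).append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # sequences cut off at max_len
    return max(finished, key=lambda c: c[0])[1]


print(beam_search(num_beams=2))
```

Because each step rescans the full prefix, the cost per step grows with sequence length, which is the slowdown a key/value cache is meant to remove.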
Nice! Can you share a few more example results?
Some are good and some are bad; the model needs to be fine-tuned on the COCO dataset, as in the CoCa paper, for better results. I'm evaluating the scores on the COCO dataset now.
This implementation is much slower because it does not use past_key_values, but I expect it to be much faster with past_key_values. I wanted to move step by step, because a wrong implementation can degrade overall generation performance.
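To illustrate why caching is expected to help, here is a toy sketch (not the open_clip implementation) that counts how many tokens get processed with and without reusing state from the previous step. The function names and the rolling-hash "state" are assumptions made up for the example; in a real transformer the cached state would be the per-layer keys and values.

```python
MOD = 1_000_003


def step_without_cache(tokens):
    """Re-process the whole prefix; returns (state, tokens processed)."""
    state = 0
    for t in tokens:
        state = (state * 31 + t) % MOD
    return state, len(tokens)


def step_with_cache(cached_state, new_token):
    """Update the cached state with only the new token."""
    return (cached_state * 31 + new_token) % MOD, 1


def decode(tokens, use_cache):
    """Feed tokens one at a time and count the total work done."""
    state, work = 0, 0
    for i, tok in enumerate(tokens):
        if use_cache:
            state, cost = step_with_cache(state, tok)
        else:
            state, cost = step_without_cache(tokens[: i + 1])
        work += cost
    return state, work


tokens = [3, 1, 4, 1, 5]
print(decode(tokens, use_cache=False))  # same state, quadratic work
print(decode(tokens, use_cache=True))   # same state, linear work
```

Both paths must produce the same state, which is also a useful correctness check when adding a cache to an existing uncached implementation.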
> Some are good and some are bad; the model needs to be fine-tuned on the COCO dataset, as in the CoCa paper, for better results. I'm evaluating the scores on the COCO dataset now.
I think fine-tuning on MS COCO, VizWiz, and Localized Narratives would be a good idea. Maybe we should filter these datasets by CLIP H similarity; some of the texts are not so good :D
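A rough sketch of what such filtering could look like, assuming image and text embeddings have already been computed by a CLIP model. The `filter_pairs` helper, the tuple layout, and the 0.3 threshold are all hypothetical choices for illustration, not values from this thread.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_pairs(pairs, threshold=0.3):
    """Keep captions whose text embedding is close enough to the image embedding.

    pairs: iterable of (caption, image_embedding, text_embedding).
    threshold: hypothetical cutoff; it would need tuning per dataset.
    """
    return [cap for cap, img, txt in pairs if cosine(img, txt) >= threshold]


pairs = [
    ("a giant panda eating bamboo", [1.0, 0.0], [0.9, 0.1]),
    ("click here to buy", [1.0, 0.0], [0.0, 1.0]),
]
print(filter_pairs(pairs))
```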
It would be really cool if you could write fine-tuning code that plugs image embeddings into the CoCa text decoder and trains only the decoder :)
@Soonhwan-Kwon I am working on adding COCO as a dataset so we can run evaluation automatically. I should have the PR up in a couple of days, unless you are doing it already.
> It would be really cool if you could write fine-tuning code that plugs image embeddings into the CoCa text decoder and trains only the decoder :)
It sounds very interesting! I'll definitely make a PR for it.
> @Soonhwan-Kwon I am working on adding COCO as a dataset so we can run evaluation automatically. I should have the PR up in a couple of days, unless you are doing it already.
Sure, I'm not coding any fine-tuning on COCO right now, so feel free to make the PR.
Fix: image_latents and image_tokens are now calculated once instead of at every step along seq_len. generate time reduced from 2.5967 to 1.8194; generate_beamsearch time (5 images) reduced from 53.1091 to 30.3831.
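The shape of this kind of fix, sketched with stand-in names: the image encoding does not depend on the tokens generated so far, so it can be hoisted out of the decoding loop. `CountingEncoder`, `generate_per_step`, and `generate_hoisted` are hypothetical and only count encoder calls; they do not mirror the actual open_clip functions.

```python
class CountingEncoder:
    """Stand-in for an image encoder that records how often it is called."""

    def __init__(self):
        self.calls = 0

    def encode(self, image):
        self.calls += 1
        return sum(image)  # placeholder for image latents/tokens


def generate_per_step(encoder, image, seq_len):
    """Before: the image is re-encoded at every decoding step."""
    out = []
    for step in range(seq_len):
        latents = encoder.encode(image)
        out.append(latents + step)  # placeholder for a decoder step
    return out


def generate_hoisted(encoder, image, seq_len):
    """After: the image is encoded once and reused across steps."""
    latents = encoder.encode(image)
    return [latents + step for step in range(seq_len)]


slow, fast = CountingEncoder(), CountingEncoder()
same = generate_per_step(slow, [1, 2, 3], 4) == generate_hoisted(fast, [1, 2, 3], 4)
print(same, slow.calls, fast.calls)
```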
Hi @Soonhwan-Kwon, can you rebase on coca head and fix the tests?