devit

Few-shot vs Open-vocabulary

Open: theodu opened this issue 1 year ago • 1 comment

I can't work out the difference between your few-shot and open-vocabulary models. Your approach is based on image-only models, whereas an open-vocabulary approach relies on a text-based embedding of the category name, with the model pre-trained on text-image pairs. So what is the difference between your open-vocabulary and few-shot pipelines and trained models?

I am asking because the open-vocabulary/LVIS model (the one in the demo) gives me much better results than the few-shot one on the same test images with the same image context.

Thanks for your work!

theodu, Oct 12 '23 12:10