Results do not match between demo and pretrained models
Hi,
I was trying to reproduce the results I got from the demo (https://owlvit-tgc6as3cga-ew.a.run.app/) by loading the pretrained models owlvit-base-patch32, owlvit-base-patch16, and owlvit-large-patch14 and following the usage guide on HuggingFace (https://huggingface.co/google/owlvit-base-patch32) for text-conditioned detection, but I could not get the same results. Do you have any suggestions about what might cause this discrepancy?
Best, Thang
I noticed the same issue, especially in reference-image mode. The Hugging Face integration is broken. Use the scripts from the original repo instead.