CLIP-ODS
Is there any paper about how this works?
You can understand most of it by reading the source code.
Basically, V0 uses a sliding window, chooses the box with the highest score, and then applies postprocessing. V1 gets candidate masks with OpenCV functions, derives bounding boxes from these masks, and then uses CLIP to get predictions, which are fed to a postprocessing algorithm.
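To make the V0 idea more concrete, here is a rough sketch of "slide a window, score each crop, keep the best box". It is not the repository's actual code: the names `sliding_windows`, `best_window` and `mean_brightness` are made up for illustration, the image is a plain nested list, and `mean_brightness` is only a stand-in for whatever CLIP-based score the real implementation computes. Postprocessing is left out.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)
Image = List[List[float]]        # rows of grayscale pixel values


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int) -> Iterator[Box]:
    """Yield every win_w x win_h window that fits fully inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def best_window(image: Image, win_w: int, win_h: int, stride: int,
                score_fn: Callable[[Image], float]) -> Tuple[Box, float]:
    """Score each window's crop with score_fn and return the top box."""
    img_h, img_w = len(image), len(image[0])
    best_box, best_score = None, float("-inf")
    for box in sliding_windows(img_w, img_h, win_w, win_h, stride):
        x, y, w, h = box
        crop = [row[x:x + w] for row in image[y:y + h]]
        score = score_fn(crop)
        if score > best_score:
            best_box, best_score = box, score
    if best_box is None:
        raise ValueError("window is larger than the image")
    return best_box, best_score


def mean_brightness(crop: Image) -> float:
    """Hypothetical scorer; the real one would be a CLIP similarity."""
    return sum(map(sum, crop)) / (len(crop) * len(crop[0]))
```

For example, on a 6x6 image of zeros with a 2x2 patch of ones whose top-left corner is at x=3, y=2, calling `best_window(image, 2, 2, 1, mean_brightness)` picks the box `(3, 2, 2, 2)`. Swapping `mean_brightness` for a function that returns the CLIP similarity between the crop and a text prompt would give the behaviour described above, under the assumption that the score is computed per crop.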