CLIPPyX

Apply threshold & change number of retrieved images

Open 0ssamaak0 opened this issue 1 year ago • 2 comments

Currently, every search query returns the top 5 matches by default, regardless of their similarity scores.

Implement a thresholding mechanism that filters out results whose similarity score falls below a certain value, so that only relevant results are displayed. Note that each case (and each model) may have a different optimal threshold, so we need to explore them.

# server.py
def search_clip_text(text, image_collection):
    ...
    # change this 5 to another number (maybe add it in `config.yaml`)
    results = image_collection.query(text_embedding, n_results=5)
    # apply threshold (differs for each task & each model)

def search_clip_image(image_path, image_collection, get_self=False):
    # same as above
    ...

def search_embed_text(text, text_collection):
    # same as above
    ...
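A minimal sketch of the threshold plus configurable result count, assuming the vector store returns cosine distances (1 - cosine similarity) alongside result ids. The helper names `cosine_distance_to_similarity` and `apply_threshold` are hypothetical, not part of CLIPPyX, and the 0.25 threshold in the demo is purely illustrative; real values would need tuning per model and task, as noted above. The idea is to query a larger candidate pool, then keep at most `n_results` matches that clear the threshold.

```python
from typing import List, Sequence, Tuple


def cosine_distance_to_similarity(distances: Sequence[float]) -> List[float]:
    """Convert cosine distances (1 - cosine similarity) back to similarities."""
    return [1.0 - d for d in distances]


def apply_threshold(
    ids: Sequence[str],
    similarities: Sequence[float],
    threshold: float,
    n_results: int,
) -> List[Tuple[str, float]]:
    """Keep at most n_results matches whose similarity is >= threshold.

    Matches are returned sorted from most to least similar.
    """
    if len(ids) != len(similarities):
        raise ValueError("ids and similarities must have the same length")
    ranked = sorted(zip(ids, similarities), key=lambda pair: pair[1], reverse=True)
    kept = [(i, s) for i, s in ranked if s >= threshold]
    return kept[:n_results]


if __name__ == "__main__":
    ids = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
    sims = cosine_distance_to_similarity([0.62, 0.70, 0.81, 0.95])
    # Similarities are about 0.38, 0.30, 0.19, 0.05, so a 0.25 threshold keeps a.jpg and b.jpg.
    print([i for i, _ in apply_threshold(ids, sims, threshold=0.25, n_results=5)])
```

Both `threshold` and `n_results` could then be read per model from `config.yaml`, as the comment in `search_clip_text` suggests.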

0ssamaak0, Jun 03 '24 12:06

Maybe use a strategy similar to top_p or min_p, where the number of results depends on the similarity score of the most similar result. Here's a quick explanation: https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/your_settings_are_probably_hurting_your_model_why/ Great work, btw!

MahmoudAshraf97, Jun 03 '24 14:06
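One way to adapt the LLM sampling idea to retrieval, sketched under loud assumptions: turn the similarity scores into a probability distribution with a softmax, then apply min_p (keep results whose probability is at least `min_p` times the best result's) or top_p (keep the smallest top-ranked set whose cumulative probability reaches `top_p`). None of this is CLIPPyX code; the softmax step, the `temperature` default of 0.01 and the `max_results` cap are hypothetical choices. Raw cosine similarities can sit close together, so without a small temperature the distribution would be nearly flat and the cutoffs would keep almost everything.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float], temperature: float) -> List[float]:
    """Turn similarity scores into probabilities (numerically stable)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _ranked_probs(
    ids: Sequence[str], similarities: Sequence[float], temperature: float
) -> List[Tuple[str, float]]:
    """Pair each id with its probability, sorted from most to least likely."""
    probs = softmax(similarities, temperature)
    return sorted(zip(ids, probs), key=lambda pair: pair[1], reverse=True)


def min_p_select(ids, similarities, min_p, temperature=0.01, max_results=20):
    """min_p-style: keep results with probability >= min_p * best probability."""
    if not ids:
        return []
    ranked = _ranked_probs(ids, similarities, temperature)
    cutoff = min_p * ranked[0][1]
    return [i for i, p in ranked if p >= cutoff][:max_results]


def top_p_select(ids, similarities, top_p, temperature=0.01, max_results=20):
    """top_p-style: keep the smallest top-ranked set with cumulative prob >= top_p."""
    if not ids:
        return []
    ranked = _ranked_probs(ids, similarities, temperature)
    kept, cumulative = [], 0.0
    for i, p in ranked:
        kept.append(i)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept[:max_results]
```

With min_p, one dominant match yields a single result while several near-equal matches yield several, so the result count follows the strength of the best match, which is the behavior the comment describes. `max_results` still caps the list.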

Very nice idea! I hadn't thought about applying LLM sampling methods here. I'll look into it.

Thank you 😁😁

0ssamaak0, Jun 03 '24 15:06