Apply threshold & change number of retrieved images
Currently, every search query shows the top 5 matches by default, regardless of similarity score.
Implement a thresholding mechanism that filters out results whose similarity score falls below a certain value, so that only relevant results are displayed. Each case (and each model) may have a different optimal threshold, so we need to explore them.
# server.py
def search_clip_text(text, image_collection):
    ...
    # change this 5 to another number (maybe add it in `config.yaml`)
    results = image_collection.query(text_embedding, n_results=5)
    # apply threshold (differs for each task & each model)

def search_clip_image(image_path, image_collection, get_self=False):
    ...  # same changes as above

def search_embed_text(text, text_collection):
    ...  # same changes as above
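To make the thresholding step concrete, here is a minimal sketch of what the filter could look like. The helper name `filter_by_threshold` and its parameters are hypothetical, not part of `server.py`. It assumes the ids and scores have already been pulled out of the query results and that the scores are similarities (higher is better); if the collection returns distances instead, they would need to be converted first.

```python
from typing import List, Sequence, Tuple


def filter_by_threshold(
    ids: Sequence[str],
    similarities: Sequence[float],
    threshold: float,
    max_results: int = 5,
) -> List[Tuple[str, float]]:
    """Keep at most `max_results` matches whose similarity is >= `threshold`.

    Hypothetical helper: assumes higher scores mean more similar.
    """
    # Pair each id with its score and sort best-first.
    ranked = sorted(zip(ids, similarities), key=lambda pair: pair[1], reverse=True)
    # Drop everything below the cutoff, then cap the number of results.
    kept = [(item_id, score) for item_id, score in ranked if score >= threshold]
    return kept[:max_results]


# Made-up scores for illustration only.
matches = filter_by_threshold(
    ids=["a.jpg", "b.jpg", "c.jpg"],
    similarities=[0.31, 0.18, 0.27],
    threshold=0.25,
    max_results=5,
)
```

Both `threshold` and `max_results` could be read from `config.yaml`, possibly with one threshold per model and per search function, since the useful range is likely to differ between them.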
Maybe use a strategy similar to top_p or min_p, where the number of results depends on the similarity score of the best match. Here's a quick explanation:
https://www.reddit.com/r/LocalLLaMA/comments/17vonjo/your_settings_are_probably_hurting_your_model_why/
great work btw!
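A rough sketch of how the min_p idea might carry over to retrieval: instead of a fixed cutoff, keep only results whose similarity is at least some fraction of the best match's similarity. The function name `filter_relative_to_best` and the `min_ratio` parameter are hypothetical, and the sketch assumes non-negative similarity scores where higher is better.

```python
from typing import List, Sequence, Tuple


def filter_relative_to_best(
    ids: Sequence[str],
    similarities: Sequence[float],
    min_ratio: float = 0.8,
    max_results: int = 5,
) -> List[Tuple[str, float]]:
    """Keep matches scoring at least `min_ratio` times the best score.

    Hypothetical min_p-style filter: the cutoff scales with the top result.
    """
    ranked = sorted(zip(ids, similarities), key=lambda pair: pair[1], reverse=True)
    if not ranked:
        return []
    best_score = ranked[0][1]
    if best_score <= 0:
        # Design choice for this sketch: a relative cutoff is not
        # meaningful when even the best match has no positive similarity.
        return []
    cutoff = min_ratio * best_score
    kept = [(item_id, score) for item_id, score in ranked if score >= cutoff]
    return kept[:max_results]


# Made-up scores for illustration only.
close_matches = filter_relative_to_best(
    ids=["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    similarities=[0.40, 0.35, 0.20, 0.33],
    min_ratio=0.8,
)
```

With this approach a query with one strong match returns few results, while a query with several near-equal matches returns more of them. It can also be combined with an absolute threshold so that a weak best match does not pull in irrelevant results.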
Very nice idea! I hadn't thought about applying LLM sampling methods. I will look into this.
Thank you 😁😁