[Core/Hosted]: search feedback
Feature request
ChatGPT-like user feedback, to open opportunities to improve the user experience.
Motivation
- interpretability of user satisfaction, as statistics and visualisations in the dashboard
- eventual fine-tuning or reranking
Your contribution
I'm imagining a middleware that saves search queries, and a new endpoint that accepts feedback on a search result, referenced by the result's ID:
```python
from typing import Callable
import uuid

from fastapi import Request

from embedbase.database.base import VectorDatabase
from embedbase.embedding.base import Embedder


async def save_search(
    request: Request, call_next: Callable, db: VectorDatabase, embedder: Embedder
):
    """
    Upon a search request, save the query to the database,
    then forward the request to the next handler.
    """
    # TODO: overlaps with "add" on a "search" dataset
    if request.method != "POST" or "/v1/search" not in request.url.path:
        return await call_next(request)
    # attach a unique id so feedback can later reference this search
    request_body = await request.json()
    new_id = str(uuid.uuid4())
    request_body["id"] = new_id
    await db.save("search", request_body)
    return await call_next(request)
```
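Assuming embedbase exposes the underlying FastAPI app, the middleware above could be wired in with `functools.partial` (the `app`, `my_db`, and `my_embedder` names here are hypothetical); note that reading the request body inside middleware needs care so the downstream search handler can still consume it:

```python
from functools import partial

# hypothetical wiring: bind the database and embedder, then register
# save_search as a standard FastAPI HTTP middleware
app.middleware("http")(partial(save_search, db=my_db, embedder=my_embedder))
```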
(Almost) pseudocode for the feedback endpoint:
```python
app = (
    get_app()
    .use_embedder(...)
    .use_db(...)
    .run()
)


# An endpoint that lets you rate search results
@app.post("/feedback")
async def human_feedback(req, cb, db, embedder):
    # save the rating to a "feedback" table; the request body looks like
    # {"search_id": "<id of the search>", "feedback": 0 or 1}
    db.save("feedback", req.body)
    return 200
```
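For concreteness, here is a minimal sketch of what that endpoint could look like with a typed request body. The `Feedback` model and its field names are assumptions, not an existing embedbase API, and `app`/`db` are taken from the snippet above:

```python
from pydantic import BaseModel


class Feedback(BaseModel):
    search_id: str  # the id the middleware attached to the saved search
    feedback: int   # 0 = bad result, 1 = good result


@app.post("/feedback")
async def human_feedback(body: Feedback):
    # hypothetical: persist the rating next to the stored search queries,
    # mirroring db.save("search", ...) in the middleware above
    await db.save("feedback", body.dict())
    return {"status": "ok"}
```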
I suggest two milestones for evaluating search (I don't fully know the entire architecture, so I leave coarse pseudocode):
- Direct thumbs up/down: only whoever built the service (i.e. a developer at hexafarms, or a student who fed in their study materials) is willing to assess query results. High-quality feedback, but in limited quantity (max. ~100 ratings).
- Indirect evaluation of search results, so we can understand what end users (in hexafarms' case: farmers) consider good or bad. [Logic I] If a similar query is asked again shortly after, apply a penalty, because it means the returned results were not good enough:
```python
if (
    similarity_score(embedding_vector(previous_query), embedding_vector(current_query)) > 0.5
    and abs(previous_query.time - current_query.time) < 60 * 5
):
    dataset(current_query).score *= 0.9  # penalty ratio
```
[Logic II] If the same query is asked many times, but not in the short term, boost its score:
```python
def __call__(dataset):
    dataset.score *= 1.001  # adding score
```
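Putting both heuristics together, a self-contained sketch of a score adjustment over logged queries; everything here (`LoggedQuery`, `cosine_similarity`, the 0.5 similarity threshold, the 5-minute window) is an illustrative assumption rather than existing embedbase code:

```python
import math
from dataclasses import dataclass


@dataclass
class LoggedQuery:
    vector: list[float]  # embedding of the query text
    timestamp: float     # unix time when the query was issued
    score: float = 1.0   # mutable quality score of the returned results


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def adjust_score(previous: LoggedQuery, current: LoggedQuery) -> None:
    similar = cosine_similarity(previous.vector, current.vector) > 0.5
    within_5_min = abs(previous.timestamp - current.timestamp) < 60 * 5
    if similar and within_5_min:
        # Logic I: a near-duplicate query shortly after the previous one
        # suggests the earlier results were not good enough -> penalise
        current.score *= 0.9
    elif similar:
        # Logic II: the same need recurring outside the short term
        # suggests the results are genuinely useful -> small boost
        current.score *= 1.001
```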