fitsleepinsights
Current RAG limitations
The current RAG implementation is limited and more often than not provides wrong answers.
It works as follows:
Data Preparation
The structured data is converted into a textual report, and the report's embedding is computed using the gecko 003 model.
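As a sketch of this step, the report could be rendered deterministically from the JSON data before embedding. The field names below (`date`, `sleep_minutes`, `steps`) are hypothetical placeholders, not the real export schema, and the embedding call itself is omitted:

```python
import json

def daily_report(day: dict) -> str:
    """Render one day of structured data as a plain-text report.

    The field names (date, sleep_minutes, steps) are hypothetical
    placeholders for whatever the real JSON export contains.
    """
    return "\n".join([
        f"Report for {day['date']}:",
        f"- Sleep: {day['sleep_minutes']} minutes",
        f"- Steps: {day['steps']}",
    ])

raw = json.loads('{"date": "2024-03-08", "sleep_minutes": 432, "steps": 10214}')
report = daily_report(raw)
print(report)
```

Rendering the text in code, rather than asking the LLM to do it, makes the report content a pure function of the data.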
Problems
- The report contains only a small subset of the available information.
- The report covers a single day only (daily report).
- The report is populated by an LLM that sometimes inserts generated information instead of extracting it (as requested) from the provided JSON data.
- There is only one embedding per report, because the document is short enough for the model to convert it to a single vector. Is this good, or should we use a window-based approach?
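For comparison, a window-based approach would split a longer report into overlapping chunks, each embedded separately. A minimal sketch (the window size and overlap are arbitrary):

```python
def windows(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size windows.

    Short texts (like the current daily reports) keep a single chunk,
    which matches the current one-embedding-per-report behaviour.
    """
    if len(text) <= size:
        return [text]
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text) - overlap, step)]

chunks = windows("x" * 450)  # a 450-char report yields overlapping chunks
print(len(chunks))
```

With multiple embeddings per report, retrieval can point at the specific section that matched instead of the whole day.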
Data insertion
The report and the embeddings are stored inside the database.
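The storage step can be sketched with SQLite and a JSON-serialized vector; the real database and schema may differ, and the vector here is a placeholder:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (day TEXT PRIMARY KEY, report TEXT, embedding TEXT)")

embedding = [0.12, -0.03, 0.87]  # placeholder: a real gecko vector has hundreds of dimensions
conn.execute(
    "INSERT INTO reports VALUES (?, ?, ?)",
    ("2024-03-08", "Report for 2024-03-08: ...", json.dumps(embedding)),
)

row = conn.execute(
    "SELECT report, embedding FROM reports WHERE day = ?", ("2024-03-08",)
).fetchone()
```

A vector-aware store (e.g. pgvector on PostgreSQL) would make the later similarity search a single SQL query instead of an application-side scan.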
Document Retrieval
The user asks a question. The question is embedded, a similarity search is performed, and the top 3 documents are retrieved and passed to Gemini as context.
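A sketch of this retrieval step with plain cosine similarity; the toy 3-dimensional vectors stand in for real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query: list[float], docs: list[tuple[str, list[float]]], k: int = 3):
    """Return the k documents whose embeddings are most similar to the query."""
    return sorted(docs, key=lambda d: cosine(query, d[1]), reverse=True)[:k]

docs = [
    ("report 2024-03-08", [1.0, 0.0, 0.1]),
    ("report 2024-03-18", [0.9, 0.1, 0.2]),
    ("report 2024-03-20", [0.0, 1.0, 0.0]),
]
best = top_k([1.0, 0.0, 0.0], docs, k=1)
```

Note that the ranking is purely geometric: nothing in it knows about dates or meaning, which is exactly where the problems below come from.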
Problems
- How can we be sure that the retrieved documents are relevant? The similarity measure provides a numerical score, but sometimes returns the wrong documents (e.g. the question asks for the activities on March 8, and the first document retrieved is the report for March 18).
- 3 is a totally arbitrary number, and it is wrong on several levels:
- When asking about a single day, we pass along (at least) 2 other days for no reason.
- When asking about data that spans multiple days, the context is still limited to 3 documents.
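One way to mitigate the wrong-date retrievals is to pre-filter the candidates by any explicit date in the question before similarity ranking runs. A rough sketch; the regex only catches ISO dates and is deliberately naive:

```python
import re

def filter_by_date(question: str, docs: list[dict]) -> list[dict]:
    """Keep only the reports matching an ISO date mentioned in the question.

    If the question contains no explicit date, all documents pass through
    and similarity ranking proceeds as before.
    """
    m = re.search(r"\d{4}-\d{2}-\d{2}", question)
    if m is None:
        return docs
    return [d for d in docs if d["day"] == m.group()]

docs = [{"day": "2024-03-08"}, {"day": "2024-03-18"}]
hits = filter_by_date("Which activities did I do on 2024-03-08?", docs)
```

With a date filter in place, March 18 can never outrank March 8 for a question that names March 8, and the "how many documents?" question becomes "all the days the question actually covers" instead of a fixed 3.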
Potential solutions
These are not "solutions", just some tweaks/changes that could at least lead the LLM to work on better data.
- [ ] Improve the reports: populate them manually instead of asking an LLM to do the magic. Write all the queries and improve the report sections, extracting as much useful information as possible.
- [ ] The reports should contain the questions that the user could ask.
- [ ] The reports should be aggregated: we should be able to generate weekly and monthly reports, because users are likely to ask for this type of information.
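The weekly aggregation can be computed in code rather than delegated to the LLM, which also sidesteps its weakness at arithmetic. A sketch with hypothetical field names (`steps`, `sleep_minutes`):

```python
from collections import defaultdict
from datetime import date

def weekly_totals(days: list[dict]) -> dict:
    """Aggregate daily records into ISO-week totals.

    Field names (steps, sleep_minutes) are hypothetical placeholders
    for the real daily-report fields.
    """
    weeks: dict = defaultdict(lambda: {"steps": 0, "sleep_minutes": 0})
    for d in days:
        year, week, _ = date.fromisoformat(d["date"]).isocalendar()
        weeks[(year, week)]["steps"] += d["steps"]
        weeks[(year, week)]["sleep_minutes"] += d["sleep_minutes"]
    return dict(weeks)

totals = weekly_totals([
    {"date": "2024-03-08", "steps": 10214, "sleep_minutes": 432},
    {"date": "2024-03-09", "steps": 5000, "sleep_minutes": 480},
])
```

The weekly (or monthly) report text can then be rendered from these exact totals, so the LLM only has to phrase numbers, never compute them.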
Questions
- How can we fetch all the relevant information and be sure to have fetched the correct data?
- How can the LLM interpolate/aggregate data across different reports and extract correct values? LLMs do not seem to be good at doing calculations.
- How can we make the LLM also look at the data in the database directly, if needed?
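On the last question: Gemini supports function calling, so one option is to expose database queries as tools the model can request. The dispatcher below is a local sketch of that pattern only; the tool name and signature are made up, and the actual Gemini API wiring is omitted:

```python
def steps_between(start: str, end: str, db: list[dict]) -> int:
    """Hypothetical tool: total steps in a date range, computed from the DB."""
    return sum(d["steps"] for d in db if start <= d["date"] <= end)

# Registry of tools the model is allowed to request.
TOOLS = {"steps_between": steps_between}

def dispatch(call: dict, db: list[dict]):
    """Run the tool call the model asked for, against the real data."""
    return TOOLS[call["name"]](**call["args"], db=db)

db = [
    {"date": "2024-03-08", "steps": 10214},
    {"date": "2024-03-09", "steps": 5000},
    {"date": "2024-03-20", "steps": 7000},
]
# Simulated model request, shaped like a function-calling response:
result = dispatch({"name": "steps_between",
                   "args": {"start": "2024-03-08", "end": "2024-03-10"}}, db)
```

This way the model decides *which* query to run, but the values it reports come from the database, not from retrieved text.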