fitsleepinsights
Current RAG limitations
The current RAG implementation is limited and more often than not provides wrong answers.
It works as follows:
Data Preparation
The structured data is converted into a textual report, and the report's embedding is computed using the gecko 003 model.
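As a sketch of this step, the report could be rendered deterministically from the JSON data before embedding. The field names below (`date`, `sleep_minutes`, `steps`) are hypothetical placeholders, not the real export schema, and the embedding call itself is omitted:

```python
import json

def daily_report(day: dict) -> str:
    """Render one day of structured data as a plain-text report.

    The field names (date, sleep_minutes, steps) are hypothetical
    placeholders for whatever the real JSON export contains.
    """
    return "\n".join([
        f"Report for {day['date']}:",
        f"- Sleep: {day['sleep_minutes']} minutes",
        f"- Steps: {day['steps']}",
    ])

raw = json.loads('{"date": "2024-03-08", "sleep_minutes": 432, "steps": 10214}')
report = daily_report(raw)
print(report)
```

Rendering the text in code, rather than asking the LLM to do it, makes the report content a pure function of the data.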
Problems
- The report contains only a small subset of the available information.
- The report covers a single day only (daily report).
- The report is populated by an LLM that sometimes inserts generated information instead of extracting it (as requested) from the provided JSON data.
- There is only one embedding per report, because the document is short enough for the model to convert it to a single vector. Is this good, or should we use a window-based approach?
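For comparison, a window-based approach would split a longer report into overlapping chunks, each embedded separately. A minimal sketch (the window size and overlap are arbitrary):

```python
def windows(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size windows.

    Short texts (like the current daily reports) keep a single chunk,
    which matches the current one-embedding-per-report behaviour.
    """
    if len(text) <= size:
        return [text]
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text) - overlap, step)]

chunks = windows("x" * 450)  # a 450-char report yields overlapping chunks
print(len(chunks))
```

With multiple embeddings per report, retrieval can point at the specific section that matched instead of the whole day.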
Data insertion
The report and the embeddings are stored inside the database.
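The storage step can be sketched with SQLite and a JSON-serialized vector; the real database and schema may differ, and the vector here is a placeholder:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (day TEXT PRIMARY KEY, report TEXT, embedding TEXT)")

embedding = [0.12, -0.03, 0.87]  # placeholder: a real gecko vector has hundreds of dimensions
conn.execute(
    "INSERT INTO reports VALUES (?, ?, ?)",
    ("2024-03-08", "Report for 2024-03-08: ...", json.dumps(embedding)),
)

row = conn.execute(
    "SELECT report, embedding FROM reports WHERE day = ?", ("2024-03-08",)
).fetchone()
```

A vector-aware store (e.g. pgvector on PostgreSQL) would make the later similarity search a single SQL query instead of an application-side scan.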
Document Retrieval
The user asks a question. The question is embedded, a similarity search is performed, and the top 3 documents are retrieved and passed to Gemini as context.
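A sketch of this retrieval step with plain cosine similarity; the toy 3-dimensional vectors stand in for real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query: list[float], docs: list[tuple[str, list[float]]], k: int = 3):
    """Return the k documents whose embeddings are most similar to the query."""
    return sorted(docs, key=lambda d: cosine(query, d[1]), reverse=True)[:k]

docs = [
    ("report 2024-03-08", [1.0, 0.0, 0.1]),
    ("report 2024-03-18", [0.9, 0.1, 0.2]),
    ("report 2024-03-20", [0.0, 1.0, 0.0]),
]
best = top_k([1.0, 0.0, 0.0], docs, k=1)
```

Note that the ranking is purely geometric: nothing in it knows about dates or meaning, which is exactly where the problems below come from.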
Problems
- How can we be sure that the retrieved documents are relevant? The similarity measure provides a numerical score, but sometimes returns the wrong documents (e.g. the question asks for the activities on March 8, and the first document retrieved is the report for March 18).
- 3 is a totally arbitrary number, and it is wrong on several levels:
- When asking about a single day, we pass along (at least) 2 other days for no reason.
- When asking about data that spans multiple days, the context is still limited to 3 documents.
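One way to mitigate the wrong-date retrievals is to pre-filter the candidates by any explicit date in the question before similarity ranking runs. A rough sketch; the regex only catches ISO dates and is deliberately naive:

```python
import re

def filter_by_date(question: str, docs: list[dict]) -> list[dict]:
    """Keep only the reports matching an ISO date mentioned in the question.

    If the question contains no explicit date, all documents pass through
    and similarity ranking proceeds as before.
    """
    m = re.search(r"\d{4}-\d{2}-\d{2}", question)
    if m is None:
        return docs
    return [d for d in docs if d["day"] == m.group()]

docs = [{"day": "2024-03-08"}, {"day": "2024-03-18"}]
hits = filter_by_date("Which activities did I do on 2024-03-08?", docs)
```

With a date filter in place, March 18 can never outrank March 8 for a question that names March 8, and the "how many documents?" question becomes "all the days the question actually covers" instead of a fixed 3.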
Potential solutions
These are not "solutions", just some tweaks/changes that could at least lead the LLM to work on better data.
- [ ] Improve the reports: populate them manually instead of asking an LLM to do the magic. Write all the queries and improve the report sections, extracting as much useful information as possible.
- [ ] The reports should contain the questions that the user could ask.
- [ ] The reports should be aggregated: we should be able to generate weekly and monthly reports, because users are likely to ask for this type of information.
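The weekly aggregation can be computed in code rather than delegated to the LLM, which also sidesteps its weakness at arithmetic. A sketch with hypothetical field names (`steps`, `sleep_minutes`):

```python
from collections import defaultdict
from datetime import date

def weekly_totals(days: list[dict]) -> dict:
    """Aggregate daily records into ISO-week totals.

    Field names (steps, sleep_minutes) are hypothetical placeholders
    for the real daily-report fields.
    """
    weeks: dict = defaultdict(lambda: {"steps": 0, "sleep_minutes": 0})
    for d in days:
        year, week, _ = date.fromisoformat(d["date"]).isocalendar()
        weeks[(year, week)]["steps"] += d["steps"]
        weeks[(year, week)]["sleep_minutes"] += d["sleep_minutes"]
    return dict(weeks)

totals = weekly_totals([
    {"date": "2024-03-08", "steps": 10214, "sleep_minutes": 432},
    {"date": "2024-03-09", "steps": 5000, "sleep_minutes": 480},
])
```

The weekly (or monthly) report text can then be rendered from these exact totals, so the LLM only has to phrase numbers, never compute them.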
Questions
- How can we fetch all the relevant information and be sure to have fetched the correct data?
- How can the LLM interpolate/aggregate data across different reports and extract correct values? LLMs do not seem to be good at doing calculations.
- How can we make the LLM also look at the data in the database directly, if needed?
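On the last question: Gemini supports function calling, so one option is to expose database queries as tools the model can request. The dispatcher below is a local sketch of that pattern only; the tool name and signature are made up, and the actual Gemini API wiring is omitted:

```python
def steps_between(start: str, end: str, db: list[dict]) -> int:
    """Hypothetical tool: total steps in a date range, computed from the DB."""
    return sum(d["steps"] for d in db if start <= d["date"] <= end)

# Registry of tools the model is allowed to request.
TOOLS = {"steps_between": steps_between}

def dispatch(call: dict, db: list[dict]):
    """Run the tool call the model asked for, against the real data."""
    return TOOLS[call["name"]](**call["args"], db=db)

db = [
    {"date": "2024-03-08", "steps": 10214},
    {"date": "2024-03-09", "steps": 5000},
    {"date": "2024-03-20", "steps": 7000},
]
# Simulated model request, shaped like a function-calling response:
result = dispatch({"name": "steps_between",
                   "args": {"start": "2024-03-08", "end": "2024-03-10"}}, db)
```

This way the model decides *which* query to run, but the values it reports come from the database, not from retrieved text.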