rag-experiment-accelerator
Prompts refactoring
Closes #505. Prompts have been moved into their own subclasses, better prompts for text generation have been added, and chain-of-thought capabilities have been added to the prompts.
The main new features are:
- Added native, automatic support for JSON generation using the OpenAI response format structure (see the first sketch after this list)
- Added prompt-level validation capabilities for generated answers
- Added prompt classes with automatic validation of prompts and their parameters
- Added support for non-strict queries: you can now specify that a particular prompt response is allowed to fail, and the failure is handled gracefully
- Added support for chain-of-thought reasoning (see the examples in the QnA generation prompts, and the template sketch after this list)
- Moved prompts into separate .txt files
- QnA generation can now fail to produce some questions without consequences, as long as at least one question is generated (see the usage sketch after this list)
- Changed the Python version to 3.11
- Some prompts that previously returned text now return JSON
- Some prompts that previously returned JSON now return text
- The signature of generate_response now depends on the prompt's input parameters (illustrated in the Prompt sketch after this list)
- Refactored all prompts (except the ragas ones) and added examples
- Removed some try/except blocks as part of the new system for handling responses to non-strict queries: all exceptions thrown from response_generator are now critical. If the NonStrict tag is added, a failed execution yields None instead of raising, unless the exception is critical (i.e. caused by a problem in the code rather than by LLM generation)
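For context, here is a minimal sketch of what native JSON generation looks like with the OpenAI Python client; the model name and message contents are illustrative, and the repository's actual wiring may differ:

```python
# Minimal JSON-mode sketch (illustrative; not the repository's actual code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    response_format={"type": "json_object"},  # ask the API for valid JSON
    messages=[
        {"role": "system",
         "content": "Reply in JSON with the keys 'question' and 'answer'."},
        {"role": "user",
         "content": "Generate one question-and-answer pair about vector search."},
    ],
)
print(response.choices[0].message.content)  # a parseable JSON string
```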
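Chain-of-thought pairs naturally with JSON output: the prompt asks the model to reason step by step and then answer in a fixed structure. A hypothetical template along those lines (the real prompts live in the .txt files and will differ in wording):

```python
# Hypothetical chain-of-thought QnA prompt template; wording is illustrative.
COT_QNA_TEMPLATE = """\
You generate question-and-answer pairs from the context below.
Think through the problem step by step before answering.

Context:
{context}

Respond in JSON with the keys:
  "reasoning" - your step-by-step thinking,
  "question"  - the generated question,
  "answer"    - the answer, grounded in the context.
"""
```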
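To make the parameter validation, the signature behaviour, and the non-strict error handling concrete, here is a self-contained sketch. All names (Prompt, generate_response, is_strict) are illustrative, not the repository's actual API:

```python
import json
import string


class Prompt:
    """Illustrative prompt class with parameter and response validation."""

    def __init__(self, template: str, is_json: bool = False, is_strict: bool = True):
        self.template = template
        self.is_json = is_json
        self.is_strict = is_strict
        # Expected input parameters are derived from the template's {placeholders}.
        self.input_keys = {
            name for _, name, _, _ in string.Formatter().parse(template) if name
        }

    def render(self, **params) -> str:
        missing = self.input_keys - params.keys()
        if missing:  # a missing parameter is a bug in the calling code, so raise
            raise ValueError(f"missing prompt parameters: {sorted(missing)}")
        return self.template.format(**params)

    def validate(self, raw: str):
        # JSON prompts must parse; plain-text prompts pass through unchanged.
        return json.loads(raw) if self.is_json else raw


def generate_response(prompt: Prompt, call_llm, **params):
    """The accepted keyword arguments depend on the prompt's input parameters."""
    text = prompt.render(**params)  # raises on bad parameters (always critical)
    try:
        return prompt.validate(call_llm(text))
    except json.JSONDecodeError:
        if prompt.is_strict:
            raise  # strict prompts treat a malformed response as fatal
        return None  # non-strict prompts signal failure with None
```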
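Reusing the hypothetical names from the sketches above (and assuming a call_llm callable and a documents list), the "at least one question" rule for QnA generation could then look like:

```python
# Drop per-document failures (None results from non-strict prompts), but
# treat a completely empty batch as an error.
qna_prompt = Prompt(COT_QNA_TEMPLATE, is_json=True, is_strict=False)

results = [generate_response(qna_prompt, call_llm, context=doc) for doc in documents]
qna_pairs = [r for r in results if r is not None]
if not qna_pairs:
    raise RuntimeError("QnA generation produced no questions at all")
```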
Metrics and comparison
Comparison on data generated using this branch
| Metric | New Data, New Prompts | New Data, Old Prompts |
|---|---|---|
| fuzzy | 75.88 | 77.05 |
| bert_all_MiniLM_L6_v2 | 80.58 | 79.60 |
| cosine | 75.21 | 68.88 |
| bert_distilbert_base_nli_stsb_mean_tokens | 76.90 | 76.14 |
| llm_answer_relevance | 72.59 | 67.60 |
| llm_context_precision | 91.03 | 84.62 |
Comparison on data generated using the development branch
| Metric | Old Data, New Prompts | Old Data, Old Prompts |
|---|---|---|
| fuzzy | 81.14 | 80.56 |
| bert_all_MiniLM_L6_v2 | 69.71 | 66.58 |
| cosine | 66.78 | 57.51 |
| bert_distilbert_base_nli_stsb_mean_tokens | 70.49 | 67.78 |
| llm_answer_relevance | 62.18 | 57.63 |
| llm_context_precision | 84.62 | 78.85 |
You've removed quite a few try/except blocks. This might be fine, but it isn't mentioned in the PR notes, and I wasn't sure whether this change was desired or expected. It would probably be best to mention how the error handling has changed in the PR notes.