rag-experiment-accelerator

Prompts refactoring

Open · quovadim opened this issue 1 year ago · 1 comment

Closes #505. Prompts are moved into subclasses, better prompts for text generation are added, and chain-of-thought capabilities are added for prompts.

The main new features are:

  • Added native, automatic support for JSON generation using the OpenAI response format structure
  • Added prompt-level validation of generated answers
  • Added prompt classes with automatic validation of prompts and parameters
  • Added support for non-strict queries: a prompt can now be marked so that a failed response is acceptable
  • Added support for chain-of-thought reasoning (see the examples in the QnA generation prompts)
  • Moved prompts into separate .txt files
  • QnA generation can now fail to generate some questions without consequences, as long as at least one question is generated
  • Changed the Python version to 3.11
  • Some prompts that previously returned text now return JSON
  • Some prompts that previously returned JSON now return text
  • The signature of generate_response now depends on the prompt's input parameters
  • Refactored all prompts (except ragas) and added examples
  • Some try/except blocks were removed due to the new system of handling responses for non-strict queries: all exceptions thrown from response_generator are now treated as critical. If the NonStrict tag is set, a failed execution returns None, unless the exception is critical (i.e. caused by a bug in the code rather than by LLM generation)
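The non-strict behaviour described above can be sketched roughly as follows. This is a hedged illustration only: the names `Prompt`, `strict`, and `generate_response` are chosen for the sketch and are not claimed to match the actual rag-experiment-accelerator classes or signatures.

```python
import json


class Prompt:
    """Illustrative prompt wrapper: a .txt-style template plus a strictness
    flag. Non-strict prompts may fail validation without raising."""

    def __init__(self, template: str, strict: bool = True):
        self.template = template
        self.strict = strict

    def render(self, **params) -> str:
        # Fill the template with the prompt's input parameters.
        return self.template.format(**params)

    def validate(self, raw: str):
        """Parse model output as JSON; return None if it is not valid JSON."""
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            return None


def generate_response(prompt: Prompt, model_output: str):
    """Mimics the described handling: a strict prompt raises on bad output
    (a critical error), a non-strict prompt returns None instead."""
    parsed = prompt.validate(model_output)
    if parsed is None and prompt.strict:
        raise ValueError("strict prompt produced unparseable output")
    return parsed


# Non-strict QnA-style prompt: a bad generation yields None, not an exception.
qna = Prompt("Generate {n} questions as JSON.", strict=False)
print(generate_response(qna, '{"questions": ["Q1"]}'))
print(generate_response(qna, "not json"))
```

Under this model, QnA generation can simply drop the `None` results and keep going as long as at least one question parsed successfully.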

Metrics and comparison

Comparison on data generated using this branch

| Metric | New Data, New Prompts | New Data, Old Prompts |
| --- | --- | --- |
| fuzzy | 75.88 | 77.05 |
| bert_all_MiniLM_L6_v2 | 80.58 | 79.60 |
| cosine | 75.21 | 68.88 |
| bert_distilbert_base_nli_stsb_mean_tokens | 76.90 | 76.14 |
| llm_answer_relevance | 72.59 | 67.60 |
| llm_context_precision | 91.03 | 84.62 |

Comparison on data generated using development branch

| Metric | Old Data, New Prompts | Old Data, Old Prompts |
| --- | --- | --- |
| fuzzy | 81.14 | 80.56 |
| bert_all_MiniLM_L6_v2 | 69.71 | 66.58 |
| cosine | 66.78 | 57.51 |
| bert_distilbert_base_nli_stsb_mean_tokens | 70.49 | 67.78 |
| llm_answer_relevance | 62.18 | 57.63 |
| llm_context_precision | 84.62 | 78.85 |
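For intuition on what a metric like `fuzzy` in the tables measures, here is a minimal stdlib approximation. This is an assumption-laden sketch: the project likely uses a dedicated fuzzy-matching library, and its exact scoring may differ from `difflib`.

```python
from difflib import SequenceMatcher


def fuzzy_score(expected: str, actual: str) -> float:
    """Character-level similarity between two answers, scaled to 0-100
    like the values in the comparison tables (sketch, not the repo's
    actual implementation)."""
    return 100.0 * SequenceMatcher(None, expected.lower(), actual.lower()).ratio()


# Near-identical answers score close to 100.
print(round(fuzzy_score("retrieval augmented generation",
                        "retrieval-augmented generation"), 2))
```

A per-question score like this is typically averaged over the evaluation set to produce a single table entry.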

quovadim avatar Apr 30 '24 09:04 quovadim

You've removed quite a few try/except blocks. This might be fine, but it's not mentioned in the PR notes, and I wasn't sure if this change was desired/expected. Probably best to mention how the error handling has changed in the PR notes.

martinpeck avatar Apr 30 '24 15:04 martinpeck