paper-qa
Documentation for changing the answer language?
Hello,
I use PaperQA with "mixtral-8x7b-instruct-v0.1.Q4_K_M" served locally. This LLM understands French, German, Spanish, Italian, and English. I tested it via a browser, and it works well in French.
When I use it with PaperQA, I ask my question in French but the summary is in English. The model seems to understand the query perfectly.
I guess it's a prompt formatting issue.
Is there a way to force PaperQA to answer in French?
Hello @Snikch63200,
I'm here to help you with your problem. I'm a bot that can help you solve bugs, answer questions, and become a contributor. Feel free to ask me questions or report issues.
How can I help you today?
PaperQA does not have specific documentation or settings for changing the response language directly. However, you can customize the prompt to influence the language of the response. You can modify the prompt to instruct the model to respond in French. Here's an example of how you might adjust the prompt:
from paperqa import Docs, Settings

my_qa_prompt = (
    "Répondez à la question '{question}' en français.\n"
    "Utilisez le contexte ci-dessous si nécessaire. "
    "Vous pouvez citer le contexte en utilisant la clé comme (Exemple2012). "
    "Si le contexte est insuffisant, écrivez un poème "
    "sur votre incapacité à répondre.\n\n"
    "Contexte: {context}"
)

docs = Docs()
settings = Settings()
settings.prompts.qa = my_qa_prompt
docs.query("Votre question ici", settings=settings)
This example sets a custom prompt to instruct the model to answer in French. Adjust the prompt as needed to fit your specific requirements.
It doesn't work and gives the following answer:
Question: Quels sont les exemples de nécropoles mérovingiennes connues dans le Nord-Pas-de-Calais et quelles sont leurs caractéristiques?
Based on the provided context, there are no specific examples of Merovingian necropoles in the Nord-Pas-de-Calais region. However, I can provide some general information about Merovingian burial practices from the documents that may be helpful:
- Merovingian tombs often feature wooden chamber structures placed within the grave, constructed using planches assembled on encoches and plaqued against the fosse (trench) wall (Aigueperse (63) - Montussang, Site 1.pdf pages 86-87).
- At a Merovingian burial site in Aigueperse, an adult male was buried on his back with the head facing south and feet facing north, arms extended with the right arm in supination and the left arm's forearm resting in front of the same-side pelvis. The legs were straight (Aigueperse (63) - Montussang, Site 1.pdf pages 136-137).
I hope this information is helpful, even though it does not directly answer your question about known Merovingian necropoles in the Nord-Pas-de-Calais region.
References
- (Aigueperse (63) - Montussang, Site 1.pdf pages 86-87): File Aigueperse (63) - Montussang, Site 1.pdf
Any idea how to fix this? @dosu
Modify the system prompt to change languages.
settings.prompts.system
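For instance, a minimal sketch of that approach (the French instruction text here is illustrative, not a paper-qa default):

from paperqa import Docs, Settings

settings = Settings()
# Override the system prompt so every model call carries the language instruction.
settings.prompts.system = (
    "Vous êtes un assistant de recherche. Répondez toujours en français, "
    "quelle que soit la langue du contexte fourni."
)

docs = Docs()
docs.query("Votre question ici", settings=settings)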
To ensure that PaperQA provides responses in French, you can try the following approach:
1. Modify the prompt template: Ensure that the prompt explicitly instructs the model to respond in French. You can adjust the qa prompt template in the Settings to include a clear instruction for the language. Here's an example:

   from paperqa import Docs, Settings

   my_qa_prompt = (
       "Répondez à la question suivante en français : '{question}'.\n"
       "Utilisez le contexte ci-dessous si nécessaire. "
       "Citez le contexte en utilisant la clé comme (Exemple2012). "
       "Si le contexte est insuffisant, écrivez un poème "
       "sur votre incapacité à répondre.\n\n"
       "Contexte: {context}"
   )

   docs = Docs()
   settings = Settings()
   settings.prompts.qa = my_qa_prompt
   docs.query("Votre question ici", settings=settings)

2. Check the model configuration: Ensure that the language model you are using is configured correctly to handle French. Since you mentioned that the model works well in French via a browser, verify that the same configuration is applied when using it with PaperQA.

3. Increase verbosity for debugging: If the issue persists, you can increase the logging verbosity to debug the prompt processing. This can help identify if the language instruction is being ignored or overridden. You can set the verbosity level in the Settings object to get more detailed logs:

   settings.verbosity = 3  # Set to a higher level for detailed logs
By ensuring the prompt explicitly instructs the model to respond in French and verifying the model's configuration, you should be able to get responses in the desired language.
I changed the prompts as follows:
from paperqa import Settings

settings = Settings()

# French translations of paper-qa's default prompts
summary_prompt = (
    "Résumez l'extrait ci-dessous pour vous aider à répondre à une question.\n\nExtrait de"
    " {citation}\n\n----\n\n{text}\n\n----\n\nQuestion: {question}\n\nNe pas répondre"
    " directement à la question, résumez plutôt afin de fournir des éléments aidant à répondre"
    " à la question. Restez précis ; rapportez des nombres, des équations ou des citations directes spécifiques"
    " (marquées par des guillemets). Répondez « Non pertinent » si l'extrait n'est"
    " pas pertinent. À la fin de la réponse, fournissez un score entier entre 1 et 10"
    " sur une nouvelle ligne indiquant la pertinence par rapport à la question. Ne pas expliquer le score.\n\nRelevant"
    " Information Summary ({summary_length}):"
)
summary_json_prompt = (
    "Extrait de {citation}\n\n----\n\n{text}\n\n----\n\nQuestion: {question}\n\n"
)
qa_prompt = (
    "Répondez à la question ci-dessous, en français, avec le contexte.\n\n"
    "Contexte (avec scores de pertinence):\n\n{context}\n\n----\n\n"
    "Question: {question}\n\n"
    "Écrivez une réponse basée sur le contexte. "
    "Si le contexte ne fournit pas suffisamment d'informations, répondez : "
    '"Je ne peux pas répondre." '
    "Pour chaque partie de votre réponse, indiquez par quelles sources elle est justifiée "
    "via des citations clé à la fin des phrases, "
    "comme {example_citation}. Ne citez que le contexte "
    "ci-dessous et utilisez uniquement les clés valides. Écrivez dans le style "
    "d'un article Wikipedia, avec des phrases concises et des paragraphes cohérents. "
    "Le contexte provient de diverses sources et n'est qu'un résumé, "
    "il peut donc y avoir des inexactitudes ou des ambiguïtés. Si des guillemets sont présents et "
    "pertinents, utilisez-les dans la réponse. Cette réponse ira directement sur "
    "Wikipedia, alors n'ajoutez aucune information superflue.\n\n"
    "Réponse ({answer_length}):"
)
select_paper_prompt = (
    "Sélectionnez des documents qui peuvent aider à répondre à la question ci-dessous. "
    "Les articles et documents sont listés comme suit : $KEY: $PAPER_INFO. "
    "Retournez une liste de clés, séparées par des virgules. "
    "Retournez 'None' si aucun document n'est utilisable. "
    "Choisissez les documents pertinents, provenant de sources fiables et récentes "
    "(si la question requiert des informations récentes).\n\n"
    "Question: {question}\n\n"
    "Documents: {papers}\n\n"
    "Clés sélectionnées:"
)
citation_prompt = (
    "Fournissez la citation du texte suivant au format MLA. "
    "Ne pas écrire de phrase d'introduction. "
    "Si la date de consultation est demandée, l'année en cours est 2024.\n\n"
    "{text}\n\n"
    "Citation:"
)
structured_citation_prompt = (
    "Extrayez le titre, les auteurs et le doi au format JSON de cette citation MLA. "
    "Si un champ est introuvable, renvoyez-le comme null. "
    "Utilisez le titre, les auteurs et le doi comme clés ; la valeur des auteurs doit être une liste d'auteurs.\n\n"
    "{citation}\n\n"
    "Citation JSON:"
)
default_system_prompt = (
    "Répondez en français avec un style direct et concis. "
    "Votre public est expert, alors soyez très précis. "
    "S'il existe des termes ou des acronymes ambigus, définissez-les d'abord."
)
# NOTE: we use double curly braces here so it's not considered an f-string template
summary_json_system_prompt = """\
Provide a summary of the relevant information that could help answer the question based on the excerpt. Respond with the following JSON format:

{{
  "summary": "...",
  "relevance_score": "..."
}}

where `summary` is relevant information from text - {summary_length} words and `relevance_score` is the relevance of `summary` to answer question (out of 10).
"""  # noqa: E501
# Prompt templates for use with LitQA
QA_PROMPT_TEMPLATE = "Q: {question}\n\nOptions:\n{options}"
EVAL_PROMPT_TEMPLATE = (
    "Extract the single letter answer from the following question and answer"
    "\n\n{qa_prompt}"
    "\n\n{qa_answer}"
    "\n\nSingle Letter Answer:"
)
settings.prompts.qa = qa_prompt
settings.prompts.summary = summary_prompt
settings.prompts.select = select_paper_prompt
settings.prompts.system = default_system_prompt
settings.prompts.summary_json = summary_json_prompt
settings.parsing.citation_prompt = citation_prompt
settings.parsing.structured_citation_prompt = structured_citation_prompt
settings.answer.evidence_summary_length = "Environ 200 mots"
settings.answer.answer_length = "Environ 1000 mots, un peu plus si nécessaire"
Still answers in English... @dosu
To ensure that PaperQA responds in French, you should verify that the language settings are correctly applied throughout the system. Here are a few steps to troubleshoot and potentially resolve the issue:
1. Ensure consistent language instructions: Double-check that all prompt templates explicitly instruct the model to respond in French. You've already set the qa_prompt and default_system_prompt to specify French, which is good.

2. Check for overriding settings: Make sure there are no other parts of the code or configuration that might override these language settings. This includes checking any default settings or environment variables that might affect the language output.

3. Increase logging verbosity: Use the logging features in PaperQA to trace how the prompts are being processed. By setting a higher verbosity level, you can get detailed logs that might reveal if the language instructions are being ignored or overridden. You can adjust the verbosity level in the Settings object:

   settings.verbosity = 3  # Adjust to a higher level for detailed logs

4. Verify model configuration: Ensure that the language model is correctly configured to handle French when integrated with PaperQA. Since it works in French via a browser, confirm that the same configuration is applied in your PaperQA setup.

5. Test with minimal setup: Try running a minimal setup with just the essential prompts to see if the issue persists (see the sketch after this list). This can help isolate the problem and determine if it's related to specific prompt configurations.
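One way to structure that minimal setup, as a sketch (the document path and query are placeholders, and both prompts are trimmed-down illustrations rather than paper-qa defaults):

from paperqa import Docs, Settings

settings = Settings()
settings.prompts.system = "Répondez toujours en français."
# A stripped-down QA prompt keeping the required {context} and {question} fields.
settings.prompts.qa = (
    "Répondez à la question ci-dessous en français, avec le contexte.\n\n"
    "Contexte:\n\n{context}\n\n----\n\n"
    "Question: {question}\n\n"
    "Réponse ({answer_length}):"
)

docs = Docs()
docs.add("mon_document.pdf", settings=settings)  # hypothetical local PDF
answer = docs.query("Votre question ici", settings=settings)
print(answer.formatted_answer)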
By following these steps, you should be able to identify and resolve the issue with the language output.
After a few empirical tests, it seems that raising the model's maximum number of output tokens from 512 to 1024 solves the problem.
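For reference, a sketch of where such a limit could be set when serving a model locally through paper-qa's llm_config / model_list mechanism (the model name and api_base below are placeholders for your local deployment, and max_tokens support depends on your serving backend):

from paperqa import Settings

local_llm_config = {
    "model_list": [
        {
            "model_name": "openai/mixtral-8x7b-instruct-v0.1",  # placeholder route name
            "litellm_params": {
                "model": "openai/mixtral-8x7b-instruct-v0.1",
                "api_base": "http://localhost:8080/v1",  # hypothetical local endpoint
                "max_tokens": 1024,  # raised from 512, per the fix above
            },
        }
    ]
}

settings = Settings(
    llm="openai/mixtral-8x7b-instruct-v0.1",
    llm_config=local_llm_config,
    summary_llm="openai/mixtral-8x7b-instruct-v0.1",
    summary_llm_config=local_llm_config,
)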