
[BUG] AIMessage Content Missing Error and Citation Pipeline Validation Error While Using Local LLM (ollama)

Open · Lee-Ju-Yeong opened this issue 1 year ago · 0 comments

Description

I am encountering two main issues while using the RAG (Retrieval-Augmented Generation) system with a local LLM (ollama) for document retrieval and question answering. The system retrieves documents from a vector database, and I am seeing the following errors:

TypeError: AIMessage.__init__() missing 1 required positional argument: 'content'

The error occurs while generating and appending the AIMessage response. It seems the content is not being passed correctly when constructing the AIMessage object. Example log:

Traceback (most recent call last):
  ...
    messages.append(AIMessage(content=ai))
  File "/Users/zooyong/Documents/Kotaemon/libs/kotaemon/kotaemon/base/schema.py", line 63, in __init__
    super().__init__(*args, **kwargs)
TypeError: AIMessage.__init__() missing 1 required positional argument: 'content'
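For reference, this TypeError means the content value never bound to the base class's content parameter, even though it was passed as a keyword. One way that exact failure shape can arise is a positional-only parameter in a base `__init__` behind a `*args, **kwargs` wrapper. The toy classes below are purely illustrative; they are not kotaemon's or langchain's actual definitions:

```python
class BaseMessage:
    # "content" is positional-only here: a keyword argument named
    # "content" falls into **kwargs instead of binding to it, which
    # mimics the failure shape seen in the log.
    def __init__(self, content, /, **kwargs):
        self.content = content


class AIMessage(BaseMessage):
    # Thin wrapper that forwards everything, like schema.py line 63.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


try:
    AIMessage(content="hello")  # keyword cannot bind to the positional-only slot
except TypeError as exc:
    print(exc)  # e.g. "... missing 1 required positional argument: 'content'"

msg = AIMessage("hello")  # passing content positionally works
```

If this is the cause, passing the content positionally (or relaxing the positional-only signature) would avoid the error.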

ValidationError in CitationPipeline:

The second issue is related to citation generation. The system is expected to handle citation evidence, but validation fails because the evidences field arrives as a JSON-encoded string rather than a list. Example log:

CitationPipeline: {"evidences":"[\"Greenville Park is a 1 acre park\", \"The park is located in South Carolina\"]"}
1 validation error for CiteEvidence
evidences
  Input should be a valid list [type=list_type, input_value='["Greenville Park is a 1...ted in South Carolina"]', input_type=str]
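The excerpt shows why pydantic rejects the payload: the local model returned the evidences list double-encoded, so after one JSON decode the value is still a string, which trips the list_type check. A stdlib-only sketch of the mismatch and the extra decode that would recover the list (a diagnostic illustration, not kotaemon's code):

```python
import json

# Raw pipeline output as seen in the log: the "evidences" value is a
# JSON-encoded *string* containing a list, not a list itself.
raw = '{"evidences": "[\\"Greenville Park is a 1 acre park\\", \\"The park is located in South Carolina\\"]"}'

payload = json.loads(raw)
assert isinstance(payload["evidences"], str)  # still a string after one decode

# A second decode recovers the actual list a CiteEvidence-style model expects.
evidences = json.loads(payload["evidences"])
print(evidences)
```

A pre-validation step that applies json.loads when the field is a string would likely make the pipeline tolerant of this double encoding.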

small-town-html.html.zip

Reproduction steps

1. Set up a RAG system with a local LLM (ollama).
2. Run a document retrieval query.
3. Observe the error related to missing content in AIMessage.
4. Observe the validation error during the citation process.

Screenshots

No response

Logs

User-id: 1, can see public conversations: True
Session reasoning type: simple
Session LLM: ollama
Reasoning class: <class 'ktem.reasoning.simple.FullQAPipeline'>
Reasoning state: {'app': {'regen': False}, 'pipeline': {}}
Reasoning in progress ...
Retrievers [DocumentRetrievalPipeline(DS=<kotaemon.storages.docstores.lancedb.LanceDBDocumentStore object at 0x178bb82b0>, FSPath=PosixPath('/Users/zooyong/Documents/Kotaemon/ktem_app_data/user_data/files/index_1'), Index=<class 'ktem.index.file.index.IndexTable'>, Source=<class 'ktem.index.file.index.Source'>, VS=<kotaemon.storages.vectorstores.chroma.ChromaVectorStore object at 0x178bb8100>, get_extra_table=False, llm_scorer=LLMTrulensScoring(concurrent=True, normalize=10, prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x17fd41ae0>, system_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x17fd41840>, top_k=3, user_prompt_template=<kotaemon.llms.prompts.template.PromptTemplate object at 0x17fd428c0>), mmr=False, rerankers=[CohereReranking(cohere_api_key='', model_name='rerank-multilingual-v2.0', use_key_from_ktem=True)], retrieval_mode='vector', top_k=10, user_id=1), GraphRAGRetrieverPipeline(DS=<theflow.base.unset_ object at 0x1016aa0e0>, FSPath=<theflow.base.unset_ object at 0x1016aa0e0>, Index=<class 'ktem.index.file.index.IndexTable'>, Source=<theflow.base.unset_ object at 0x1016aa0e0>, VS=<theflow.base.unset_ object at 0x1016aa0e0>, file_ids=[], user_id=<theflow.base.unset_ object at 0x1016aa0e0>)]
searching in doc_ids ['019bb0c8-5700-44c8-841a-eaedf152c3fa', '0e21b98c-2e34-4846-9b46-bbcd83c8fdec', '11459e65-109a-4700-bd2e-dbb554c2a3d3', '17ae6d7f-e96a-4960-ad55-5d463fc3be0f', '1cd21c75-ea42-4502-a4f9-4700ce84859c', '4ee8b0b2-35d7-4f76-a593-3741a22fb4ab', '65be4a78-49be-4dda-a024-a54bbf1a6cc6', '77c5d4f3-9774-47c0-80e6-0d26e563bbef', '7a2c1a1c-d25a-4cce-98cd-a71a23cab5c0', '7b3f2537-0e6e-4635-a63d-d5272a154572', '7eba5fbf-d7b1-4f1b-a0df-10d5bb3a56dc', '84e8bbb0-5b54-4160-a1fa-cefe1c076a9d', '893987e5-422e-45b9-9384-3343c21ba872', '99cb8d4f-e0e8-4c62-8d0b-50c5c29ad2c6', 'a22b747d-a076-4afe-8150-6f87b7cd64f1', 'a505646e-f0a7-44ef-acd0-d4a2f942aa1c', 'a6b06055-f3d5-4bfc-893a-9f2c8062470d', 'b250bc33-c03b-4cc6-8d08-2adba65c7a20', 'b50b5ace-140d-4213-b9ab-b4e177a3f46c', 'b77704b5-02a7-48e7-8d74-86fe335c3b97', 'c0c4a52a-b893-4a7c-9c12-067395036d3c', 'c5e6ea07-aaea-4043-ae81-e836e50f1500', 'd2544aec-f8c7-4e37-beea-194f124e329b', 'e2f0f5ac-209b-49fd-ac2b-11ded87f3ca2', 'e9ddaab0-ec0a-48d3-ad81-1c2afe829211', 'ee71f36f-f051-4d50-b302-84f3e89cab0d']
retrieval_kwargs: dict_keys(['do_extend', 'scope', 'filters'])
Cannot get Cohere API key from `ktem` 'NoneType' object has no attribute '_kwargs'
Cohere API key not found. Skipping reranking.
Got raw 10 retrieved documents
thumbnail docs 0 non-thumbnail docs 10 raw-thumbnail docs 0
retrieval step took 0.3695368766784668
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Document is not pdf
Got 10 retrieved documents
len (original) 32926
len (trimmed) 32926
Got 0 images
CitationPipeline: invoking LLM
Traceback (most recent call last):
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/queueing.py", line 575, in process_events
    response = await route_utils.call_process_api(
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2405, in run_sync_in_worker_thread
    return await future
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 914, in run
    result = context.run(func, *args)
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/opt/miniconda3/envs/kotaemon/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/Users/zooyong/Documents/Kotaemon/libs/ktem/ktem/pages/chat/__init__.py", line 871, in chat_fn
    for response in pipeline.stream(chat_input, conversation_id, chat_history):
  File "/Users/zooyong/Documents/Kotaemon/libs/ktem/ktem/reasoning/simple.py", line 673, in stream
    answer = yield from self.answering_pipeline.stream(
  File "/Users/zooyong/Documents/Kotaemon/libs/ktem/ktem/reasoning/simple.py", line 349, in stream
    messages.append(AIMessage(content=ai))
  File "/Users/zooyong/Documents/Kotaemon/libs/kotaemon/kotaemon/base/schema.py", line 63, in __init__
    super().__init__(*args, **kwargs)
TypeError: AIMessage.__init__() missing 1 required positional argument: 'content'
CitationPipeline: finish invoking LLM
CitationPipeline: {"evidences":"[\"Greenville Park is a 1 acre park\", \"The park is located in South Carolina\"]"}
1 validation error for CiteEvidence
evidences
  Input should be a valid list [type=list_type, input_value='["Greenville Park is a 1...ted in South Carolina"]', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/list_type
LLM rerank scores [0.7, 0.3, 0.2, 0.2, 0.2, 0.2, 0.1, 0.1, 0.0, 0.0]

Browsers

Chrome

OS

MacOS

Additional information

No response

Lee-Ju-Yeong · Sep 30 '24 06:09