
For some reason the RAG pipeline stopped working out of nowhere: TypeError: SimpleHybridRetriever._aretrieve() missing 1 required positional argument: 'query'

Open cranyy opened this issue 1 year ago • 6 comments

  File "E:\Project\22222\MetaStocky\ooo.py", line 100, in query
    response = await asyncio.wait_for(self._engine.aquery(f"{OPTIONS_ANALYSIS_PROMPT}\n{question}"), timeout=120)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marin\AppData\Local\Programs\Python\Python311\Lib\asyncio\tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 307, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\base\base_query_engine.py", line 65, in aquery
    query_result = await self._aquery(str_or_query_bundle)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 307, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 204, in _aquery
    nodes = await self.aretrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\22222\MetaStocky\metagpt\rag\engines\simple.py", line 168, in aretrieve
    nodes = await super().aretrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 148, in aretrieve
    nodes = await self._retriever.aretrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\instrumentation\dispatcher.py", line 307, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\base\base_retriever.py", line 276, in aretrieve
    nodes = await self._aretrieve(query_bundle=query_bundle)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: SimpleHybridRetriever._aretrieve() missing 1 required positional argument: 'query'



This started happening literally today. I haven't changed anything except updating to the newest version, and it worked fine two or three days ago when I last tested.
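Looking at the traceback, llama-index's BaseRetriever now invokes the override with a keyword argument, `nodes = await self._aretrieve(query_bundle=query_bundle)`. Any subclass whose `_aretrieve` parameter is named something else (e.g. `query`) can no longer bind that keyword, which produces exactly this TypeError. A minimal sketch of the mismatch and the fix, assuming the custom retriever had declared its parameter as `query` (MetaGPT's actual SimpleHybridRetriever may differ):

```python
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle


class SimpleHybridRetriever(BaseRetriever):
    """Sketch only: illustrates the parameter-name mismatch."""

    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        return []  # placeholder sync path so the class is instantiable

    # Broken after the upgrade -- the base class now calls
    # `self._aretrieve(query_bundle=query_bundle)`, so a parameter
    # named `query` cannot be bound:
    #
    #   async def _aretrieve(self, query: QueryBundle) -> list[NodeWithScore]:
    #
    # Fixed: name the parameter `query_bundle` to match the keyword.
    async def _aretrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        return []
```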

cranyy avatar Apr 24 '24 01:04 cranyy

Is it reproducible? Could you provide more details, such as the code that was executed?
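For reference, a minimal script that exercises the hybrid retriever plus LLM ranker looks roughly like the sketch below. The engine and config names follow MetaGPT's RAG example of that period and may differ across versions; the input file and embedding dimension are placeholders:

```python
import asyncio

from metagpt.rag.engines import SimpleEngine
from metagpt.rag.schema import (
    BM25RetrieverConfig,
    FAISSRetrieverConfig,
    LLMRankerConfig,
)


async def main() -> None:
    # Hybrid retrieval (FAISS + BM25) plus an LLM ranker -- the combination
    # that builds a SimpleHybridRetriever under the hood.
    engine = SimpleEngine.from_docs(
        input_files=["data/sample.txt"],  # placeholder document
        retriever_configs=[
            FAISSRetrieverConfig(dimensions=1536),  # embedding dim is an assumption
            BM25RetrieverConfig(),
        ],
        ranker_configs=[LLMRankerConfig()],
    )
    print(await engine.aquery("What does the document say?"))


asyncio.run(main())
```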

seehi avatar Apr 24 '24 02:04 seehi

Hello, I solved it - it seems the RAG dependencies were not updated properly. But now the bigger issue is that it gives this:

2024-04-24 05:40:33.851 | ERROR    | __main__:query:108 - Traceback (most recent call last):
  File "E:\Project\MetaStocky\ooo.py", line 100, in query
    response = await asyncio.wait_for(self._engine.aquery(f"{OPTIONS_ANALYSIS_PROMPT}\n{question}"), timeout=120)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\marin\AppData\Local\Programs\Python\Python311\Lib\asyncio\tasks.py", line 479, in wait_for
    return fut.result()
           ^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\base\base_query_engine.py", line 46, in aquery
    return await self._aquery(str_or_query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 201, in _aquery
    nodes = await self.aretrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\metagpt\rag\engines\simple.py", line 175, in aretrieve
    nodes = await super().aretrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 147, in aretrieve
    return self._apply_node_postprocessors(nodes, query_bundle=query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\query_engine\retriever_query_engine.py", line 136, in _apply_node_postprocessors
    nodes = node_postprocessor.postprocess_nodes(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\postprocessor\types.py", line 55, in postprocess_nodes
    return self._postprocess_nodes(nodes, query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\postprocessor\llm_rerank.py", line 99, in _postprocess_nodes
    raw_choices, relevances = self._parse_choice_select_answer_fn(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\indices\utils.py", line 104, in default_parse_choice_select_answer_fn
    answer_num = int(line_tokens[0].split(":")[1].strip())
                     ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

I remember that I had fixed it somehow in utils.py in the llama_index core package:

import re
from typing import List, Tuple

def default_parse_choice_select_answer_fn(
    answer: str, num_choices: int, raise_error: bool = False
) -> Tuple[List[int], List[float]]:
    """Default parse choice select answer function."""
    answer_lines = answer.split("\n")
    answer_nums = []
    answer_relevances = []
    for answer_line in answer_lines:
        line_tokens = answer_line.split(",")
        if len(line_tokens) != 2:
            if not raise_error:
                continue
            else:
                raise ValueError(
                    f"Invalid answer line: {answer_line}. "
                    "Answer line must be of the form: "
                    "answer_num: <int>, answer_relevance: <float>"
                )
        answer_num = int(line_tokens[0].split(":")[1].strip())
        if answer_num > num_choices:
            continue
        answer_nums.append(answer_num)
        # extract just the first digits after the colon.
        _answer_relevance = re.findall(r"\d+", line_tokens[1].split(":")[1].strip())[0]
        answer_relevances.append(float(_answer_relevance))
    return answer_nums, answer_relevances

But after updating I can't remember what the issue was or how I had fixed it. I am prompting the engine with quite a long prompt, so that may be what's causing the malformed response.
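For what it's worth, the two failure modes are easy to reproduce in isolation with the stock function: a line without a comma is silently skipped (which yields an empty response), while a comma-containing prose line passes the two-token check and then crashes on the colon split:

```python
from llama_index.core.indices.utils import default_parse_choice_select_answer_fn

# Well-formed rerank output parses fine:
ok = "Doc: 1, Relevance: 8\nDoc: 3, Relevance: 6"
print(default_parse_choice_select_answer_fn(ok, num_choices=5))
# ([1, 3], [8.0, 6.0])

# A markdown bullet has no comma, so it is skipped -> empty result:
print(default_parse_choice_select_answer_fn("- **OI**: 22", num_choices=5))
# ([], [])

# Prose with a comma passes the two-token check, then
# line_tokens[0].split(":")[1] raises IndexError:
default_parse_choice_select_answer_fn(
    "Based on the text, here are some things", num_choices=5
)
```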

cranyy avatar Apr 24 '24 02:04 cranyy

This is because the answer from the LLM is incorrectly formatted. What model are you using? gpt-4-turbo is preferred.

seehi avatar Apr 24 '24 02:04 seehi

I am using gpt-4-turbo-preview, which I have always used, and it used to work properly. I remember fixing this once, and that this specific code caused the issue, because the answer could sometimes be a weird, problematic response. After some testing I remember what it is: when the answer from the query contains a list with colons in it, it screws up the process. So it doesn't matter which LLM you use; if it returns `Value1: key1, key2 key` and then `Value2: key, key key`, it breaks and leads to an empty response:

INFO | llama_index.core.indices.utils:default_parse_choice_select_answer_fn:97 - Processing answer line: - **OI**: 22
2024-04-24 06:54:39.293 | WARNING | llama_index.core.indices.utils:default_parse_choice_select_answer_fn:100 - Invalid answer line format: - **OI**: 22
2024-04-24 06:54:39.294 | INFO | llama_index.core.indices.utils:default_parse_choice_select_answer_fn:97 - Processing answer line: - **Volume**: 140
2024-04-24 06:54:39.295 | WARNING | llama_index.core.indices.utils:default_parse_choice_select_answer_fn:100 - Invalid answer line format: - **Volume**: 140
2024-04-24 06:54:39.295 | INFO | llama_index.core.indices.utils:default_parse_choice_select_answer_fn:97 - Processing answer line: - **Mark**: 5.365
2024-04-24 06:54:39.296 | WARNING | llama_index.core.indices.utils:default_parse_choice_select_answer_fn:100 - Invalid answer line format: - **Mark**: 5.365

I can't be bothered to fix it in the code right now, so I just hardcoded my prompt to never use ":".

cranyy avatar Apr 24 '24 04:04 cranyy

Actually, my hardcoding the prompt to never use ":" doesn't work at all, it turns out, now that I actually test it. The issue is when it creates multiple lines from a single answer, i.e. creates multiple answers for a single query; that always breaks the process. And I can't for the life of me remember how to fix it now. I keep getting either empty responses or:

  File "E:\Project\MetaStocky\env\Lib\site-packages\llama_index\core\indices\utils.py", line 104, in default_parse_choice_select_answer_fn
    answer_num = int(line_tokens[0].split(":")[1].strip())
                     ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

regardless of which LLM I use.

cranyy avatar Apr 24 '24 12:04 cranyy

When using an LLM to rerank, it is not always guaranteed that the output will be parseable for reranking. You can print the value of answer_line to compare:

Correct format: ['Doc: 1', ' Relevance: 8']

Wrong format: ['Based on the text'], [' here are some things'], ['3. First of all: The ability'], [' as they need to be']
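If you don't want one malformed line to take down the whole query, one option is to swap in a lenient parser that accepts only lines shaped like `Doc: <int>, Relevance: <float>` and silently drops everything else. This is a sketch, assuming `LLMRerank` still exposes the `parse_choice_select_answer_fn` hook; the function name here is our own:

```python
import re
from typing import List, Tuple

from llama_index.core.postprocessor import LLMRerank

# Accept only lines shaped like "Doc: <int>, Relevance: <float>";
# drop prose, markdown bullets, and anything else without raising.
_CHOICE_LINE = re.compile(r"Doc:\s*(\d+)\s*,\s*Relevance:\s*(\d+(?:\.\d+)?)")


def lenient_parse_choice_select_answer_fn(
    answer: str, num_choices: int, raise_error: bool = False
) -> Tuple[List[int], List[float]]:
    nums: List[int] = []
    relevances: List[float] = []
    for line in answer.split("\n"):
        match = _CHOICE_LINE.search(line)
        if match is None:
            continue  # unparseable line: skip instead of crashing
        num = int(match.group(1))
        if 1 <= num <= num_choices:
            nums.append(num)
            relevances.append(float(match.group(2)))
    return nums, relevances


# Uses the default LLM from Settings; construction may need an API key.
reranker = LLMRerank(
    top_n=3,
    parse_choice_select_answer_fn=lenient_parse_choice_select_answer_fn,
)
```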

seehi avatar May 06 '24 07:05 seehi

Since there have been no updates or replies from the user for a long time, we will close this issue. Please reopen it if necessary.

better629 avatar Oct 10 '24 02:10 better629