local variable 'docs' referenced before assignment

Open liuzhenghua opened this issue 10 months ago • 3 comments

Self Checks

  • [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.1

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

Upload a CSV file when creating a knowledge base.

✔️ Expected Behavior

Document status is Available

❌ Actual Behavior

Document status is Error

liuzhenghua · Apr 10 '24 09:04

[2024-04-10 08:36:58,393: ERROR/MainProcess] consume document failed
Traceback (most recent call last):
  File "/app/api/core/indexing_runner.py", line 64, in run
    text_docs = self._extract(index_processor, dataset_document, processing_rule.to_dict())
  File "/app/api/core/indexing_runner.py", line 362, in _extract
    text_docs = index_processor.extract(extract_setting, process_rule_mode=process_rule['mode'])
  File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 21, in extract
    text_docs = ExtractProcessor.extract(extract_setting=extract_setting,
  File "/app/api/core/rag/extractor/extract_processor.py", line 129, in extract
    return extractor.extract()
  File "/app/api/core/rag/extractor/csv_extractor.py", line 53, in extract
    return docs
UnboundLocalError: local variable 'docs' referenced before assignment

liuzhenghua · Apr 10 '24 09:04

Hey @liuzhenghua! Long time, no see. Hope you're doing well and coding up a storm!

The error you're encountering is due to the docs variable being referenced before it's assigned any value in the extract method of the CSV extractor. To fix this, you should initialize docs at the start of the method. Here's a corrected version of the extract method:

def extract(self) -> list[Document]:
    docs = []  # Initialize docs here
    try:
        with open(self._file_path, newline="", encoding=self._encoding) as csvfile:
            docs = self._read_from_file(csvfile)
    except UnicodeDecodeError as e:
        if self._autodetect_encoding:
            detected_encodings = detect_file_encodings(self._file_path)
            for encoding in detected_encodings:
                try:
                    with open(self._file_path, newline="", encoding=encoding.encoding) as csvfile:
                        docs = self._read_from_file(csvfile)
                    break
                except UnicodeDecodeError:
                    continue
        else:
            raise RuntimeError(f"Error loading {self._file_path}") from e

    return docs

By initializing docs before any operations, you ensure it's always defined, which should resolve the UnboundLocalError and allow the document status to become "Available" after uploading a CSV file. This modification should be made in the CSV extractor file located at api/core/rag/extractor/csv_extractor.py.
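Note that with `docs = []` as the default, a file that cannot be decoded with any detected encoding returns an empty list without any error. The following is a minimal standalone sketch of that failure mode and one stricter alternative. It is not Dify's code: `read_rows_buggy` and `read_rows_fixed` are hypothetical names, and the list of encodings stands in for whatever `detect_file_encodings` would return.

```python
def read_rows_buggy(path: str, encodings: list[str]) -> list[str]:
    """Same shape as the bug: `rows` is bound only if some encoding works."""
    for encoding in encodings:
        try:
            with open(path, newline="", encoding=encoding) as f:
                rows = f.read().splitlines()
            break
        except UnicodeDecodeError:
            continue
    # Raises UnboundLocalError when every attempt above failed.
    return rows


def read_rows_fixed(path: str, encodings: list[str]) -> list[str]:
    """Bind `rows` up front and fail loudly if nothing could decode the file."""
    rows = None
    for encoding in encodings:
        try:
            with open(path, newline="", encoding=encoding) as f:
                rows = f.read().splitlines()
            break
        except UnicodeDecodeError:
            continue
    if rows is None:
        raise RuntimeError(f"Could not decode {path} with any of {encodings}")
    return rows
```

Whether to return an empty list or raise is a design choice; raising keeps the document in an explicit error state with a readable message instead of silently indexing nothing.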

If you need further assistance or have more questions, feel free to ask.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] · Apr 10 '24 09:04

Thanks for your feedback. The logic on our side was indeed flawed, and we have fixed it. For your CSV file, though, note that the file needs to be encoded as UTF-8. @liuzhenghua

JohnJyong · Apr 10 '24 18:04
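For the UTF-8 requirement mentioned above, one way to prepare a file before uploading is to re-encode it locally. This is only a sketch using the Python standard library: `convert_to_utf8` is a hypothetical helper, and the source encoding must be known or guessed by the user, since the thread does not say which encoding the original file used.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Re-encode a text file to UTF-8.

    Works on raw bytes so that line endings are preserved exactly.
    Raises UnicodeDecodeError if `source_encoding` does not match the file.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("data.csv", "data_utf8.csv", "gbk")` would convert a GBK-encoded file; both file names and the `"gbk"` value are illustrative assumptions.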