dify
local variable 'docs' referenced before assignment
Self Checks
- [X] This is only for bug reports. If you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.6.1
Cloud or Self Hosted
Self Hosted (Source)
Steps to reproduce
Upload a CSV file while creating a knowledge base.
✔️ Expected Behavior
Document status is Available
❌ Actual Behavior
Document status is Error
[2024-04-10 08:36:58,393: ERROR/MainProcess] consume document failed
Traceback (most recent call last):
  File "/app/api/core/indexing_runner.py", line 64, in run
    text_docs = self._extract(index_processor, dataset_document, processing_rule.to_dict())
  File "/app/api/core/indexing_runner.py", line 362, in _extract
    text_docs = index_processor.extract(extract_setting, process_rule_mode=process_rule['mode'])
  File "/app/api/core/rag/index_processor/processor/paragraph_index_processor.py", line 21, in extract
    text_docs = ExtractProcessor.extract(extract_setting=extract_setting,
  File "/app/api/core/rag/extractor/extract_processor.py", line 129, in extract
    return extractor.extract()
  File "/app/api/core/rag/extractor/csv_extractor.py", line 53, in extract
    return docs
UnboundLocalError: local variable 'docs' referenced before assignment
Hey @liuzhenghua! Long time, no see. Hope you're doing well and coding up a storm!
The error you're encountering is due to the `docs` variable being referenced before it's assigned any value in the `extract` method of the CSV extractor. To fix this, you should initialize `docs` at the start of the method. Here's a corrected version of the `extract` method:
def extract(self) -> list[Document]:
    docs = []  # Initialize docs here
    try:
        with open(self._file_path, newline="", encoding=self._encoding) as csvfile:
            docs = self._read_from_file(csvfile)
    except UnicodeDecodeError as e:
        if self._autodetect_encoding:
            detected_encodings = detect_file_encodings(self._file_path)
            for encoding in detected_encodings:
                try:
                    with open(self._file_path, newline="", encoding=encoding.encoding) as csvfile:
                        docs = self._read_from_file(csvfile)
                    break
                except UnicodeDecodeError:
                    continue
        else:
            raise RuntimeError(f"Error loading {self._file_path}") from e
    return docs
By initializing `docs` before any operations, you ensure it's always defined, which should resolve the `UnboundLocalError` and allow the document status to become "Available" after uploading a CSV file. This modification should be made in the CSV extractor file located at `api/core/rag/extractor/csv_extractor.py`.
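To make the failure mode concrete, here is a minimal, self-contained sketch. It is not Dify's code: `read_rows_buggy` and `read_rows_strict` are hypothetical names, and plain `csv.reader` stands in for the extractor's `_read_from_file`. The first function has the same shape as the failing method, so the result variable stays unbound when every decoding attempt fails. The second is one possible alternative that raises an explicit error instead of returning nothing.

```python
import csv


def read_rows_buggy(path: str, encoding: str, fallbacks: list[str]) -> list[list[str]]:
    """Same shape as the failing method: `rows` is bound only if a read succeeds."""
    try:
        with open(path, newline="", encoding=encoding) as f:
            rows = list(csv.reader(f))
    except UnicodeDecodeError:
        for candidate in fallbacks:
            try:
                with open(path, newline="", encoding=candidate) as f:
                    rows = list(csv.reader(f))
                break
            except UnicodeDecodeError:
                continue
    # If the first read and every fallback failed, `rows` was never assigned,
    # so this line raises UnboundLocalError.
    return rows


def read_rows_strict(path: str, encoding: str, fallbacks: list[str]) -> list[list[str]]:
    """Try each encoding in order; fail loudly if none of them can decode the file."""
    for candidate in [encoding, *fallbacks]:
        try:
            with open(path, newline="", encoding=candidate) as f:
                return list(csv.reader(f))
        except UnicodeDecodeError:
            continue
    raise RuntimeError(f"Could not decode {path} with any attempted encoding")
```

Note the trade-off: initializing the result to an empty list removes the `UnboundLocalError`, but a file that no encoding can decode then yields an empty result without any error. Raising, as in `read_rows_strict`, keeps that case visible.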
If you need further assistance or have more questions, feel free to ask.
Thanks for your feedback. The logic on our side was indeed flawed, and we have fixed it. For your CSV file, though, please note that the file should be encoded as UTF-8. @liuzhenghua
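For anyone hitting the same problem, a small sketch of how a CSV file could be checked and converted to UTF-8 before uploading. The helper names `is_utf8` and `reencode_to_utf8` are hypothetical, and the source encoding must be supplied by you; the thread does not say which encoding the original file used.

```python
from pathlib import Path


def is_utf8(path: str) -> bool:
    """Return True if the entire file decodes as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Write a UTF-8 copy of `src` to `dst`.

    `source_encoding` is an assumption made by the caller (for example "gbk");
    a wrong guess either raises UnicodeDecodeError or produces garbled text.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, `reencode_to_utf8("data.csv", "data_utf8.csv", "gbk")` would convert a GBK-encoded file, after which `is_utf8("data_utf8.csv")` should return `True`. Working on raw bytes leaves the file's line endings unchanged.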