Langchain-Chatchat
[BUG] Error while loading document: cannot render metafile
Problem Description: Describe the problem in a clear and concise manner.
Steps to Reproduce: startup -a
- Run '...'
- Click '...'
- Scroll to '...'
- Problem occurs
Expected Result: Describe the expected result.
Actual Result: ERROR: OSError: Error loading document from file LIN/1 秋天02.pptx: cannot render metafile RapidOCRPPTLoader slide index: 4: 11%|█ | 3/27 [00:06<00:50, 2.09s/it] Describe the actual result.
Environment Information
- langchain-ChatGLM version/commit number: (e.g., v2.0.1 or commit 123456)
- Is Docker deployment used (yes/no): "yes" in the Chinese field, "no" in the English field
- Model used (ChatGLM2-6B / Qwen-7B, etc.): ChatGLM-6B in the Chinese field, ChatGLM3-6B in the English field
- Embedding model used (moka-ai/m3e-base, etc.): moka-ai/m3e-base
- Vector library used (faiss, milvus, pg_vector, etc.): faiss in the Chinese field, milvus in the English field
- Operating system and version:
- Python version: 3.10
- Other relevant environment information:
Additional Information: Add any other information related to the issue.
The "cannot render metafile" error raised while loading "LIN/1 秋天02.pptx" might be caused by non-ASCII characters in the file path. Here are specific steps to try:
- Use ASCII-only file paths: Move or rename the file so that its name and every directory in its path contain only ASCII characters. Some libraries mishandle non-ASCII paths.
- Use absolute paths: Passing the file's absolute path can sometimes avoid problems caused by special characters in relative paths.
- Check file path encoding: Make sure your script handles file paths with the correct encoding, which is typically UTF-8 in Python.
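The first two steps can be tried without renaming the original file. The following is a minimal sketch, not part of Langchain-Chatchat: `ascii_safe_copy` is a hypothetical helper that copies a document into a temporary directory under an ASCII-only name and returns the absolute path. It assumes the system temporary directory itself has an ASCII-only path, which may not hold on every machine.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def ascii_safe_copy(src):
    """Copy `src` to a temp directory under an ASCII-only file name.

    Hypothetical helper for illustration; returns the absolute path of the copy.
    """
    source = Path(src).resolve()
    if not source.is_file():
        raise FileNotFoundError(str(source))

    # mkdtemp creates a fresh directory; the prefix is an arbitrary ASCII label.
    tmp_dir = Path(tempfile.mkdtemp(prefix="kb_ascii_"))

    # A random hex name is ASCII by construction; the extension is kept so
    # loaders that dispatch on the suffix still recognise the file type.
    target = tmp_dir / (uuid.uuid4().hex + source.suffix)
    shutil.copyfile(source, target)
    return target
```

Loading the returned path instead of the original one shows quickly whether the path is involved at all: if the same error appears with the ASCII-only copy, the cause lies elsewhere.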
If these steps don't resolve the problem, the cause may be the underlying libraries (pptx, PIL, etc.) and how they handle non-ASCII paths in your environment. Their documentation or issue trackers may offer more targeted solutions.
Logging in Langchain-Chatchat is set up in configs/basic_config.py.example with a detailed format that includes timestamps, filenames, line numbers, log levels, and messages, so that errors can be diagnosed from the logs. The log level defaults to INFO and can be adjusted for more verbosity. Logs are stored in the directory specified by LOG_PATH. For environment-specific issues, ensure that all dependencies and environment settings outlined in the Dockerfile are correctly configured, including the NVIDIA CUDA environment, the libgl1 library, and the Python environment.
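For readers who want to reproduce this kind of logging outside the project, here is a rough sketch of a setup with the fields described above. It is not the content of configs/basic_config.py.example: the format string, the default `LOG_PATH` value, and the `build_logger` helper are assumptions made for illustration.

```python
import logging
import os

# Assumed format: timestamp, file name, line number, level, message.
LOG_FORMAT = "%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s"

# Assumed default location; the project defines its own LOG_PATH.
LOG_PATH = os.path.join(os.getcwd(), "logs")


def build_logger(name="loader_debug", level=logging.INFO, log_dir=LOG_PATH):
    """Return a logger that writes formatted records to <log_dir>/<name>.log."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach the file handler only once, even if this function is called again.
    if not logger.handlers:
        handler = logging.FileHandler(
            os.path.join(log_dir, name + ".log"), encoding="utf-8"
        )
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger
```

Passing `level=logging.DEBUG` gives the extra verbosity mentioned above, for example `build_logger(level=logging.DEBUG).debug("loading slide 4")`.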
For further assistance, please provide more details such as the exact command executed, UI interactions, the version or commit number of langchain-ChatGLM, and your operating system version. This information will help in pinpointing the issue more accurately.
Sources
- document_loaders/mypptloader.py
- knowledge_base/samples/content/test_files/langchain-ChatGLM_closed.jsonl
- configs/basic_config.py.example
- knowledge_base/samples/content/test_files/langchain-ChatGLM_open.jsonl
- Dockerfile
This is probably the OCR step failing to process the PPT correctly, so the bug is likely on the Paddle OCR side. Do Word and TXT files load normally?
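So far I have tried PDF, TXT, PPT, and image files, and they all load normally. The error only occurs when certain individual PPT files are added to the vector store.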
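Since only some decks fail, it may help to compare what the failing files embed against the working ones. The thread does not establish the root cause; one possibility, stated here purely as an assumption, is that the failing decks contain WMF or EMF pictures that the image library cannot rasterize. The sketch below uses only the standard library and relies on the fact that a .pptx file is a ZIP archive that conventionally stores embedded media under `ppt/media/`. `find_metafiles` is a hypothetical helper, not part of Langchain-Chatchat.

```python
import zipfile
from pathlib import PurePosixPath

# Windows Metafile and Enhanced Metafile extensions.
METAFILE_SUFFIXES = {".wmf", ".emf"}


def find_metafiles(pptx_path):
    """List WMF/EMF members stored under ppt/media/ in a .pptx archive.

    `pptx_path` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(pptx_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.startswith("ppt/media/")
            and PurePosixPath(name).suffix.lower() in METAFILE_SUFFIXES
        )
```

If the failing decks list metafile entries and the working decks do not, that would support the assumption; if neither does, it can be ruled out.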