MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
[07/21 22:09:19 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from D:\github_project\backend_code_generation\models\Layout/model_final.pth ...
[07/21 22:09:19 fvcore.common.checkpoint]: [Checkpointer] Loading from d:\github_project\backend_code_generation\models\Layout/model_final.pth ...
2024-07-21 22:09:20.946 | INFO | magic_pdf.model.pdf_extract_kit:__init__:125 - DocAnalysis init done!
2024-07-21 22:09:20.950 |...
### Description of the bug
After I changed "device-mode" to "cuda" in the JSON file and ran magic-pdf pdf-command --pdf "1.pdf" --inside_model true, the log still reports magic_pdf.model.pdf_extract_kit:__init__:100 - using device: cpu
### How to reproduce the bug
1 ###...
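One common cause (among others, such as editing a different copy of the config file than the one being read) of a PyTorch-based program logging "using device: cpu" despite a "cuda" setting is that the installed torch build cannot see a GPU, for example a CPU-only wheel or a missing driver. The sketch below is a hypothetical diagnostic, not MinerU's code: it reads "device-mode" from a magic-pdf.json assumed to live in the home directory, asks torch whether CUDA is available, and applies an assumed fallback rule (the hypothetical resolve_device) to show which device would result.

```python
import json
from pathlib import Path


def resolve_device(config: dict, cuda_available: bool) -> str:
    """Return the device chosen under an ASSUMED fallback rule (not MinerU's actual logic).

    Assumption: "cuda" is honored only when CUDA is available; otherwise fall back to "cpu".
    """
    requested = config.get("device-mode", "cpu")
    if requested == "cuda" and not cuda_available:
        return "cpu"
    return requested


def cuda_is_available() -> bool:
    # torch may be missing or built CPU-only; both cases mean no usable CUDA.
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    # Hypothetical location: point this at the magic-pdf.json you actually edited.
    config_path = Path.home() / "magic-pdf.json"
    config = json.loads(config_path.read_text(encoding="utf-8")) if config_path.exists() else {}
    has_cuda = cuda_is_available()
    print("requested device-mode:", config.get("device-mode", "<not set>"))
    print("torch sees CUDA:", has_cuda)
    print("effective device (assumed rule):", resolve_device(config, has_cuda))
```

If "torch sees CUDA" prints False, checking for a CUDA-enabled PyTorch build is a reasonable first step before revisiting the config file.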
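1. Offline deployment: the first run fails with urllib.error.URLError. The first run needs to download a small language-detection model online; for an offline deployment, download the model manually and place it in the specified directory. Reference: https://github.com/opendatalab/MinerU/issues/121
On the first run, some internal modules may need network access to download small model resources. According to your error log, fast_langdetect needs to download a model file for language detection. If your machine cannot connect to the internet, extract the contents of the attached archive into the "/tmp" directory: [fasttext-langdetect.zip](https://github.com/user-attachments/files/16082178/fasttext-langdetect.zip) Reference: https://github.com/LlmKira/fast-langdetect
Could the location of the fast-langdetect folder be made configurable, like the paths in magic-pdf.json?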
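The "extract into /tmp" step can also be done with Python's standard zipfile module, which helps on machines without unzip tools. This is a minimal sketch; the script name and argument order are hypothetical, and it prints the extracted member names because the folder layout inside fasttext-langdetect.zip is not stated here.

```python
from __future__ import annotations

import sys
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, dest_dir: str = "/tmp") -> list[str]:
    """Unpack every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # /tmp normally exists; other targets may not
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    # Hypothetical usage: python unpack_langdetect.py fasttext-langdetect.zip [DEST_DIR]
    if len(sys.argv) >= 2:
        target = sys.argv[2] if len(sys.argv) >= 3 else "/tmp"
        for name in extract_archive(sys.argv[1], target):
            print(name)
    else:
        print("usage: unpack_langdetect.py ARCHIVE [DEST_DIR]")
```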
### Description of the bug
distutils.errors.CompileError: command '/usr/bin/clang' failed with exit code 1 [end of output] note: This error originates from a subprocess, and is likely not a...
### Description of the bug
The word "Markdown" is misspelled.
![image](https://github.com/user-attachments/assets/f5cba56f-e652-49f5-b124-753b962164da)
### How to reproduce the bug
-
### Operating system
Linux
### Python version...
### Description of the bug
![image](https://github.com/user-attachments/assets/363f9ffe-bff4-4d3d-a5c2-9de96e8b2dcb) ![image](https://github.com/user-attachments/assets/03851ead-cca4-495e-9fb1-fd57e774fc64)
As shown in the images above, when the last word of a line in a PDF is split across two lines, a '-' hyphen is added at the end of the line. After conversion to Markdown, the '-' hyphen remains, and the word is split into two parts by '-' followed by a space.
### How to reproduce the bug
You can use this file: https://arxiv.org/pdf/2407.01906...
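As a stopgap until the converter handles this, a post-processing pass over the generated Markdown can rejoin words split as "word- part". The regex below is a hypothetical heuristic, not MinerU's fix: it removes a hyphen that follows a letter and is followed by whitespace and a lowercase letter, but skips suspended hyphens before "and", "or", and "to" (as in "pre- and post-processing").

```python
import re

# Letter, hyphen, whitespace (a line break or the space left by conversion), then a
# lowercase letter; the negative lookahead keeps suspended hyphens like "pre- and".
_SPLIT_WORD = re.compile(r"(?<=[A-Za-z])-\s+(?!(?:and|or|to)\b)(?=[a-z])")


def join_hyphenated(text: str) -> str:
    """Rejoin words that were hyphenated across a line break (heuristic)."""
    return _SPLIT_WORD.sub("", text)


if __name__ == "__main__":
    print(join_hyphenated("The conver- sion keeps pre- and post-processing intact."))
```

The trade-off: a genuine compound hyphen that happens to fall at a line break (for example "well- known") is also merged, so higher precision would need a dictionary check on the joined word.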
**Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. For example: I'm always frustrated when [...] **Describe...
### Description of the bug
Error: Required dependency not installed, please install by "pip install magic-pdf[full-cpu] detectron2 --extra-index-url https://myhloli.github.io/wheels/"
### How to reproduce the bug
Installing on both macOS and Linux by following the documentation produces this error.
###...
### Description of the bug
- Required dependency not installed, please install by "pip install magic-pdf[full-cpu] detectron2 --extra-index-url https://myhloli.github.io/wheels/"
In fact, I have already run this installation.
### How to reproduce the bug...
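When this message persists after installing the listed packages, one general cause worth ruling out is pip installing into a different Python environment than the one that runs magic-pdf. The sketch below prints the active interpreter and tries to import each module by name, reporting the error text on failure. Which imports MinerU itself checks is an assumption here, so treat the module list ("magic_pdf", "detectron2") as hypothetical.

```python
import importlib
import sys


def check_imports(names):
    """Map each module name to None if it imports cleanly, else a short error string."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or errors raised while loading compiled extensions
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    # Run this with the same interpreter that runs magic-pdf.
    print("interpreter:", sys.executable)
    for name, err in check_imports(["magic_pdf", "detectron2"]).items():
        print(f"{name}: {'ok' if err is None else err}")
```

Reporting the exception text, rather than only checking whether a module can be found, also surfaces packages that are installed but fail to load (for example, a compiled extension built for a different Python version).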