drunkpig comments

Results: 91 comments of drunkpig

Multi-level headings

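@shibainu-gbq Headings take too many forms: paragraph spacing, font, color, weight, and background can all determine whether something is a heading. It is hard to find a universal method.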

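What does the --method ocr parameter do? In what scenarios should it be added? With this parameter a code snippet is recognized as a single line; without it the original formatting is recognized correctly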

`--method ocr` means Paddle is used to extract text from the PDF; `--method text` means PyMuPDF is used to extract text from the PDF. The difference lies in that the bounding boxes obtained by...

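What does the --method ocr parameter do? In what scenarios should it be added? With this parameter a code snippet is recognized as a single line; without it the original formatting is recognized correctly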

@freedom1993 We will document this phenomenon you reported as a bug and investigate the root cause.

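What does the --method ocr parameter do? In what scenarios should it be added? With this parameter a code snippet is recognized as a single line; without it the original formatting is recognized correctly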

@freedom1993 Can you provide this PDF?

Does magic_doc have an open-source feature for converting static HTML pages to Markdown?

It is here: https://github.com/opendatalab/magic-html

## make pdf index

PDF indexes look like this:

```json
{
  "track_id": "afeda417-5a33-4ec8-bd79-56222763f832",
  "path": "s3://mybook/pdf/book-name.pdf",
  "file_type": "pdf",
  "title": "My book Name"
}
```

## batch inference

```python
if __name__ ==...
```
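Going back to the index format in this comment, here is a minimal sketch of how one such record could be produced. The four field names come from the example; the helper name `make_index_record` and the use of a random UUID for `track_id` are assumptions for illustration, not something the comment specifies.

```python
import json
import uuid


def make_index_record(path, title):
    """Build one index entry with the four fields shown in the example."""
    return {
        "track_id": str(uuid.uuid4()),  # assumption: any unique string is acceptable
        "path": path,
        "file_type": "pdf",
        "title": title,
    }


record = make_index_record("s3://mybook/pdf/book-name.pdf", "My book Name")
# json.dumps never emits a trailing comma, so the output is valid JSON.
line = json.dumps(record, ensure_ascii=False)
print(line)
```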

Can doc files be converted to md? Or is only pdf-to-md supported?

@Alan-zhong Use the LibreOffice command line to convert the Office file to PDF, then process the PDF:

```shell
soffice --headless --convert-to pdf path/to/your/file.docx
```
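When many files need converting, the same command can be driven from a script. The sketch below is one possible wrapper, not part of magic_doc: the helper names `build_convert_command` and `convert_to_pdf` are hypothetical, and it assumes `soffice` is on `PATH`. It adds LibreOffice's `--outdir` option to control where the PDF is written.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, out_dir):
    """Return the soffice argument list that converts `src` to PDF in `out_dir`."""
    return [
        "soffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(src),
    ]


def convert_to_pdf(src, out_dir):
    """Run the conversion and return the expected PDF path (hypothetical helper)."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    # Passing a list (no shell) keeps file names with spaces intact.
    subprocess.run(build_convert_command(src, out_dir), check=True)
    return Path(out_dir) / (Path(src).stem + ".pdf")
```

Calling `convert_to_pdf("path/to/your/file.docx", "out")` would then yield `out/file.pdf` if the conversion succeeds.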

File not found

```"models-dir":"~/tools/PDF-Extract-Kit/models/"``` ==> ```"models-dir":"/abs/path/to/tools/PDF-Extract-Kit/models/",```
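The fix above replaces `~` with an absolute path, presumably because the value is read from a config file and never goes through shell expansion (that reason is an assumption; the comment only shows the replacement). A small sketch for working out the absolute path to paste into the config, with `resolve_models_dir` as a hypothetical helper name:

```python
import os


def resolve_models_dir(path):
    """Expand a leading `~` and return an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


# Prints the absolute form of the path from the original config value.
print(resolve_models_dir("~/tools/PDF-Extract-Kit/models/"))
```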

File names containing spaces cause file generation to fail

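@strongerfly This produced quite a few conflicts. Please pull the code from the dev branch, make your changes, and submit the PR to the dev branch. Thanks.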

[chore] update DockerFile to fix build bugs

@ProseGuys please commit code to dev branch.