drunkpig
@DeathGanker Try this project: https://github.com/opendatalab/MinerU
@famda Try this project: https://github.com/opendatalab/MinerU. It has a powerful layout detection model and a strong post-processing pipeline.
@lowy-git The current document parsing does not support vertical text layout.
Tables are currently output as LaTeX. We will support HTML output next week. Markdown tables will not be supported because they cannot handle merged cells.
@qwerasdfgioa The simplest way is to create a new environment: `conda create -n minerU python=3.10`
@dengtianmin Due to copyright issues with the training data, it cannot be made public. If needed, you can contact the Opendatalab assistant in the WeChat group for cooperation.
@1greatday You should disable the ultralytics auto-update. Setting the environment variable `NO_ALBUMENTATIONS_UPDATE=1` will solve this problem.
@1greatday
```python
import os

# set env
os.environ['MY_ENV_VAR'] = 'some_value'
```
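Applied to the variable mentioned above, a minimal sketch could look like this. It assumes the variable has to be set before the library that reads it gets imported; that timing is my assumption, so put it at the very top of your entry script to be safe.

```python
import os

# Set the variable first, before importing anything that may read it
# at import time (assumption: the update check runs on import).
os.environ["NO_ALBUMENTATIONS_UPDATE"] = "1"

# Child processes started after this point inherit the value too.
print(os.environ["NO_ALBUMENTATIONS_UPDATE"])
```

Alternatively, you can export the variable in the shell before starting Python, which avoids any import-order concerns.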
@lygiants Currently, native support for multithreading is not available, and additional development will be required.
@ynywna Actually, it is not supported. But you can set the environment variable `CUDA_VISIBLE_DEVICES={gpu_server.gpu_id}` for each process to use multiple GPUs.
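A rough sketch of that workaround: start one worker process per GPU and pin each one with `CUDA_VISIBLE_DEVICES`. The helper names (`shard`, `worker_env`, `launch`) and the `worker_cmd` argument are hypothetical and not part of MinerU; `worker_cmd` stands for whatever command you already use to parse a list of files.

```python
import os
import subprocess


def shard(items, n_shards):
    """Split items round-robin into n_shards lists."""
    return [items[i::n_shards] for i in range(n_shards)]


def worker_env(gpu_id, base_env=None):
    """Copy the environment and pin the process to a single GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch(pdf_paths, gpu_ids, worker_cmd):
    """Start one worker per GPU and return the exit codes.

    worker_cmd is a hypothetical command prefix (a list of strings);
    the file paths for each worker are appended to it as arguments.
    """
    procs = []
    for gpu_id, chunk in zip(gpu_ids, shard(pdf_paths, len(gpu_ids))):
        if not chunk:
            continue  # fewer files than GPUs, nothing to do for this one
        procs.append(
            subprocess.Popen(worker_cmd + chunk, env=worker_env(gpu_id))
        )
    return [p.wait() for p in procs]
```

For example, `launch(["a.pdf", "b.pdf", "c.pdf"], [0, 1], ["python", "parse_files.py"])` would give `a.pdf` and `c.pdf` to GPU 0 and `b.pdf` to GPU 1, where `parse_files.py` is a placeholder for your own script. Each worker only sees its own GPU, so no code change is needed inside the worker.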