
How to fine-tune the layout model

Open · dengtianmin opened this issue 1 year ago · 2 comments

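Some PDFs are not split into blocks quite correctly. How can I fine-tune the model with private data? [screenshot] The correct segmentation should look like this: [screenshot]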

dengtianmin · Aug 13 '24 09:08

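The layout detection looks fine; this is a problem with the post-processing algorithm. If you are able to do secondary development, you can modify the algorithm directly. The core layout algorithm is here: https://github.com/opendatalab/MinerU/blob/4983bc1df668b80fa3481fa657eb509b448bb082/magic_pdf/pdf_parse_union_core.py#L152 If you manage to handle this kind of multi-column layout, a PR is welcome.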
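To illustrate what "post-processing" means here, that is, ordering and grouping already-detected blocks on a multi-column page, below is a minimal recursive XY-cut sketch. It is a generic, swapped-in technique, not the algorithm at the linked line in MinerU. `Block`, `xy_cut`, and `min_gap` are hypothetical names, and bounding boxes are assumed to be `(x0, y0, x1, y1)` page coordinates with y increasing downward.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Block:
    # Hypothetical layout block: a label plus its bounding box (y grows downward).
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def _split(blocks: List[Block], lo: Callable[[Block], float],
           hi: Callable[[Block], float], min_gap: float) -> List[List[Block]]:
    """Group blocks whose [lo, hi] intervals on one axis overlap or sit closer than min_gap."""
    ordered = sorted(blocks, key=lo)
    groups = [[ordered[0]]]
    reach = hi(ordered[0])  # furthest extent covered by the current group
    for b in ordered[1:]:
        if lo(b) - reach >= min_gap:
            groups.append([b])  # a clear gap on this axis: start a new group
        else:
            groups[-1].append(b)
        reach = max(reach, hi(b))
    return groups


def xy_cut(blocks: List[Block], min_gap: float = 5.0) -> List[Block]:
    """Return blocks in reading order: columns left to right, rows top to bottom."""
    if len(blocks) <= 1:
        return list(blocks)
    # Prefer a vertical cut, so separate columns are read one after another.
    cols = _split(blocks, lambda b: b.x0, lambda b: b.x1, min_gap)
    if len(cols) > 1:
        return [b for col in cols for b in xy_cut(col, min_gap)]
    # Otherwise cut horizontally (e.g. a full-width title above two columns).
    rows = _split(blocks, lambda b: b.y0, lambda b: b.y1, min_gap)
    if len(rows) > 1:
        return [b for row in rows for b in xy_cut(row, min_gap)]
    # No clean cut on either axis: fall back to top-to-bottom, left-to-right.
    return sorted(blocks, key=lambda b: (b.y0, b.x0))


if __name__ == "__main__":
    # Full-width title and footer around a two-column body, given out of order.
    page = [
        Block("R1", 310, 100, 550, 250),
        Block("footer", 50, 520, 550, 560),
        Block("L2", 50, 310, 290, 500),
        Block("title", 50, 40, 550, 80),
        Block("R2", 310, 260, 550, 500),
        Block("L1", 50, 100, 290, 300),
    ]
    print([b.name for b in xy_cut(page)])
```

Like any XY-cut variant, this can still misorder a page when gaps in both columns happen to line up at the same height, which is the kind of multi-column case the reply suggests handling in the real post-processing code.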

myhloli · Aug 13 '24 09:08

@dengtianmin Due to copyright restrictions, the training data cannot be made public. If you need it, you can contact the Opendatalab assistant in the WeChat group to discuss a collaboration.

drunkpig · Aug 15 '24 02:08