drunkpig
@yzztin We have received your sample, thank you for sharing.
@xiabo0816 What is a `横版` PDF, and can you provide a sample file?
@hzzheng0612 I think Markdown is just plain text. Could you describe your requirements in more detail?
@tabatabaei We have received your request, and it will take some time.
@weihanfeng Take a look at this project: https://github.com/magicyuan876/mineru-server
@zuanzuanshao We don't recommend using WSL for large-scale extraction. WSL freezing up is the norm, no matter what it is doing.
@HakaishinShwet You are right, this project was developed for the production of high-quality corpora. Whether it's for the pre-training corpora of large models or for RAG applications, the MinerU project...
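@luohao123 Hello, the copyright issues involved in open-sourcing documents are too complex, and releasing them is risky. If you have resources of this kind, you can reach out to us to collaborate and open-source them jointly.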
@luohao123 The collaboration is based on each party's respective strengths and does not have any commercial attributes. So you can simply consider it as free.
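Could the group owner add a link to my project? Large models are developing quickly right now and badly need this kind of data to improve model performance.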