[Question]: How to fine-tune deepdoc on additional data

Open q0oz opened this issue 1 year ago • 2 comments

Describe your problem

Hi there!

I've been trying to use deepdoc (mainly layout) functionality for predicting the structure of scientific PDFs. The quality of the recognition was not satisfactory, so I thought that additional training on our data might help. Is it possible to do that?

Additional questions: is it possible to see the training code for deepdoc and to know what data you used for training?

Thank you!

q0oz avatar Apr 26 '24 20:04 q0oz

We used public data like CDLA and PubTables to train our model. We will open-source our training code in the future.

KevinHuSh avatar Apr 28 '24 03:04 KevinHuSh

It is very good to hear that your team will share the training code for deepdoc. Thank you.

nhha1602 avatar May 11 '24 05:05 nhha1602