donut
custom json schema - ASAP
Is it possible to train the model to generate structured output with a custom JSON schema? Please help me ASAP.
@felixvor
It would be very interesting to see how a complicated JSON structure impacts model performance, but to keep it short: yes, it is possible. You can fine-tune the model to generate pretty much any text you want!
In pre-training, the model only learns to generate OCR text strings from images (no JSON at all). The example notebooks then use the pre-trained weights to fine-tune the model on various schemas (including JSON) for classification, entity extraction and question answering. I would recommend following the convention of converting your JSON into an HTML-like structure before calling the model and converting the output back to JSON afterwards; all of that is covered in the examples as well.
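To make the JSON to HTML-like conversion concrete, here is a minimal, self-contained sketch. The function names `json_to_tokens` and `tokens_to_json`, the `<s_key>...</s_key>` tag format and the `<sep/>` list separator are assumptions chosen for illustration; this is not the repository's own code, so check the example notebooks for the exact helpers and special tokens they use.

```python
import re
from typing import Any, List

# Hypothetical tag format for this sketch: <s_key>value</s_key>, list items joined by <sep/>.
_TAG = re.compile(r"<(/?)s_\w+>|<sep/>")
_FIELD = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)


def json_to_tokens(obj: Any) -> str:
    """Flatten a JSON-like object into an HTML-like target string."""
    if isinstance(obj, dict):
        return "".join(f"<s_{k}>{json_to_tokens(v)}</s_{k}>" for k, v in obj.items())
    if isinstance(obj, list):
        return "<sep/>".join(json_to_tokens(item) for item in obj)
    return str(obj)  # leaves are serialized as plain text


def _split_top_level(text: str) -> List[str]:
    """Split on <sep/> only where it is not nested inside a tag."""
    parts, depth, start = [], 0, 0
    for m in _TAG.finditer(text):
        if m.group(0) == "<sep/>":
            if depth == 0:
                parts.append(text[start:m.start()])
                start = m.end()
        elif m.group(1):  # closing tag
            depth -= 1
        else:  # opening tag
            depth += 1
    parts.append(text[start:])
    return parts


def tokens_to_json(text: str) -> Any:
    """Parse the HTML-like string back into dicts, lists and strings.

    Limitations of this sketch: all leaves come back as strings,
    single-element lists collapse to their element, and a key nested
    inside a field of the same name is not supported.
    """
    parts = _split_top_level(text)
    if len(parts) > 1:
        return [tokens_to_json(p) for p in parts]
    fields, pos = {}, 0
    while (m := _FIELD.match(text, pos)):
        fields[m.group(1)] = tokens_to_json(m.group(2))
        pos = m.end()
    return fields if fields else text


if __name__ == "__main__":
    receipt = {"menu": [{"name": "Latte", "price": "4.50"},
                        {"name": "Tea", "price": "3.00"}],
               "total": "7.50"}
    target = json_to_tokens(receipt)   # string the model would be trained to generate
    print(target)
    print(tokens_to_json(target))      # parsed back after generation
```

With a setup like this, the fine-tuning targets would be the output of `json_to_tokens`, and the model's generated string would be passed through `tokens_to_json` after inference. In practice you would likely also register the tags as special tokens in the tokenizer, which this sketch does not cover.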
Good luck with your experiments, keep us posted about your results!