
Related to semantic entity relation

Open Dineshkumar-Anandan-ZS0367 opened this issue 10 months ago • 5 comments

The semantic entity relation model works fine overall, but some key-value pairs in my documents are predicted only as answers. How can I fix this? How can questions and answers be identified correctly in healthcare documents?

  1. Are there any options for the SER tokenizer?
  2. Are there any options for fine-tuning that code?
  3. Is any preprocessing needed before prediction?

Can you give a more detailed example? Going only by what you describe in your question, it sounds like some keys in your data point to multiple values, right? If so, check that your ground truth (GT) annotations link each key to its values correctly, estimate what proportion of the whole dataset this scenario makes up, and try to increase that proportion as much as possible.

UserWangZz • Apr 25 '24 02:04
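
The GT check suggested above can be sketched roughly as follows. This is a minimal sketch, not PaddleOCR code: it assumes XFUND-style annotations in which each entity is a dict with `id`, `label` (`question`, `answer`, ...) and `linking` (a list of `[question_id, answer_id]` pairs). The function name `audit_links` and the sample entities other than the patient-name pair are hypothetical.

```python
from collections import defaultdict


def audit_links(entities):
    """Summarise question/answer links for one annotated page.

    Assumes each entity is a dict with "id", "label" and "linking",
    where "linking" holds [question_id, answer_id] pairs (XFUND-style).
    """
    by_id = {e["id"]: e for e in entities}
    answers_per_question = defaultdict(set)
    linked_answers = set()

    for entity in entities:
        for src, dst in entity.get("linking", []):
            q, a = by_id.get(src), by_id.get(dst)
            if q is None or a is None:
                continue  # link points at an id that is not on this page
            if q["label"] == "question" and a["label"] == "answer":
                answers_per_question[src].add(dst)
                linked_answers.add(dst)

    questions = [e["id"] for e in entities if e["label"] == "question"]
    answers = [e["id"] for e in entities if e["label"] == "answer"]
    multi = [q for q in questions if len(answers_per_question[q]) > 1]

    return {
        "questions": len(questions),
        "multi_value_questions": len(multi),
        "multi_value_ratio": len(multi) / len(questions) if questions else 0.0,
        # answers that no question links to: candidates for a missing key
        "unlinked_answers": [a for a in answers if a not in linked_answers],
    }


# Made-up sample page for illustration only.
sample = [
    {"id": 1, "label": "question", "transcription": "Patient Name", "linking": [[1, 2]]},
    {"id": 2, "label": "answer", "transcription": "Pamela Wood", "linking": [[1, 2]]},
    {"id": 3, "label": "question", "transcription": "Medications", "linking": [[3, 4], [3, 5]]},
    {"id": 4, "label": "answer", "transcription": "Drug A", "linking": [[3, 4]]},
    {"id": 5, "label": "answer", "transcription": "Drug B", "linking": [[3, 5]]},
    {"id": 6, "label": "answer", "transcription": "Unlabelled value", "linking": []},
]
report = audit_links(sample)
print(report)
```

Running this over every page and summing the counts gives the dataset-wide proportion of multi-value keys. A long `unlinked_answers` list may point to keys that were never annotated as questions.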


Please look at this document. For example, "Patient Name" is the key and "Pamela Wood" is the answer.

Did you use the official model for inference? Have you used the data from the current document for fine-tuning the model?

UserWangZz • Apr 26 '24 01:04

Yes, I am using the official PaddleOCR model for English.

Since it is the default model, I can't fine-tune it.

Can you please share some ideas or suggestions about this problem?

You can refer to the following documents to fine-tune the official model on your own data. They cover data preparation, starting training, and so on.
Chinese document: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_ch/kie.md
English document: https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_en/kie_en.md

UserWangZz • Apr 28 '24 01:04
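
For the data preparation step mentioned above, the following is a minimal sketch of how a key-value pair such as "Patient Name" / "Pamela Wood" might be written as one line of a training label file. It assumes the XFUND-style layout (image name, a tab, then a JSON list of entities with `transcription`, `label`, `points`, `id`, `linking`); verify the exact fields against the linked KIE document before training. The helper names, the image name and all coordinates are hypothetical.

```python
import json


def box_to_points(x1, y1, x2, y2):
    """Turn an axis-aligned box into four corner points, clockwise from top-left."""
    return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]


def make_label_line(image_name, pairs):
    """Build one label line from (key_text, key_box, value_text, value_box) tuples.

    Assumption: both entities of a pair carry the same [question_id, answer_id] link.
    """
    entities = []
    for key_text, key_box, value_text, value_box in pairs:
        q_id = len(entities) + 1
        a_id = q_id + 1
        link = [[q_id, a_id]]
        entities.append({
            "transcription": key_text,
            "label": "question",
            "points": box_to_points(*key_box),
            "id": q_id,
            "linking": link,
        })
        entities.append({
            "transcription": value_text,
            "label": "answer",
            "points": box_to_points(*value_box),
            "id": a_id,
            "linking": link,
        })
    return image_name + "\t" + json.dumps(entities, ensure_ascii=False)


# Hypothetical image name and pixel coordinates.
line = make_label_line(
    "form_001.jpg",
    [("Patient Name", (10, 10, 120, 30), "Pamela Wood", (130, 10, 260, 30))],
)
print(line)
```

Annotating the key explicitly as a `question` and linking it to its value is what lets fine-tuning learn the pairing, rather than leaving the value as a lone answer.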