PICK-pytorch

sroie results

Open juvebogdan opened this issue 4 years ago • 10 comments

Hello,

I trained your model on SROIE. During training I got the following:

+---------+----------+----------+----------+----------+
| name    |      mEP |      mER |      mEF |      mEA |
+=========+==========+==========+==========+==========+
| company | 0.887363 | 0.904762 | 0.895978 | 0.904762 |
+---------+----------+----------+----------+----------+
| address | 0.947084 | 0.950163 | 0.948621 | 0.950163 |
+---------+----------+----------+----------+----------+
| total   | 0.804009 | 0.897266 | 0.848081 | 0.897266 |
+---------+----------+----------+----------+----------+
| date    | 0.981878 | 0.996656 | 0.989212 | 0.996656 |
+---------+----------+----------+----------+----------+
| overall | 0.900719 | 0.937126 | 0.918562 | 0.937126 |
+---------+----------+----------+----------+----------+

But when I run it on the test set I get pretty bad results. For example, the total is missing a lot. The output looks like this, for example:

company KAISON FURNISHING SDN BHD,company
address L4-17 (B), LEVEL 4,address
address UP2-01, MELAWATI MALL,address
address 355, JALAN BANDAR MELAWATI,address
address PUSAT BANDAR MELAWATI,address
address 53100 KUALA LUMPUR.,address
date 29-01-18
address 2,305.80 SR,other
address 3
total ,33
address 6.00 SR,othe
address 2,197.00 SR,other
address 7,838.80,other
address -7,840.00,other
address 7,395.09,other
address 7,838.80,other

This one is even an example from the training set.

juvebogdan avatar Jan 19 '21 17:01 juvebogdan

There are two problems here. First, you must remove the categories (the last column) from the TSV input, or PICK will get confused. Second, the SROIE dataset has many transcription errors, and both training and prediction end up very messy because of them.

compadrejavo avatar Jan 19 '21 17:01 compadrejavo

I had the same problem of not removing the last column. I was getting desperate until I finally realized it. I'm glad I wasn't the only one ... ahahahha

jorgerodriguezsj avatar Jan 19 '21 17:01 jorgerodriguezsj

Oh. Thank you. I tried removing it, but if I remove it from the TSV files I get errors right at the start of training. Do I remove it from the actual TSV files during preprocessing, or somewhere else?

juvebogdan avatar Jan 19 '21 21:01 juvebogdan

@juvebogdan Be careful: you only have to remove them from the files you use for inference, that is, only the ones you pass to test.py. The training files must keep the column.
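
A rough sketch of one way to do this: write tag-free copies of the annotation files into a separate folder for inference, so the training files keep their last column. It assumes each line has the form `index,x1,y1,...,x4,y4,transcript,tag` and that the tag itself contains no comma. The names `strip_tag_column` and `make_inference_copies`, and the `*.tsv` pattern, are made up for illustration and are not part of PICK.

```python
from pathlib import Path


def strip_tag_column(line: str) -> str:
    """Drop the last comma-separated field (assumed to be the entity tag)."""
    text = line.rstrip("\r\n")
    if "," not in text:
        return text  # nothing to strip
    # Split from the right: the transcript itself may contain commas.
    return text.rsplit(",", 1)[0]


def make_inference_copies(src_dir: str, dst_dir: str, pattern: str = "*.tsv") -> int:
    """Write tag-free copies of every annotation file; originals are not modified."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.glob(pattern)):
        lines = path.read_text(encoding="utf-8").splitlines()
        # Skip blank lines, strip the tag from everything else.
        cleaned = [strip_tag_column(line) for line in lines if line.strip()]
        (dst / path.name).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
        count += 1
    return count
```

Run it only on files that still have the tag column. On a file that has already been cleaned, it would cut off the end of any transcript that contains a comma.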

jorgerodriguezsj avatar Jan 19 '21 22:01 jorgerodriguezsj

I understand. Thank you very much

juvebogdan avatar Jan 19 '21 22:01 juvebogdan

I think I need to change the keys.txt file as well. Is this required?

juvebogdan avatar Jan 20 '21 09:01 juvebogdan

No, it is not necessary. Take a look at the arguments that test.py needs:

  • Checkpoint
  • Boxes and transcripts (without the tag column) of the images from which you want to extract the information
  • Path of the folder containing the images from which you want to extract the information
  • Path of the folder where you want to save the output results of each image
  • GPU id to use
  • Batch size

Therefore you only need the images and the boxes and transcripts (without the tag column).
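
Before running test.py, it can help to check that every image has a matching boxes-and-transcripts file. The following is only a sketch: it assumes the two folders pair files by file name (same stem), that images are `.jpg`, `.jpeg`, or `.png`, and that annotation files end in `.tsv`. `pair_inputs` is a hypothetical helper, not something PICK provides.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image extensions


def pair_inputs(images_dir: str, boxes_dir: str, boxes_suffix: str = ".tsv"):
    """Match images to annotation files by file stem.

    Returns (pairs, images_without_boxes, boxes_without_images).
    """
    images = {p.stem: p for p in Path(images_dir).iterdir()
              if p.suffix.lower() in IMAGE_SUFFIXES}
    boxes = {p.stem: p for p in Path(boxes_dir).iterdir()
             if p.suffix.lower() == boxes_suffix}
    # Stems present in both folders form usable (image, annotation) pairs.
    pairs = [(images[s], boxes[s]) for s in sorted(images.keys() & boxes.keys())]
    missing_boxes = sorted(images.keys() - boxes.keys())
    missing_images = sorted(boxes.keys() - images.keys())
    return pairs, missing_boxes, missing_images
```

If either of the two "missing" lists is non-empty, those files are worth fixing before starting inference.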

jorgerodriguezsj avatar Jan 20 '21 10:01 jorgerodriguezsj

@juvebogdan May I ask how you got such high numbers? After 100 epochs, I only got these:

+---------+----------+----------+----------+----------+
| name    |      mEP |      mER |      mEF |      mEA |
+=========+==========+==========+==========+==========+
| total   | 0.504762 | 0.550173 | 0.52649  | 0.550173 |
+---------+----------+----------+----------+----------+
| address | 0.60628  | 0.394035 | 0.47764  | 0.394035 |
+---------+----------+----------+----------+----------+
| company | 0.564706 | 0.571429 | 0.568047 | 0.571429 |
+---------+----------+----------+----------+----------+
| date    | 0.877551 | 0.914894 | 0.895833 | 0.914894 |
+---------+----------+----------+----------+----------+
| overall | 0.610822 | 0.509991 | 0.555871 | 0.509991 |
+---------+----------+----------+----------+----------+

minhhoangbui avatar Apr 03 '21 03:04 minhhoangbui

I suppose you should try early stopping.
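
For reference, a generic sketch of the idea: stop training once a monitored validation score (for example the overall mEF) has not improved for a given number of checks. This is plain Python and is not PICK's own trainer; `EarlyStopping`, `epochs_run`, and the `patience` and `min_delta` parameters are illustrative names.

```python
class EarlyStopping:
    """Signal a stop when a monitored score has not improved for `patience` checks."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_checks = 0

    def step(self, score: float) -> bool:
        """Record a new validation score (higher is better); return True to stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def epochs_run(scores, patience):
    """Replay a list of per-epoch validation scores and return how many epochs ran."""
    stopper = EarlyStopping(patience=patience)
    for epoch, score in enumerate(scores, start=1):
        if stopper.step(score):
            return epoch
    return len(scores)
```

In a real training loop, `step` would be called once per validation pass, and the checkpoint saved at the best score would be the one used for inference.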

HoKinChung avatar May 06 '22 14:05 HoKinChung

I think you ended up with an overfitting problem. How many images did you use for the train and test data?

ziodos avatar May 06 '22 14:05 ziodos