[LayoutLM] Do you plan to open source the receipt understanding task example for the SROIE dataset?

Open oni-on opened this issue 4 years ago • 6 comments

It'd be awesome to be able to reproduce the results obtained on the SROIE challenge.

oni-on avatar Oct 02 '20 13:10 oni-on

Feeling the same.

jackie930 avatar Nov 11 '20 06:11 jackie930

Same here. I am getting very different results from the ones presented in the paper!

ruifcruz avatar Nov 12 '20 09:11 ruifcruz

Maybe it helps: https://github.com/ruifcruz/sroie-on-layoutlm

ruifcruz avatar Nov 25 '20 00:11 ruifcruz

Great work @ruifcruz! I'm trying to run your notebook atm.

oni-on avatar Nov 25 '20 14:11 oni-on

Maybe it helps: https://github.com/ruifcruz/sroie-on-layoutlm

I see that this notebook directly uses the OCR annotations provided with the original dataset. Do we know which OCR engine the original authors used for the SROIE information extraction task? The LayoutLMv2 paper mentions that they "use the official OCR annotations". Does that mean no OCR was performed and the annotations were used directly?

SuryaThiru avatar May 02 '23 16:05 SuryaThiru

As far as I remember (some time has passed since then), they used Tesseract (in v1). I would say that they didn't need to run OCR because they already had the annotations from the contest.

ruifcruz avatar May 03 '23 18:05 ruifcruz
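
For anyone wondering what "using the annotations directly" could look like in practice, here is a minimal sketch, not the paper authors' or the notebook's actual code. It assumes each SROIE OCR annotation line has the form `x1,y1,x2,y2,x3,y3,x4,y4,text` (four corner points followed by the transcript) and that boxes are rescaled to the 0-1000 range that LayoutLM expects. The function names `parse_annotation_line` and `normalize_box` are hypothetical, as are the image dimensions in the usage note.

```python
from typing import List, Tuple


def parse_annotation_line(line: str) -> Tuple[List[int], str]:
    """Parse one annotation line (assumed format: 8 coordinates, then text).

    Returns an axis-aligned box [x0, y0, x1, y1] and the transcript.
    """
    # Split at most 8 times so commas inside the transcript are preserved.
    parts = line.strip().split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a transcript: {line!r}")
    coords = [int(p) for p in parts[:8]]
    xs, ys = coords[0::2], coords[1::2]
    # Collapse the four corner points into a bounding rectangle.
    return [min(xs), min(ys), max(xs), max(ys)], parts[8]


def normalize_box(box: List[int], width: int, height: int) -> List[int]:
    """Rescale pixel coordinates to the 0-1000 range used by LayoutLM."""
    x0, y0, x1, y1 = box
    return [
        1000 * x0 // width,
        1000 * y0 // height,
        1000 * x1 // width,
        1000 * y1 // height,
    ]
```

With a hypothetical 500x1000 pixel image, `parse_annotation_line("72,25,326,25,326,64,72,64,TOTAL: 9,00")` yields the box `[72, 25, 326, 64]` and the text `TOTAL: 9,00`, and `normalize_box` maps that box to `[144, 25, 652, 64]`. No OCR engine is involved at any point; the words and boxes come straight from the contest files.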