LiLT

Difference between the FUNSD SER results in Table 2 and Table 6 of the paper

Open WenjinW opened this issue 1 year ago • 2 comments

Hi! I like the idea of decoupling text and layout information to leverage existing pre-trained language models. One thing confused me while reading the paper.

  • Table 2 shows that the F1 of LiLT[InfoXLM] on the SER task of FUNSD is 0.8586.
  • However, in Table 6, the F1 is 0.8415.

Why are the performances reported in the two tables different?

Thanks for your reply.

WenjinW avatar Aug 18 '22 13:08 WenjinW

Hi, I had the same confusion, but I think the results you mention for Table 2 are for the multilingual model, while the ones in Table 3 are English only.

leitouran avatar Aug 23 '22 20:08 leitouran

Hi @WenjinW @leitouran, Table 2 follows the fine-tuning setup of https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_funsd.py, whereas Table 3 follows the fine-tuning setup of https://github.com/microsoft/unilm/blob/master/layoutlmft/examples/run_xfun_ser.py.

jpWang avatar Aug 27 '22 05:08 jpWang