DocBank: Is my inference method correct?

Hi, I'm using your released models with transformers to run inference. However, the test results are not very good, so I wonder whether my inference method is correct. I ran into a few questions along the way:

1. The annotated bboxes have to be converted into tokens from your vocabulary, but how should the tokens' labels be combined back into the bbox's label? For example, "Hello" may be split into "he" and "llo" with labels "1" and "8"; how is the label of "Hello" defined then? I recover the label from the first token, i.e. I use the label of "he" ("1") as the label of "Hello". Is that correct?
2. For a document that contains more than 512 tokens, e.g. 782, I split it into two independent inputs of 512 and 270 tokens, ran both through the model, and concatenated the results. Is that correct?
3. For "zero-area" tokens, such as [23, 405, 23, 407], do you calculate their area?

Thanks a lot for your attention; I'm looking forward to your reply.

Feb 03 '21 06:02 volcano1995

1. We use the label of the whole word as the label of its first token. The remaining tokens are labeled with "CrossEntropyLoss().ignore_index", so they are ignored when computing the loss.
2. Yes.
3. We don't calculate the area. The "zero-area" tokens are included in the data.
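The answers above (label only the first subtoken of each word, ignore the rest in the loss, and run long documents as independent windows whose outputs are concatenated) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not DocBank's released code: `toy_tokenize` is a hypothetical stand-in for the model's WordPiece tokenizer, `-100` is assumed as the ignore index because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, and the window size of 510 assumes the 512-position limit must also fit the `[CLS]` and `[SEP]` special tokens.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: default ignore_index of torch.nn.CrossEntropyLoss
IGNORE_INDEX = -100


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a WordPiece tokenizer: 3-character pieces."""
    return [word[:3]] + ["##" + word[i:i + 3] for i in range(3, len(word), 3)]


def align_labels(
    words: Sequence[str],
    word_labels: Sequence[int],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[int], List[int]]:
    """Split words into subtokens; only the first subtoken keeps the word label."""
    tokens: List[str] = []
    labels: List[int] = []
    word_ids: List[int] = []
    for i, (word, label) in enumerate(zip(words, word_labels)):
        # Real tokenizers can return [] for some characters; keep one slot per word.
        pieces = tokenize(word) or ["[UNK]"]
        tokens.extend(pieces)
        # Later subtokens get IGNORE_INDEX so the loss skips them.
        labels.extend([label] + [IGNORE_INDEX] * (len(pieces) - 1))
        word_ids.extend([i] * len(pieces))
    return tokens, labels, word_ids


def chunk(seq: Sequence, size: int) -> List[list]:
    """Split a token sequence into independent windows of at most `size` items."""
    return [list(seq[i:i + size]) for i in range(0, len(seq), size)]


def word_predictions(token_preds: Sequence[int], word_ids: Sequence[int]) -> List[int]:
    """Recover one label per word from the prediction on its first subtoken."""
    first: dict = {}
    for pred, wid in zip(token_preds, word_ids):
        first.setdefault(wid, pred)  # later subtokens of the same word are ignored
    return [first[w] for w in sorted(first)]


if __name__ == "__main__":
    words, labels = ["Hello", "DocBank", "is"], [1, 2, 3]
    tokens, tok_labels, word_ids = align_labels(words, labels, toy_tokenize)
    print(tokens)      # ['Hel', '##lo', 'Doc', '##Ban', '##k', 'is']
    print(tok_labels)  # [1, -100, 2, -100, -100, 3]
    # Per-token predictions concatenated across windows; "##lo" is predicted 8,
    # but "Hello" still takes the label of its first subtoken.
    print(word_predictions([1, 8, 2, 5, 5, 3], word_ids))  # [1, 2, 3]
    # A 782-token document split into windows of 510 content tokens.
    print([len(w) for w in chunk(list(range(782)), 510)])  # [510, 272]
```

Because each word's label is read only from its first subtoken, a word whose later subtokens spill into the next window is still recovered correctly once the window outputs are concatenated in order.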

Apr 16 '21 06:04 liminghao1630

@volcano1995 Hi, were you able to figure out the correct inference method, one that achieves accuracy similar to the reported results? I am also trying to run inference and am getting very low accuracy on the validation set.

Jul 05 '21 21:07 spencer-hong