Multi-page documents for LayoutLM

Open victor-ab opened this issue 4 years ago • 6 comments

My question is about LayoutLM. I want to apply something like the Receipt Understanding task to multi-page documents. How should they be handled? Is there a recommended approach?

victor-ab avatar Aug 31 '20 18:08 victor-ab

@victor-ab The easiest approach is to split the multi-page document into a set of blocks, each of which can be fed into the LayoutLM model.

wolfshow avatar Sep 11 '20 01:09 wolfshow

@wolfshow What do you mean by "set of blocks"? I didn't follow.

I had the idea of vertically "concatenating" all the pages. But I guess this is not the best solution, as the text would become much denser than on a single page.
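To make the density concern concrete, here is a small sketch of what vertical concatenation does to the coordinates. It assumes all pages share the same pixel size and that boxes are rescaled to the 0-1000 range LayoutLM uses; `stack_pages_vertically` is a hypothetical helper, not something from the unilm repository.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


def stack_pages_vertically(
    page_boxes: List[List[Box]], page_width: int, page_height: int
) -> List[Box]:
    """Place pages one below another and normalize to a single 0-1000 canvas.

    Assumption: every page has the same width and height in pixels.
    """
    total_height = page_height * len(page_boxes)
    stacked = []
    for page_index, boxes in enumerate(page_boxes):
        y_offset = page_index * page_height  # shift this page below earlier ones
        for x0, y0, x1, y1 in boxes:
            stacked.append((
                1000 * x0 // page_width,
                1000 * (y0 + y_offset) // total_height,
                1000 * x1 // page_width,
                1000 * (y1 + y_offset) // total_height,
            ))
    return stacked
```

With N stacked pages, every box keeps its normalized width but its normalized height shrinks by a factor of N, so lines of text end up packed much more tightly than in a single-page input.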

victor-ab avatar Sep 11 '20 04:09 victor-ab

Hey @victor-ab, greetings!

Did you find an approach that gets closer to the expected accuracy for multi-page documents?

khushbu-mulani avatar Oct 05 '20 14:10 khushbu-mulani

Hi @khushbu-mulani ! Not yet. Please let me know if you have any ideas.

victor-ab avatar Oct 05 '20 15:10 victor-ab

Hi @wolfshow, can you suggest how to handle multi-page documents for training and for inference?

For training, we can either generate hOCR for each page separately or combine all the pages of the document and generate a single hOCR file. But how does this work at inference time?
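If the per-page route is taken, one way inference could work (an assumption, not something confirmed in this thread) is to run the model on each page's block independently and merge the labelled words afterwards, keeping the page index. In the sketch below, `predict_block` is a hypothetical callable standing in for the actual LayoutLM forward pass, and the block dictionaries with `page`, `words`, and `boxes` keys are an assumed format.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]


def predict_document(
    blocks: List[Dict],
    predict_block: Callable[[Sequence[str], Sequence[Box]], List[str]],
) -> List[Dict]:
    """Run a per-block predictor and merge the results across pages.

    Assumptions: each block is a dict with "page", "words" and "boxes" keys,
    and `predict_block` returns exactly one label per word.
    """
    results = []
    for block in blocks:
        labels = predict_block(block["words"], block["boxes"])
        if len(labels) != len(block["words"]):
            raise ValueError("predictor must return one label per word")
        for word, label in zip(block["words"], labels):
            results.append({"page": block["page"], "word": word, "label": label})
    return results
```

Because training and inference would then both see single-page inputs, the coordinate distribution stays the same in the two phases.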

Thanks in advance!

khushbu-mulani avatar Oct 06 '20 11:10 khushbu-mulani

Hi everyone. Did anyone figure this out?

lumalav avatar Feb 27 '24 16:02 lumalav