How to annotate my own receipt images for LayoutLM

Open SandyRSK opened this issue 4 years ago • 8 comments

I have downloaded the SROIE dataset. Once the pre-processing is done, it has Box, img, and key folders.

I then ran:

!python "run_seq_labeling.py" --data_dir "output_receipt" --model_type layoutlm --model_name_or_path "model_LM/" --do_lower_case --max_seq_length 512 --do_train --num_train_epochs 2 --logging_steps 10 --save_steps -1 --output_dir "out_model_receipt/" --overwrite_output_dir --labels "labels_receipt.txt" --per_gpu_train_batch_size 16 --per_gpu_eval_batch_size 16 --fp16

The following error occurred:

  File "/content/drive/My Drive/unilm/layoutlm/examples/seq_labeling/run_seq_labeling.py", line 811, in <module>
    main()
  File "/content/drive/My Drive/unilm/layoutlm/examples/seq_labeling/run_seq_labeling.py", line 701, in main
    args, tokenizer, labels, pad_token_label_id, mode="train"
  File "/content/drive/My Drive/unilm/layoutlm/examples/seq_labeling/layoutlm/data/funsd.py", line 72, in __init__
    self.all_bboxes = torch.tensor([f.boxes for f in features], dtype=torch.long)
ValueError: expected sequence of length 4 at dim 2 (got 8)

SandyRSK, Sep 12 '20 06:09

@SandyRSK Basically, you need to pre-process the SROIE dataset to token level and then feed the data into LayoutLM.
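
The "expected sequence of length 4 at dim 2 (got 8)" error suggests that each box still carries eight values. SROIE line annotations store four corner points per text line, whereas LayoutLM expects four values (x0, y0, x1, y1) scaled to the 0-1000 range. Below is a minimal sketch of one way to do the conversion. The function names, the sample line, and the image size are hypothetical, and splitting a line box among its words in proportion to character count is only a rough heuristic, not the repository's own preprocessing.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def quad_to_box(quad: List[int]) -> Box:
    """Collapse 8 values (x1,y1,...,x4,y4) into an axis-aligned (x0, y0, x1, y1)."""
    xs, ys = quad[0::2], quad[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def normalize_box(box: Box, width: int, height: int) -> Box:
    """Scale pixel coordinates to the 0-1000 range used by LayoutLM."""
    x0, y0, x1, y1 = box
    return (
        1000 * x0 // width,
        1000 * y0 // height,
        1000 * x1 // width,
        1000 * y1 // height,
    )


def split_line_into_words(text: str, box: Box) -> List[Tuple[str, Box]]:
    """Heuristic (assumption): share the line box among words by character count."""
    words = text.split()
    if not words:
        return []
    x0, y0, x1, y1 = box
    # Count one character for each space between words.
    total_chars = sum(len(w) for w in words) + len(words) - 1
    line_width = x1 - x0
    result = []
    cursor = 0
    for word in words:
        start = x0 + line_width * cursor // total_chars
        end = x0 + line_width * (cursor + len(word)) // total_chars
        result.append((word, (start, y0, end, y1)))
        cursor += len(word) + 1
    return result


# Hypothetical SROIE-style line: eight coordinates followed by the transcript.
line = "72,25,326,25,326,64,72,64,TAN WOON YANN"
parts = line.split(",", 8)  # maxsplit keeps commas inside the transcript
quad = [int(v) for v in parts[:8]]
text = parts[8]

line_box = quad_to_box(quad)
word_boxes = split_line_into_words(text, line_box)

# Hypothetical image size in pixels (width, height).
tokens = [(word, normalize_box(box, 463, 1013)) for word, box in word_boxes]
for word, box in tokens:
    print(word, *box)
```

Each resulting token has exactly four box values, which is the shape that torch.tensor call in funsd.py needs. If word-level OCR boxes are available, they are preferable to the proportional split.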

wolfshow, Sep 15 '20 05:09

@wolfshow

After preprocessing my own receipt dataset, I am able to train. But in the prediction stage, after running the program with --do_predict, I can see the text_prediction.txt file, but it shows only the top 16. I don't know why it is not detecting the others.

SandyRSK, Oct 07 '20 10:10

Maybe this helps: https://github.com/ruifcruz/sroie-on-layoutlm

ruifcruz, Nov 25 '20 00:11

I have an invoice dataset that I want to annotate in order to run it through LayoutLM, but the problem is where to annotate it. I couldn't find a tool that takes a document image, lets me annotate it, and returns text files that I can then feed into LayoutLM. I tried UBIAI, but that tool is paid. Could anyone suggest something similar to UBIAI?

burhanuddin03, May 17 '22 13:05

Hi, any progress on annotation tools? I also tried but couldn't find any others except UBIAI...

Irisnotiris, May 30 '22 06:05

Have you found any such labeling tool? I'm looking for the same thing.

Rajeshwar21, Jul 15 '22 06:07

No, but there is an option on UBIAI to negotiate the price. I used that.

burhanuddin03, Jul 15 '22 08:07

I am not sure whether it is fully related, but I found Layout Parser and Label Studio.

Link: https://www.youtube.com/watch?v=puOKTFXRyr4&ab_channel=LabelStudio
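
For anyone going the Label Studio route, here is a minimal sketch of turning one exported rectangle into a LayoutLM-style box. It assumes the export stores x, y, width, and height as percentages of the image size, so multiplying by 10 gives the 0-1000 scale. The function name and the sample dictionary are hypothetical, and the exact export layout should be checked against your own project's JSON.

```python
from typing import Dict, Tuple


def label_studio_rect_to_box(value: Dict) -> Tuple[int, int, int, int]:
    """Convert percent-based x, y, width, height to (x0, y0, x1, y1) on a 0-1000 scale."""

    def to_scale(percent: float) -> int:
        # Percent (0-100) times 10 gives 0-1000; clamp to guard against overshoot.
        return max(0, min(1000, int(round(percent * 10))))

    x0 = to_scale(value["x"])
    y0 = to_scale(value["y"])
    x1 = to_scale(value["x"] + value["width"])
    y1 = to_scale(value["y"] + value["height"])
    return x0, y0, x1, y1


# Hypothetical rectangle as it might appear in an export (values in percent).
example = {
    "x": 12.5,
    "y": 40.0,
    "width": 25.0,
    "height": 3.2,
    "rectanglelabels": ["TOTAL"],
}

box = label_studio_rect_to_box(example)
label = example["rectanglelabels"][0]
print(label, *box)
```

The text for each box still has to come from OCR or from a transcription field in the labeling setup, since a plain rectangle label carries no words.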

ammarsaf, Mar 27 '23 02:03