PICK-pytorch

Not clear how to annotate my documents

Open • wilfreddesert opened this issue 4 years ago • 2 comments

Hi @wenwenyu

I cannot wait to try your model with my data. It is quite a large dataset of documents with various layouts, from which I would like to extract a set of key/value pairs.

I have a few questions, though, about the format of the training data:

  • In your examples, annotations cover each entity as a whole: if the value of some_field consists of 4 words, all 4 words are given together under one label.

Is this the only possible format? I use the Google Vision API to create text annotations, which yields word-level entities, so my initial idea was to label my data at the word level. Will this not work for PICK?
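If word-level input turns out not to be supported, one possible workaround is to merge word-level boxes into entity-level ones before training. The following is only a minimal sketch of that idea, not something confirmed by the PICK authors. The `Word` class, the `merge_words` helper, and the label names are hypothetical. It assumes the words arrive in reading order, that consecutive words with the same label belong to one entity, and that an entity is described by a clockwise 8-value box starting at the top-left corner.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One word-level OCR result (hypothetical structure, not a PICK type)."""
    text: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str


def _close_group(group: List[Word]) -> Tuple[List[int], str, str]:
    """Build one enclosing box, joined transcript, and label for a group."""
    x_min = min(w.x_min for w in group)
    y_min = min(w.y_min for w in group)
    x_max = max(w.x_max for w in group)
    y_max = max(w.y_max for w in group)
    # Assumed layout: clockwise corners starting at the top-left.
    box = [x_min, y_min, x_max, y_min, x_max, y_max, x_min, y_max]
    text = " ".join(w.text for w in group)
    return box, text, group[0].label


def merge_words(words: List[Word]) -> List[Tuple[List[int], str, str]]:
    """Merge runs of consecutive words that share a label into one entity."""
    merged = []
    group: List[Word] = []
    for word in words:
        if group and word.label != group[-1].label:
            merged.append(_close_group(group))
            group = []
        group.append(word)
    if group:
        merged.append(_close_group(group))
    return merged


# Illustrative data only; the labels are made up for this example.
words = [
    Word("ACME", 10, 10, 50, 20, "company"),
    Word("Corp", 55, 10, 90, 21, "company"),
    Word("Total", 10, 40, 50, 50, "other"),
]
merged = merge_words(words)
```

This simple grouping ignores line breaks, so two same-label words on different lines would be merged into one tall box; a real pipeline would probably also need to split groups by vertical position.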

Another question relates to one of the sample files: https://github.com/wenwenyu/PICK-pytorch/blob/master/data/data_examples_root/boxes_and_transcripts/X00016469623.tsv

As far as I understand from the description, the first column is id, but why do all the values in the first column equal 1 in that file?

Thanks!

wilfreddesert • Jan 11 '21 16:01

I have a similar problem. Please let me know if you have found a way to make PICK work with word-level annotations.

jianglong-he-Infrrd • Mar 01 '21 18:03

Hi @wilfreddesert, were you able to get answers to your questions? I would really love to know how you dealt with word-level entities.

nehasaraf1994 • Apr 14 '21 18:04