PICK-pytorch Preparing tsv file for custom dataset


Open prabhakar-sivanesan opened this issue 4 years ago • 3 comments

Hi, firstly, thanks for the model; it worked very well on my custom dataset. I have two questions about preparing the tsv data for training.

When three words belong to one entity, do all three words have to be annotated separately in the tsv file, or do they have to be combined into one row?

For example, this is the data:

[image: sample]

In the shipping address column, Kothuri Sai Kiran is a name. My OCR model returns these three words separately as Kothuri, Sai and Kiran. So while preparing the tsv file, can I annotate them as three different rows like this,

18,1009,490,1198,490,1198,553,1009,553,Kothuri,name
19,1206,495,1501,495,1501,552,1206,552,Sai,name
20,1619,501,1707,501,1707,560,1619,560,Kiran,name

or do all three words have to be combined into a single row like this?

18,1009,490,1707,501, 1707,560,1009,553, Kothuri Sai Kiran, name
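
For reference, the combined row above can be derived mechanically from the three word-level rows. The following is a minimal sketch, assuming each row has the layout `index,x1,y1,x2,y2,x3,y3,x4,y4,text,entity` with corners listed clockwise from the top-left, that the words are given in left-to-right order, and that the text contains no commas. The function name `merge_word_rows` is hypothetical and not part of PICK-pytorch, and whether PICK expects word-level or merged rows is not settled in this thread.

```python
def merge_word_rows(rows):
    """Merge word-level rows of one entity (left to right) into a single row.

    Assumed row layout (not confirmed by the PICK-pytorch docs here):
    index,x1,y1,x2,y2,x3,y3,x4,y4,text,entity
    with corners clockwise from top-left, and no commas inside the text.
    """
    parsed = [row.split(",") for row in rows]
    first, last = parsed[0], parsed[-1]

    # Top-left and bottom-left come from the first word,
    # top-right and bottom-right come from the last word.
    coords = first[1:3] + last[3:7] + first[7:9]

    # Join the word texts with spaces; keep the first row's index and label.
    text = " ".join(fields[9] for fields in parsed)
    return ",".join([first[0]] + coords + [text, first[10]])


rows = [
    "18,1009,490,1198,490,1198,553,1009,553,Kothuri,name",
    "19,1206,495,1501,495,1501,552,1206,552,Sai,name",
    "20,1619,501,1707,501,1707,560,1619,560,Kiran,name",
]
merged = merge_word_rows(rows)
print(merged)
```

With the three rows above, this reproduces the coordinates of the combined row (without the spaces after the commas).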

In the billing address column, I have the same name, Kothuri Sai Kiran. Is it possible to tag this occurrence with the same entity "name" as well? In a nutshell, can I have multiple OCR results tagged with one entity within a single image file?

Looking forward to your response.

Dec 28 '20 15:12 prabhakar-sivanesan

@prabhakar-sivanesan: Is it detecting all the entities in your custom dataset? How many data samples did you pass to the model to get good results?

Jan 06 '21 05:01 ninjakx

@ninjakx I trained on only 5 entities, using about 70 samples with a 70/30 split, and I was able to get good results with that.

Jan 30 '21 04:01 prabhakar-sivanesan

@prabhakar-sivanesan Hi Prabhakar, would you let me know which annotation tool you used for preparing the custom dataset?

Jun 10 '21 12:06 Nivedita-mahato2
