Help with customizing the entities of the SROIE dataset
Hello, and thank you in advance for your support.
I would like to extend the SROIE entities using my own dataset. Is that possible? For example, I would like to change the following list
SROIE_CLASS_LIST = ["others", "company", "date", "address", "total"]
to something like this (possibly with more entities):
SROIE_CLASS_LIST = ["others", "company", "date", "time", "address", "total", "tax", "sub_total"]
Yes, it is possible. The main changes are the number of categories and the corresponding mappings. Change SROIE_CLASS_LIST, TAG_TO_IDX, and TAG_TO_IDX_BIO in train_SROIE.py and eval_SROIE.py to match your custom entity types, then update num_classes in the config yaml file. You may also need to adjust the postprocessing rules in eval_SROIE.py accordingly.
Thank you very much for your very fast answer. However, I did not understand how to modify the B- and I- tags. Could you modify them for me to match my extended list? The current definitions are:
SROIE_CLASS_LIST = ["others", "company", "date", "address", "total"]
TAG_TO_IDX = {
    "O": 0,
    "B-company": 1,
    "B-date": 2,
    "B-address": 3,
    "B-total": 4,
}
TAG_TO_IDX_BIO = {
    "O": 0,
    "B-company": 1,
    "I-company": 2,
    "B-date": 3,
    "I-date": 4,
    "B-address": 5,
    "I-address": 6,
    "B-total": 7,
    "I-total": 8,
}
And one more question.
Do I have to provide an entities file like the following to train on the SROIE entities?
{
    "company": "BOOK TA .K (TAMAN DAYA) SDN BHD",
    "date": "25/12/2018",
    "address": "NO.53 55,57 & 59, JALAN SAGU 18, TAMAN DAYA, 81100 JOHOR BAHRU, JOHOR.",
    "total": "9.00"
}
**Or can I use only the box and text file, without the entities file?**
1,83,41,331,41,331,78,83,78,TAN WOON YANN,other
1,109,171,330,171,330,191,109,191,MR D.I.Y. (M) SDN BHD,company
1,122,190,325,190,325,213,122,213,(CO. RFG : 860671-D),other
1,47,208,391,208,391,233,47,233,LOT 1851-A & 1851-B, JALAN KPB 6,,address
1,62,235,381,235,381,254,62,254,KAWASAN PERINDUSTRIAN BALAKONG,,address
1,70,256,384,256,384,275,70,275,43300 SERI KEMBANGAN, SELANGOR,address
1,125,275,318,275,318,297,125,297,(TESCO PUTRA NILAI),other
1,177,295,266,295,266,317,177,317,-INVOICE-,other
1,12,337,402,337,402,362,12,362,KILAT AUTO ECO WASH & SHINE ES1000 1L,other
1,20,360,160,360,160,383,20,383,WA45 /2A - 12,other
1,16,382,156,382,156,402,16,402,9555916500133,other
For example, if your entity types are [others, type1, type2, type3], the corresponding IDX maps should be:
TAG_TO_IDX = {
    "O": 0,  # Remember to keep the background type (others, or O tag) as the first term
    "B-type1": 1,
    "B-type2": 2,
    "B-type3": 3,
}
TAG_TO_IDX_BIO = {
    "O": 0,  # Remember to keep the background type (others, or O tag) as the first term
    "B-type1": 1,
    "I-type1": 2,
    "B-type2": 3,
    "I-type2": 4,
    "B-type3": 5,
    "I-type3": 6,
}
You may also use the following code to generate the corresponding mappings:
SROIE_CLASS_LIST = ["others", "company", "date", "time", "address", "total", "tax", "sub_total"]
TAG_TO_IDX_ = ["O"]
TAG_TO_IDX_BIO_ = ["O"]
for cls_type in SROIE_CLASS_LIST[1:]:
    TAG_TO_IDX_.append(f"B-{cls_type}")
    TAG_TO_IDX_BIO_.append(f"B-{cls_type}")
    TAG_TO_IDX_BIO_.append(f"I-{cls_type}")
TAG_TO_IDX = {s: i for i, s in enumerate(TAG_TO_IDX_)}
TAG_TO_IDX_BIO = {s: i for i, s in enumerate(TAG_TO_IDX_BIO_)}
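As a quick sanity check, the same logic can be wrapped in a function and run on the extended list to see how many tags each map ends up with. This is only a sketch: `build_tag_maps` is a hypothetical helper name, not something defined in train_SROIE.py or eval_SROIE.py.

```python
def build_tag_maps(class_list):
    """Build the B-only and BIO tag-to-index maps from a class list.

    Assumes the first entry of class_list is the background class,
    which is always mapped to the "O" tag at index 0.
    """
    b_tags = ["O"]
    bio_tags = ["O"]
    for cls_type in class_list[1:]:
        b_tags.append(f"B-{cls_type}")
        bio_tags.append(f"B-{cls_type}")
        bio_tags.append(f"I-{cls_type}")
    tag_to_idx = {tag: i for i, tag in enumerate(b_tags)}
    tag_to_idx_bio = {tag: i for i, tag in enumerate(bio_tags)}
    return tag_to_idx, tag_to_idx_bio


classes = ["others", "company", "date", "time", "address", "total", "tax", "sub_total"]
tag_to_idx, tag_to_idx_bio = build_tag_maps(classes)

print(len(tag_to_idx))      # 8: "O" plus one B- tag per entity type
print(len(tag_to_idx_bio))  # 15: "O" plus a B-/I- pair per entity type
```

With N classes including the background, the B-only map has N entries and the BIO map has 2N - 1, which is worth keeping in mind when updating num_classes in the config.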
For the training phase, only the latter (the box and text file with a label at the end of each line) is required; the entities file is not needed. The code parses these annotations directly and generates the corresponding BIO tags.
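To give a rough idea of what that parsing involves, here is a minimal sketch, not the actual code from train_SROIE.py. It assumes each line is laid out as an index, eight box coordinates, the text, and the label, as in the sample above, and that tags are assigned per whitespace-separated token. The function names `parse_annotation_line` and `bio_tags` are hypothetical.

```python
def parse_annotation_line(line):
    """Split one annotation line into (box, text, label).

    Assumed layout: index, 8 box coordinates, text, label.
    The text may itself contain commas, so split 9 times from the left
    and once from the right rather than using a plain split(",").
    """
    head = line.rstrip("\n").split(",", 9)
    box = [int(v) for v in head[1:9]]
    text, label = head[9].rsplit(",", 1)
    return box, text, label


def bio_tags(text, label, background=("other", "others")):
    """Tag the first token B-<label> and the remaining tokens I-<label>.

    Lines with a background label get the "O" tag on every token.
    """
    tokens = text.split()
    if label in background:
        return [(tok, "O") for tok in tokens]
    return [
        (tok, ("B-" if i == 0 else "I-") + label)
        for i, tok in enumerate(tokens)
    ]


line = "1,47,208,391,208,391,233,47,233,LOT 1851-A & 1851-B, JALAN KPB 6,,address"
box, text, label = parse_annotation_line(line)
print(box)    # [47, 208, 391, 208, 391, 233, 47, 233]
print(label)  # address
print(bio_tags(text, label)[:2])  # [('LOT', 'B-address'), ('1851-A', 'I-address')]
```

Whether an entity that spans several lines (like the three address lines in the sample) restarts with a B- tag on each line is a design choice; this sketch restarts per line, and the repository code may handle it differently.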
I will try it. Thank you very much for your support and effort. Have a nice day.