In XFUND dataset, why "B-QUESTION", "B-ANSWER", "B-HEADER", "I-ANSWER", "I-QUESTION", "I-HEADER"?

Open ChidanandKumarVimaan opened this issue 2 years ago • 3 comments

Describe Model I am using (UniLM, MiniLM, LayoutLM ...):

In the XFUND dataset there are only 4 classes (QUESTION, ANSWER, HEADER, OTHER), but in
https://github.com/microsoft/unilm/blob/42100e11bdd3ac8e9ca2e9b506af8c9231a0c6d6/layoutlmft/layoutlmft/data/datasets/xfun.py#L48

there are 7 classes.

I don't understand why there are 7 classes instead of 4. Kindly help.

ChidanandKumarVimaan avatar Nov 23 '22 12:11 ChidanandKumarVimaan

@Dod-o do you have any answer to the above question? Kindly let me know.

ChidanandKumarVimaan avatar Nov 24 '22 14:11 ChidanandKumarVimaan

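@ChidanandKumarVimaan, this is the 'BIO' tagging scheme (used for token classification or NER tasks). Each of the three entity types (QUESTION, ANSWER, HEADER) gets a "Begin" (B-) and an "Inside" (I-) tag, and OTHER maps to the single "Outside" (O) tag, so 3 × 2 + 1 = 7 classes in total.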
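To make the counting concrete, here is a minimal sketch of how 4 entity-level classes turn into 7 token-level BIO labels. It is not the code from `xfun.py`: the helper names `build_bio_labels` and `tag_tokens` are hypothetical, and the label order and the lowercase `"other"` convention are assumptions made for illustration.

```python
# Entity types that get both a B- and an I- tag (OTHER is handled separately).
ENTITY_TYPES = ["QUESTION", "ANSWER", "HEADER"]


def build_bio_labels(entity_types):
    """Return the token-level label set: one O tag plus B-/I- per entity type."""
    labels = ["O"]  # "Outside": tokens that belong to no entity (OTHER)
    labels += [f"B-{t}" for t in entity_types]  # first token of an entity
    labels += [f"I-{t}" for t in entity_types]  # remaining tokens of an entity
    return labels


def tag_tokens(tokens, entity_label):
    """Expand one entity-level label into per-token BIO tags (hypothetical helper)."""
    if not tokens:
        return []
    if entity_label.lower() == "other":
        return ["O"] * len(tokens)
    t = entity_label.upper()
    return [f"B-{t}"] + [f"I-{t}"] * (len(tokens) - 1)


if __name__ == "__main__":
    labels = build_bio_labels(ENTITY_TYPES)
    print(len(labels), labels)
    print(tag_tokens(["Date", "of", "birth"], "question"))
    print(tag_tokens(["Page", "1"], "other"))
```

The B-/I- split lets the model tell where one entity ends and the next begins when two entities of the same type sit next to each other, which a plain 4-class scheme cannot express.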

abhibisht89 avatar Nov 25 '22 15:11 abhibisht89

@abhibisht89 Thanks, got it.

ChidanandKumarKS avatar Nov 26 '22 04:11 ChidanandKumarKS