paulpaul91
Hey friend, I'm running into the same problem. Have you solved it? Thank you.
> Thanks for the attention. Actually, we used some of the same optimization techniques as LayoutLMv2. You can refer to the paper. At the same time, based on the StructuralLM model,...
> Continued pre-training on the DocVQA set (train set and validation set) brings about 2.0+ ANLS. QG brings about 2.4+ ANLS. In addition, merging the train set and...
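For context, ANLS (Average Normalized Levenshtein Similarity) is the standard DocVQA metric that the gains above are quoted in; a "2.0+ ANLS" improvement means roughly two points on the ×100 leaderboard scale. Here is a minimal sketch of the standard definition with the usual 0.5 threshold (the sample data is made up):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic single-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def anls(predictions, gold_answers, tau=0.5):
    """predictions: list[str]; gold_answers: list[list[str]], several
    acceptable answers per question. Similarities below tau score 0."""
    scores = []
    for pred, golds in zip(predictions, gold_answers):
        best = max(
            1 - levenshtein(pred.lower(), g.lower()) / max(len(pred), len(g), 1)
            for g in golds
        )
        scores.append(best if best >= tau else 0.0)
    return sum(scores) / len(scores)

# Exact match scores 1.0; a one-character typo still scores 1 - 1/7.
print(anls(["$12.00", "invoce"], [["$12.00"], ["invoice"]]))  # ~0.929
```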
> ### Chinese character tokens vs. sentence-level tokenization
> A question about the preprocessing of the DocVQA-ZH dataset in model_zoo/ernie-layout/utils.py/Precessor.py/preprocess_mrc: the text of DocVQA-ZH consists of single Chinese characters (sentences are not tokenized as a whole), and the MRC preprocessing preprocess_mrc mentioned above also does not merge the characters into sentences before tokenizing. I'd like to know why. In other examples, such as the intelligent-document demo under application/, the OCR results are first split into lines, then split into characters, then merged back into complete sentences, and finally tokenized. Do these two approaches (tokenizing sentences vs. using characters directly as tokens) give the same results? Why use character tokens rather than word tokens?

The results are the same; no need to worry.
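The two preprocessing routes end up equivalent because BERT-style WordPiece tokenizers insert splits around every CJK character anyway. A minimal sketch of this, using Hugging Face's bert-base-chinese purely for illustration (not the actual ERNIE-Layout pipeline):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

ocr_chars = ["发", "票", "金", "额"]   # OCR output: one character per box
sentence = "".join(ocr_chars)          # merged back into a sentence

tokens_from_chars = [t for ch in ocr_chars for t in tokenizer.tokenize(ch)]
tokens_from_sentence = tokenizer.tokenize(sentence)

# Both yield ['发', '票', '金', '额']: CJK characters are split individually
# either way. Only non-CJK spans (digits, Latin words) can tokenize
# differently between the two strategies.
assert tokens_from_chars == tokens_from_sentence
```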
> I found the solution to this question: CopyNet.
> The code is [here](https://github.com/lspvic/CopyNet).
>
> But I wonder: its vocabulary size is fixed, and there's no array...
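For what it's worth, the fixed-vocabulary concern is usually handled with a per-batch extended vocabulary, as in the pointer-generator formulation (closely related to CopyNet's copy mode). A minimal NumPy sketch with made-up numbers, not code from the linked repo:

```python
import numpy as np

vocab_size = 6                       # fixed generation vocabulary
src_ext_ids = np.array([2, 6, 3])    # source token ids; 6 is a temporary
n_oov = 1                            # slot for this batch's single OOV word

gen_dist = np.full(vocab_size, 1.0 / vocab_size)  # generator softmax output
copy_attn = np.array([0.2, 0.7, 0.1])             # attention over source tokens
p_copy = 0.5                                      # copy/generate mixing gate

# The final distribution lives over vocab_size + n_oov entries.
extended = np.zeros(vocab_size + n_oov)
extended[:vocab_size] = (1 - p_copy) * gen_dist
np.add.at(extended, src_ext_ids, p_copy * copy_attn)  # scatter copy mass

print(extended[6])  # > 0: the OOV source word is now predictable by copying
```

The extended slots exist only for the current batch, so the model's output layer stays fixed-size while still being able to emit words it has never seen.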
> Cheers, thank you.
> Therefore, no text information is used for this dataset?