wlhgtc

Results 27 comments of wlhgtc

> Hi @wlhgtc, as mentioned above, the Chinese version of layoutlmv3 uses XLMRobertaTokenizer, it is the difference between layoutlmv3-zh and layoutlmv3-en. [This code](https://github.com/microsoft/unilm/blob/44273d47ac0971cd0ebe05335eb1e0043883b898/layoutlmft/examples/run_funsd.py#L178) provides an example of how to process...

Thanks for your reply, I will try it!
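For reference, a minimal sketch of what I plan to try (not the `run_funsd.py` code from the link, and the checkpoint path is only a placeholder):

```python
# Sketch: use XLMRobertaTokenizer for the Chinese checkpoint instead of the
# English BPE tokenizer. The path below is a placeholder -- point it at
# wherever the layoutlmv3-zh weights/vocab actually live.
from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("path/to/layoutlmv3-base-chinese")
encoding = tokenizer(["上海", "浦东"], is_split_into_words=True, return_tensors="pt")
print(encoding["input_ids"])
```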

By the way, I have a note about the tensorboard (writer) part

So glad to see your reply. I list some of my personal understanding below; could you help me correct it? 1. About the shift operation in 2., it seems an easy way to calculate position...
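For context, this is how I read the shift trick (paraphrased from `_rel_shift` in `mem_transformer.py`; please check the repo for the exact version):

```python
import torch

def rel_shift(x):
    # x: (qlen, klen, ...) attention scores laid out by absolute key index;
    # pad one zero column, reshape, drop the first row -- each query row ends
    # up shifted by a different amount, so the entries line up by relative
    # distance without building an explicit (qlen x klen) gather index.
    zero_pad = torch.zeros((x.size(0), 1, *x.size()[2:]), device=x.device, dtype=x.dtype)
    x_padded = torch.cat([zero_pad, x], dim=1)
    x_padded = x_padded.view(x.size(1) + 1, x.size(0), *x.size()[2:])
    return x_padded[1:].view_as(x)

scores = torch.arange(12, dtype=torch.float).view(3, 4)  # toy (qlen=3, klen=4) scores
print(rel_shift(scores))
```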

By the way, it seems like you add the position embedding at each layer. In your ablation study, was there any improvement compared with adding it only to the word embedding? @zihangdai
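To make sure I'm asking about the right thing, a toy contrast (my own sketch, not either implementation):

```python
# (a) adds the position embedding once at the input; (b) re-injects it inside
# every layer, which is roughly what per-layer relative attention amounts to.
# nn.Linear stands in for a real transformer layer.
import torch
import torch.nn as nn

d_model, n_layer, seq, bsz = 16, 3, 5, 2
word_emb = torch.randn(seq, bsz, d_model)
pos_emb = torch.randn(seq, 1, d_model)
layers = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_layer)])

h = word_emb + pos_emb            # (a) position only at the input
for layer in layers:
    h = layer(h)

h = word_emb                      # (b) position injected at every layer
for layer in layers:
    h = layer(h + pos_emb)
```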

It seems your position embedding conflicts with the original version in ???: your layer produces the second-column layout (sin, sin, ..., sin, cos, cos, ..., cos), but it should be the first-column layout (sin, cos, sin, cos, ...). ![image](https://user-images.githubusercontent.com/16603773/51321460-ccd1cd00-1a9d-11e9-9b50-185dd7a58f00.png)
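A small check I ran (my own sketch, not code from either repo), showing the two layouts hold the same values, just permuted along the feature dimension:

```python
import torch

d_model, seq = 8, 10
pos = torch.arange(seq, dtype=torch.float).unsqueeze(1)
inv_freq = 1.0 / (10000 ** (torch.arange(0, d_model, 2, dtype=torch.float) / d_model))
angles = pos * inv_freq                                    # (seq, d_model // 2)

interleaved = torch.zeros(seq, d_model)                    # first-column layout: sin, cos, sin, cos, ...
interleaved[:, 0::2] = torch.sin(angles)
interleaved[:, 1::2] = torch.cos(angles)

concatenated = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # second-column layout

# Same values per position, only the ordering of the feature dimensions differs.
print(torch.allclose(interleaved.sort(dim=-1).values, concatenated.sort(dim=-1).values))
```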

And thanks for your help, I've finished detaching the whole TRANSFORMER-XL model code into a single file. Still one question about your training process: https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/train.py#L433-L437 It seems you split the whole...

Yeah, but I mean that when we train on data[i], we need mems[i-1]: the memory from the previous chunk. But `ret = para_model(data_i, target_i, *mems[i])` seems to use mems[i]?

Fine, I re-read the code; it seems `mems` is updated when a batch finishes and will be used in the next batch. Am I right? But according to the Figure 2...
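To pin down my current understanding, a simplified, self-contained sketch (a dummy function stands in for `MemTransformerLM`; this is my reading of `train.py`, not the actual code): data is chunked along the batch dimension, and mems[i] always pairs with chunk i, holding what chunk i produced in the previous batch, so the chaining in Figure 2 happens across batches, not across chunks.

```python
import torch

num_chunks, seq_len, batch, d_model = 2, 4, 8, 16

def dummy_model(data_i, mem_i):
    # Stand-in forward pass: returns a fake loss and the updated memory
    # (the last seq_len hidden states, as the real model would cache).
    hidden = torch.randn(seq_len, data_i.size(1), d_model)
    new_mem = hidden if mem_i is None else torch.cat([mem_i, hidden], dim=0)[-seq_len:]
    return hidden.mean(), new_mem

mems = [None] * num_chunks                       # one memory slot per chunk stream

for step in range(3):                            # successive batches = successive segments in time
    data = torch.randint(0, 100, (seq_len, batch))
    for i, data_i in enumerate(torch.chunk(data, num_chunks, dim=1)):
        loss, mems[i] = dummy_model(data_i, mems[i])   # mems[i] is reused when the NEXT batch reaches chunk i
```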