Zhilin Yang

15 comments by Zhilin Yang

@aditya-malte Thanks for your contribution. It would be nice if you could do the following:
- merge your changes with the original `configure_tpu` function to support all the cases;
- ...

You need to set `init_checkpoint` to be `model/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt` and `model_dir` to be a new separate folder.
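A minimal sketch of what that invocation might look like, assuming the repo's fine-tuning entry point (`run_classifier.py`) and that the remaining task flags are filled in for your setup; only the two flags named above are the point here:

```shell
# Point fine-tuning at the pretrained weights, and write the new
# task-specific checkpoints to a separate, fresh directory.
python run_classifier.py \
  --init_checkpoint=model/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt \
  --model_dir=finetune_output_dir \
  ...  # remaining task flags (data dir, spiece model, etc.) omitted
```

Keeping `model_dir` separate matters because the fine-tuning job writes its own checkpoints there; pointing it at the pretrained folder would clobber or conflict with the released weights.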

You can't do eval without training because there are task-specific parameters (the output layer).
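A toy illustration of why (all variable names here are hypothetical, not the repo's actual ones): the pretrained checkpoint holds only backbone weights, while the task head is created fresh at fine-tuning time, so skipping training would evaluate with uninitialized output-layer parameters.

```python
# Hypothetical variable names, for illustration only: the pretrained
# checkpoint covers the backbone, not the task-specific output layer.
pretrained_ckpt = {
    "model/transformer/layer_0/ff/kernel": "...",
    "model/transformer/word_embedding": "...",
}

task_variables = [
    "model/transformer/layer_0/ff/kernel",
    "model/transformer/word_embedding",
    "model/classification_head/logits/kernel",  # task-specific head
]

# Variables the graph needs but the checkpoint cannot provide —
# these only get sensible values through fine-tuning.
missing = [v for v in task_variables if v not in pretrained_ckpt]
print(missing)
```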

Well, I think it's possible, but it does not make much sense.

Does it work if you reduce the batch size, sequence length, or whatever else reduces memory usage?

Afaik, bfloat16 should be used on TPUs.
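The reason bfloat16 works well on TPUs is that it keeps float32's 8-bit exponent (so the dynamic range matches float32) and only shortens the mantissa to 7 bits. A minimal emulation of that layout via bit truncation (TPU hardware rounds rather than truncates; this is just to show which bits survive):

```python
import struct

def to_bfloat16(x: float) -> float:
    """Truncation-based bfloat16 emulation: keep float32's sign and
    8-bit exponent, zero the low 16 bits (leaving a 7-bit mantissa)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(to_bfloat16(1.5))      # exactly representable: 1.5
print(to_bfloat16(3.14159))  # low mantissa bits lost: 3.140625
```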

Thanks for your interest. This is under our consideration.

Good question. In fact we are using the implementation that you just mentioned. Sorry about the confusion.

Yes, tokens 1, 2, 3, 6, 7, 8 have bidirectional attention and they attend to all the other tokens, while tokens 4 and 5 use an auto-regressive factorization conditioned on 1, 2, 3, 6, 7, 8. This is what we...
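That attention pattern can be sketched as a simple predicate (1-indexed tokens, illustrative only, not the repo's mask-building code): context tokens see everything, while each target sees the full context plus only the targets predicted before it.

```python
context = {1, 2, 3, 6, 7, 8}  # bidirectional context tokens
targets = [4, 5]              # predicted auto-regressively, in this order

def can_attend(q, k):
    """Whether query token q may attend to key token k."""
    if q in context:
        return True  # context tokens attend to all tokens
    if k in context:
        return True  # targets always see the full context
    # both are targets: strict auto-regressive order among targets
    return targets.index(k) < targets.index(q)

print(can_attend(1, 5))  # True: context sees a target
print(can_attend(5, 4))  # True: 4 precedes 5 in the factorization
print(can_attend(4, 5))  # False: 4 cannot peek at 5
```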

There is no memory overhead, because during inference there is no permutation. In fact, due to the use of relative positional encodings, you can increase `seqlen` to be larger than...
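A toy illustration of why relative encodings permit a longer `seqlen` at inference (this is not the model's actual encoding table, just the indexing idea): attention depends only on the offset `i - j`, and a longer sequence merely extends the range of offsets rather than requiring positions the model never saw.

```python
def rel_offsets(seqlen):
    """All (query, key) relative offsets used at a given length."""
    return {(i, j): i - j for i in range(seqlen) for j in range(seqlen)}

train = rel_offsets(4)  # offsets -3..3
infer = rel_offsets(8)  # offsets -7..7

# Every offset seen at training length also occurs at inference length;
# only new, larger offsets are added.
print(set(train.values()) <= set(infer.values()))  # True
```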