Zheyu Ye

Search results: 8 issues by Zheyu Ye

It seems that there exists a conflict in `num_attention_heads`, which is set to 32 in the `albert_config.json` included in the model tar file downloaded from [TF Hub](https://tfhub.dev/google/albert_xlarge/2) instead of...
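For reference, a minimal sketch of how one might check the bundled config, assuming the tar file has been downloaded locally (the archive path is hypothetical, and the config's member name may differ per release):

```python
import json
import tarfile

# Hypothetical local path to the tar file downloaded from TF Hub.
ARCHIVE = "albert_xlarge_v2.tar.gz"

with tarfile.open(ARCHIVE, "r:gz") as tar:
    # Locate the bundled config inside the archive.
    member = next(m for m in tar.getmembers() if m.name.endswith("albert_config.json"))
    config = json.load(tar.extractfile(member))

# Compare this against the value documented for ALBERT xlarge.
print(config["num_attention_heads"])
```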

I noticed that these two tiny versions differ slightly: the configs show the same model size, but the checkpoint files (`ckpt.data`) differ in size (17.2MB vs 49.7MB). Could you explain what optimizations the brightmart version applies, and at which specific stage of pretraining?
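One way to narrow down where the size difference comes from is to count the parameters stored in each checkpoint; a minimal sketch, assuming both checkpoints are available locally (the prefixes below are hypothetical):

```python
import numpy as np
import tensorflow as tf

def param_count(ckpt_prefix):
    """Sum the number of elements across all variables in a checkpoint."""
    return sum(np.prod(shape) for _, shape in tf.train.list_variables(ckpt_prefix))

# Hypothetical checkpoint prefixes for the two tiny variants being compared.
for prefix in ["google_tiny/albert_model.ckpt", "brightmart_tiny/albert_model.ckpt"]:
    print(prefix, param_count(prefix))
```

If the parameter counts match, the size gap likely comes from extra saved state (e.g. optimizer slots) rather than the model itself.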

There are a lot of irrelevant or unused parameters in the configuration file, such as `pooler_fc_size`, `pooler_num_attention_heads`, `pooler_size_per_head`, and `pooler_type`. It is even more remarkable that `type_vocab_size` is set to `2`...
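A quick sketch of how one could flag such dead keys, assuming a whitelist of fields the ALBERT modeling code actually reads (the whitelist below is an assumption based on the reference implementation; adjust it to the exact version in use):

```python
import json

# Fields the ALBERT modeling code is assumed to read (may vary by version).
USED_FIELDS = {
    "attention_probs_dropout_prob", "hidden_act", "hidden_dropout_prob",
    "embedding_size", "hidden_size", "initializer_range", "intermediate_size",
    "max_position_embeddings", "num_attention_heads", "num_hidden_layers",
    "num_hidden_groups", "inner_group_num", "type_vocab_size", "vocab_size",
}

with open("albert_config.json") as f:  # hypothetical local path
    config = json.load(f)

# Keys present in the shipped config but never consumed by the model,
# e.g. the pooler_* entries mentioned above.
print(sorted(set(config) - USED_FIELDS))
```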

As the title describes, the following URL is broken (https://github.com/dmlc/gluon-nlp/blob/master/README.md#14). The correct one would be ``` https://github.com/dmlc/gluon-nlp/actions ```


## Description ## Add a character-level tokenizer with tests. ## Comments ## Although this is a work-in-progress PR, it is still open for review; please leave comments on code quality...
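To illustrate the idea, here is a minimal sketch of what a character-level tokenizer and its test could look like (the class and method names are illustrative, not the PR's actual API):

```python
class CharTokenizer:
    """Toy character-level tokenizer: one token per Unicode character."""

    def __init__(self, vocab):
        self.vocab = {ch: i for i, ch in enumerate(vocab)}
        self.inv = {i: ch for ch, i in self.vocab.items()}
        self.unk = len(self.vocab)  # id reserved for unknown characters

    def encode(self, text):
        return [self.vocab.get(ch, self.unk) for ch in text]

    def decode(self, ids):
        return "".join(self.inv.get(i, "?") for i in ids)


def test_roundtrip():
    tok = CharTokenizer("abc ")
    assert tok.decode(tok.encode("a b c")) == "a b c"

test_roundtrip()
```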

I am wondering whether it would be possible to add the NEZHA Chinese model to the huggingface transformers repo, both the code and the parameters. NEZHA is widely used in the Chinese NLP community, and would...
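If NEZHA were merged, usage would presumably follow the standard transformers pattern; a hypothetical sketch (the hub id below is an assumption, not an existing checkpoint name):

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical hub id for an integrated NEZHA checkpoint.
MODEL_ID = "nezha-cn-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
outputs = model(**tokenizer("这是一个测试", return_tensors="pt"))
print(outputs.last_hidden_state.shape)
```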

Loading the module with `bert4keras==0.11.3` raises `RecursionError: maximum recursion depth exceeded while calling a Python object`; it looks like it is caused by tensorflow and multiple keras backends calling into each other. Error summary: ``` Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python3.6/dist-packages/bert4keras/tokenizers.py", line 5, in from bert4keras.snippets...
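bert4keras selects its Keras backend via the `TF_KERAS` environment variable, so one plausible workaround (a sketch, not a confirmed fix for this exact traceback) is to pin it to tf.keras before the first import:

```python
import os

# Setting TF_KERAS before importing bert4keras makes it use tf.keras,
# which may avoid standalone keras and tf.keras calling into each other.
os.environ["TF_KERAS"] = "1"

from bert4keras.tokenizers import Tokenizer  # import after setting the flag
```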

I used the same hyper-parameters as the paper, but with the generator sized 1:1 with the hidden size of 256, as you claimed in #39, to pretrain an ELECTRA small model on...
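For concreteness, the setup described above might be expressed roughly as follows; the parameter names loosely follow google-research/electra's `configure_pretraining.py` and are assumptions, not verified flags:

```python
# Illustrative hyper-parameter overrides for the 1:1 generator experiment.
hparams = {
    "model_size": "small",
    "generator_hidden_size": 1.0,  # generator sized 1:1 with the discriminator
}
# With ELECTRA-small, the discriminator hidden size is 256, so a 1:1
# generator also has hidden size 256.
```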