Lam Chi

14 issues by Lam Chi

Is there a PyTorch version of a Chinese BERT-large (24-layer, 1024-hidden, 16-heads) pretrained model?
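If such a checkpoint existed, its architecture could be described with a `BertConfig` like the sketch below (hypothetical: the `vocab_size` shown is the one used by `bert-base-chinese`; an actual large checkpoint would define its own vocabulary and weights):

```python
from transformers import BertConfig

# Hypothetical config for the asked-for Chinese BERT-large:
# 24 layers, 1024 hidden units, 16 attention heads.
config = BertConfig(
    vocab_size=21128,        # bert-base-chinese vocabulary size (assumption)
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    intermediate_size=4096,  # standard 4x hidden for BERT-large
)
```

This only instantiates a configuration, not pretrained weights; whether a matching Chinese large checkpoint is published is exactly the open question.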

If not, how can the src-side and tgt-side attention probability distributions be added together?
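One common way to combine two attention probability distributions over the same positions is a convex combination, which stays a valid distribution without renormalization. A minimal sketch (the helper name and the mixing weight `lam` are assumptions, not the paper's method):

```python
import torch
import torch.nn.functional as F

def mix_attention(src_scores: torch.Tensor,
                  tgt_scores: torch.Tensor,
                  lam: float = 0.5) -> torch.Tensor:
    """Hypothetical helper: combine src-side and tgt-side attention scores
    into a single probability distribution over the same positions."""
    p_src = F.softmax(src_scores, dim=-1)  # src-side attention probs
    p_tgt = F.softmax(tgt_scores, dim=-1)  # tgt-side attention probs
    # A convex combination of two distributions is non-negative and
    # sums to 1, so the result is itself a valid distribution.
    return lam * p_src + (1.0 - lam) * p_tgt
```

An alternative is to sum the raw scores before a single softmax; which variant is intended depends on the model in question.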

If the Adam optimizer is used, does this still work? (Line 7 indicates a standard gradient descent method.) Or does this only fit an SGD-based optimizer? 2. I would...

The paper says L2RW "might lead to unstable weighting behavior during training and unavailability for generalization". Why is that, and what is the evidence for it?

Does the OntoNotes 5.0 NER dataset used in your paper include all 18 entity types, or only the few main categories (e.g., LOC, PER, ORG, MISC ...)? Also, is the dataset split #train: 94,292, #dev: 13,900, #test: 10,348? Thanks!

```python
mlen = mems[0].size(0) if mems is not None else 0
klen = mlen + qlen
if self.same_length:
    all_ones = word_emb.new_ones(qlen, klen)
    mask_len = klen - self.mem_len
    if mask_len > 0:
        ...
```

https://github.com/VIPL-Audio-Visual-Speech-Understanding/LipNet-PyTorch/blob/40209e09c49553c00c25c7d41faa3706aea3c625/scripts/extract_lip.py#L91 Why is the standard face set this way (including the parameters of the affine transformation)? Could you give more specific insights?

As mentioned in S3, the pre-trained models are always trained on the same data as the full model (though I do not know the pre-training details), and especially the pre-trained...

Hi, where exactly is the step-wise patch embedding described in the paper implemented? Is it realized by setting patch embeddings with different patch sizes at different stages?
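For reference, stage-wise patch embedding is commonly implemented as one strided convolution per stage, with the patch size set by the kernel/stride. A sketch under that assumption (module and parameter names are hypothetical, not the paper's code):

```python
import torch
import torch.nn as nn

class StagePatchEmbed(nn.Module):
    """Hypothetical per-stage patch embedding: a Conv2d whose kernel size
    and stride equal the stage's patch size, so each stage can choose a
    different patch size over its input feature map."""
    def __init__(self, in_ch: int, embed_dim: int, patch_size: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output spatial size shrinks by a factor of patch_size.
        return self.proj(x)

# e.g. stage 1 embeds 4x4 image patches, stage 2 merges 2x2 feature patches
stage1 = StagePatchEmbed(3, 64, patch_size=4)
stage2 = StagePatchEmbed(64, 128, patch_size=2)
x = torch.randn(1, 3, 32, 32)
f1 = stage1(x)   # spatial 32 -> 8
f2 = stage2(f1)  # spatial 8 -> 4
```

Whether the paper realizes "step-wise" this way (different patch size per stage) or by some other mechanism is the question being asked.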