Lam Chi

14 issues by Lam Chi

Is there a PyTorch version of a Chinese BERT-large (24-layer, 1024-hidden, 16-heads) pretrained model?
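If such a checkpoint existed, its architecture could be described with a `BertConfig` like the sketch below (hypothetical: the `vocab_size` shown is the one used by `bert-base-chinese`; an actual large checkpoint would define its own vocabulary and weights):

```python
from transformers import BertConfig

# Hypothetical config for the asked-for Chinese BERT-large:
# 24 layers, 1024 hidden units, 16 attention heads.
config = BertConfig(
    vocab_size=21128,        # bert-base-chinese vocabulary size (assumption)
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    intermediate_size=4096,  # standard 4x hidden for BERT-large
)
```

This only instantiates a configuration, not pretrained weights; whether a matching Chinese large checkpoint is published is exactly the open question.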

If not, how can the src-side and tgt-side attention probability distributions be added together?
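One common way to combine two attention probability distributions over the same positions is a convex combination, which stays a valid distribution without renormalization. A minimal sketch (the helper name and the mixing weight `lam` are assumptions, not the paper's method):

```python
import torch
import torch.nn.functional as F

def mix_attention(src_scores: torch.Tensor,
                  tgt_scores: torch.Tensor,
                  lam: float = 0.5) -> torch.Tensor:
    """Hypothetical helper: combine src-side and tgt-side attention scores
    into a single probability distribution over the same positions."""
    p_src = F.softmax(src_scores, dim=-1)  # src-side attention probs
    p_tgt = F.softmax(tgt_scores, dim=-1)  # tgt-side attention probs
    # A convex combination of two distributions is non-negative and
    # sums to 1, so the result is itself a valid distribution.
    return lam * p_src + (1.0 - lam) * p_tgt
```

An alternative is to sum the raw scores before a single softmax; which variant is intended depends on the model in question.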

If the Adam optimizer is used, does this still work? (Line 7 indicates a standard gradient descent method.) Or does this only fit an SGD-based optimizer? 2. I would...

The paper says L2RW "might lead to unstable weighting behavior during training and unavailability for generalization". Why is that, and what is the evidence for it?

Does the OntoNotes 5.0 NER dataset used in your paper include all 18 entity types, or only the few main categories (e.g., LOC, PER, ORG, MISC ...)? Also, is the dataset split #train: 94,292, #dev: 13,900, #test: 10,348? Thanks!

```python
mlen = mems[0].size(0) if mems is not None else 0
klen = mlen + qlen
if self.same_length:
    all_ones = word_emb.new_ones(qlen, klen)
    mask_len = klen - self.mem_len
    if mask_len > 0:
        ...
```

https://github.com/VIPL-Audio-Visual-Speech-Understanding/LipNet-PyTorch/blob/40209e09c49553c00c25c7d41faa3706aea3c625/scripts/extract_lip.py#L91 Why is the standard face set this way (including the parameters of the affine transformation)? Could you give more specific insights?

As mentioned in S3, the pre-trained models are always trained on the same data as the full model (though I do not know the pre-training details), and especially the pre-trained...

Hi, where exactly is the step-wise patch embedding described in the paper implemented? Is it realized by setting patch embeddings with different patch sizes at different stages?
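For reference, stage-wise patch embedding is commonly implemented as one strided convolution per stage, with the patch size set by the kernel/stride. A sketch under that assumption (module and parameter names are hypothetical, not the paper's code):

```python
import torch
import torch.nn as nn

class StagePatchEmbed(nn.Module):
    """Hypothetical per-stage patch embedding: a Conv2d whose kernel size
    and stride equal the stage's patch size, so each stage can choose a
    different patch size over its input feature map."""
    def __init__(self, in_ch: int, embed_dim: int, patch_size: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output spatial size shrinks by a factor of patch_size.
        return self.proj(x)

# e.g. stage 1 embeds 4x4 image patches, stage 2 merges 2x2 feature patches
stage1 = StagePatchEmbed(3, 64, patch_size=4)
stage2 = StagePatchEmbed(64, 128, patch_size=2)
x = torch.randn(1, 3, 32, 32)
f1 = stage1(x)   # spatial 32 -> 8
f2 = stage2(f1)  # spatial 8 -> 4
```

Whether the paper realizes "step-wise" this way (different patch size per stage) or by some other mechanism is the question being asked.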