李金梁 comments

Results: 13 comments of 李金梁

Why is there an LSTM structure inside Bert feature?

He probably made a mistake and left the BERT layer out of the model. BERT and LSTM are usually used together.

Why not use pooler-out for classification?

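The difference is small. Using CLS is a convention inherited from BERT; you can try it yourself. You can also pool the outputs of the last several layers and classify on that.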
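As a rough illustration of the options mentioned above, here is a framework-free sketch. It assumes `hidden_states` is a hypothetical nested list indexed as `[layer][token][dim]` for a single sentence, with the CLS token at position 0, and it reads "pooling the last several layers" as averaging the CLS vector over the last `k` layers, which is only one possible interpretation. For context, in the Hugging Face implementation the pooler output is the final-layer CLS vector passed through a dense layer and tanh, which is why the two usually behave similarly.

```python
def cls_vector(hidden_states):
    """Return the final layer's first-token (CLS) vector."""
    return hidden_states[-1][0]


def mean_pool_last_layers(hidden_states, k):
    """Average the CLS vector over the last k layers (assumed pooling scheme)."""
    last = hidden_states[-k:]
    dim = len(last[0][0])
    return [sum(layer[0][d] for layer in last) / len(last) for d in range(dim)]


# Toy data: 3 layers, 2 tokens, hidden size 2 (made-up numbers).
hidden_states = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[2.0, 4.0], [1.0, 1.0]],
    [[4.0, 8.0], [1.0, 1.0]],
]

print(cls_vector(hidden_states))
print(mean_pool_last_layers(hidden_states, 2))
```

Either vector can then be fed to the same classification head, so comparing them on your own data is a small change.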

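Typo in README section 2.2: "threadIdx：thread在所在block中的为欧洲索引". Was 为欧洲索引 meant to be **位置索引** (position index)? Erratum: are these two formulas wrong? ` So the index of any thread within its block is threadIdx.y * blockDim.x + blockDim.x.` and `> Likewise, the index of any block within the grid is blockIdx.y * gridDim.x + gridDim.x.` Were the added terms meant to be thread.x and block.x (that is, threadIdx.x and blockIdx.x), respectively? The final summary formula is fine, though (wry smile). > > So the index of any thread within its block is threadIdx.y *...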
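A quick way to check the suspected erratum is to enumerate the indices on the host. The sketch below simulates a 2D block in plain Python (the helper names are made up for illustration and are not from the README): the row-major formula with `threadIdx.x` as the added term gives every thread a distinct index, while the formula as quoted, with `blockDim.x` as the added term, collapses each row to a single value. The same reasoning applies to blocks within a grid with `blockIdx` and `gridDim`.

```python
from itertools import product


def thread_id_in_block(thread_idx, block_dim):
    """Row-major linear index: threadIdx.y * blockDim.x + threadIdx.x."""
    tx, ty = thread_idx
    bx, _ = block_dim
    return ty * bx + tx


def quoted_formula(thread_idx, block_dim):
    """The formula as quoted in the comment: threadIdx.y * blockDim.x + blockDim.x."""
    _, ty = thread_idx
    bx, _ = block_dim
    return ty * bx + bx


block_dim = (4, 3)  # blockDim.x = 4, blockDim.y = 3
threads = [(x, y) for y, x in product(range(block_dim[1]), range(block_dim[0]))]

correct_ids = [thread_id_in_block(t, block_dim) for t in threads]
quoted_ids = [quoted_formula(t, block_dim) for t in threads]

print(correct_ids)              # one distinct index per thread
print(sorted(set(quoted_ids)))  # only one value per row of threads
```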

【TYPO】

It's been too long, I don't remember :(

【TYPO】

It's a minor issue. Run the code once and everything becomes clear.

Quantization

You can start by reading Qualcomm's quantization white paper. [arXiv:2106.08295v1](https://arxiv.org/pdf/2106.08295)

loss nan

Yes, I ran into this problem too. The initial learning rate may be too large, so the model can't find the right weight space. Training for more epochs should solve it.
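To see how an oversized learning rate alone can turn a loss into NaN, here is a toy sketch that is unrelated to the actual model in the issue: plain gradient descent on f(w) = w^2. With a step size above 1.0 each update overshoots and grows the weight, which eventually overflows to infinity and then becomes NaN; a small step size converges. The function name and the chosen values are illustrative assumptions.

```python
import math


def final_loss(lr, steps, w=1.0):
    """Run gradient descent on f(w) = w**2 and return the final loss."""
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # each step scales w by (1 - 2 * lr)
    return w * w


print(math.isnan(final_loss(lr=1.5, steps=2000)))  # diverges: inf, then inf - inf
print(final_loss(lr=0.1, steps=2000))              # shrinks toward 0
```

If a real training run behaves like the first case, lowering the learning rate or adding warmup is a common thing to try alongside training longer.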

Just to show my gratitude

thx

ERROR: Failed building wheel for transformer-engine

I fixed this bug by adding `export PATH=/usr/local/cuda/bin:$PATH` to .bashrc. That cost me an afternoon.
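Since the fix was a PATH change, one plausible reading is that the build could not find the CUDA compiler. A small pre-build check along these lines might save some time; it assumes the build needs `nvcc` on PATH and that CUDA lives under `/usr/local/cuda/bin` as in the comment above, both of which may differ on your machine. The helper name is made up for this sketch.

```python
import shutil


def find_nvcc():
    """Return the full path to nvcc if it is on PATH, otherwise None."""
    return shutil.which("nvcc")


nvcc_path = find_nvcc()
if nvcc_path is None:
    # Assumed default install location, taken from the comment above.
    print("nvcc not found on PATH; try: export PATH=/usr/local/cuda/bin:$PATH")
else:
    print(f"nvcc found at {nvcc_path}")
```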