Haozhe Ji
Unlike BERT, GPT is a decoder, so its attention mask is a lower-triangular matrix rather than all ones. As for padding at the end of the sequence: since no loss is applied at the corresponding output positions, it does not affect the meaningful tokens before it.
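For context, here is a minimal PyTorch sketch of both points (the tensor shapes, vocabulary size, and variable names are illustrative, not from any actual codebase):

```python
import torch
import torch.nn.functional as F

seq_len, pad_id, vocab = 6, 0, 10

# Causal mask for a decoder: position i may only attend to positions <= i,
# i.e. a lower-triangular matrix instead of BERT's all-ones mask.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Toy batch: the last two positions are padding.
input_ids = torch.tensor([[5, 3, 8, 2, pad_id, pad_id]])
labels = input_ids.clone()
labels[labels == pad_id] = -100  # ignored by the loss below

# Dummy logits standing in for a GPT forward pass.
logits = torch.randn(1, seq_len, vocab)

# Next-token loss: positions whose label is -100 (the padding) contribute
# nothing, so they cannot influence gradients from the meaningful tokens.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,
)
```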
Can you post the complete error log? I don't seem to get this error message.
It seems that the author didn't release the code for this work.
Oh, I just accidentally committed it to the original main branch. If you find it interesting, maybe I can reopen it?
I use the forward-backward algorithm to calculate the marginals of a linear-chain CRF over the prefix sequence (in order to support AR models). The parallel computation can be done in O(log N) complexity...
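For readers following along, this is a minimal sketch of forward-backward node marginals for a linear-chain CRF (not the author's implementation; the function name and shapes are assumptions). The loops below are sequential, but each step is an associative matrix product in the log semiring, so an associative scan can evaluate the chain in O(log N) depth, presumably what the comment refers to:

```python
import torch

def crf_node_marginals(emissions, transitions):
    """Sketch: node marginals of a linear-chain CRF via forward-backward.

    emissions:   (T, K) unary log-potentials
    transitions: (K, K) log-potentials, transitions[i, j] = score(i -> j)
    """
    T, K = emissions.shape
    alpha = torch.empty(T, K)
    beta = torch.empty(T, K)

    # Forward pass: alpha[t, j] = logsumexp over paths ending in state j at t.
    alpha[0] = emissions[0]
    for t in range(1, T):
        alpha[t] = emissions[t] + torch.logsumexp(
            alpha[t - 1].unsqueeze(1) + transitions, dim=0)

    # Backward pass: beta[t, i] = logsumexp over suffixes leaving state i at t.
    beta[T - 1] = 0.0
    for t in range(T - 2, -1, -1):
        beta[t] = torch.logsumexp(
            transitions + (emissions[t + 1] + beta[t + 1]).unsqueeze(0), dim=1)

    log_z = torch.logsumexp(alpha[T - 1], dim=0)
    return torch.exp(alpha + beta - log_z)  # (T, K) node marginals
```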
You mean the speed-up compared to using the gradient identity? At first I tried only calculating the prefix sum and using back-propagation to get the marginals of the...
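For reference, the gradient identity mentioned here: the node marginals of a CRF equal the gradient of log Z with respect to the unary log-potentials, so one forward pass plus back-propagation recovers them. A minimal sketch under the same illustrative shapes as above:

```python
import torch

def crf_marginals_via_grad(emissions, transitions):
    """Sketch of the gradient identity: d log Z / d emissions[t, k] = p(y_t = k)."""
    emissions = emissions.detach().requires_grad_(True)
    T, K = emissions.shape

    # Forward pass only, to obtain log Z.
    alpha = emissions[0]
    for t in range(1, T):
        alpha = emissions[t] + torch.logsumexp(
            alpha.unsqueeze(1) + transitions, dim=0)
    log_z = torch.logsumexp(alpha, dim=0)

    # Back-propagation through log Z yields the (T, K) node marginals.
    (marginals,) = torch.autograd.grad(log_z, emissions)
    return marginals
```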
I reinstalled clatexmath, and the full log is here (still with the same error, it seems):
```
==> Downloading https://formulae.brew.sh/api/formula.json
######################################################################## 100.0%
==> Fetching dependencies for sp1ritcs/tap/notekit: sp1ritCS/tap/clatexmath and zlib...
```