Haozhe Ji
Unlike BERT, GPT is a decoder, so its attention mask is a lower-triangular matrix rather than all ones. As for padding at the end of the sequence: since no loss is applied at the corresponding output positions, it does not affect the meaningful tokens before it.
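For context, here is a minimal PyTorch sketch of both points (the tensor shapes, vocabulary size, and variable names are illustrative, not from any actual codebase):

```python
import torch
import torch.nn.functional as F

seq_len, pad_id, vocab = 6, 0, 10

# Causal mask for a decoder: position i may only attend to positions <= i,
# i.e. a lower-triangular matrix instead of BERT's all-ones mask.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Toy batch: the last two positions are padding.
input_ids = torch.tensor([[5, 3, 8, 2, pad_id, pad_id]])
labels = input_ids.clone()
labels[labels == pad_id] = -100  # ignored by the loss below

# Dummy logits standing in for a GPT forward pass.
logits = torch.randn(1, seq_len, vocab)

# Next-token loss: positions whose label is -100 (the padding) contribute
# nothing, so they cannot influence gradients from the meaningful tokens.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,
)
```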
Can you post the complete error log? I don't seem to get this error message.
It seems that the author didn't release the code for this work.
Oh, I just accidentally committed it to the original main branch. If you find it interesting, maybe I can reopen it?
I use the forward-backward algorithm to calculate the marginals of a linear-chain CRF over the prefix sequence (in order to support AR models). The parallel computation can be done in O(log N) complexity...
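For readers following along, this is a minimal sketch of forward-backward node marginals for a linear-chain CRF (not the author's implementation; the function name and shapes are assumptions). The loops below are sequential, but each step is an associative matrix product in the log semiring, so an associative scan can evaluate the chain in O(log N) depth, presumably what the comment refers to:

```python
import torch

def crf_node_marginals(emissions, transitions):
    """Sketch: node marginals of a linear-chain CRF via forward-backward.

    emissions:   (T, K) unary log-potentials
    transitions: (K, K) log-potentials, transitions[i, j] = score(i -> j)
    """
    T, K = emissions.shape
    alpha = torch.empty(T, K)
    beta = torch.empty(T, K)

    # Forward pass: alpha[t, j] = logsumexp over paths ending in state j at t.
    alpha[0] = emissions[0]
    for t in range(1, T):
        alpha[t] = emissions[t] + torch.logsumexp(
            alpha[t - 1].unsqueeze(1) + transitions, dim=0)

    # Backward pass: beta[t, i] = logsumexp over suffixes leaving state i at t.
    beta[T - 1] = 0.0
    for t in range(T - 2, -1, -1):
        beta[t] = torch.logsumexp(
            transitions + (emissions[t + 1] + beta[t + 1]).unsqueeze(0), dim=1)

    log_z = torch.logsumexp(alpha[T - 1], dim=0)
    return torch.exp(alpha + beta - log_z)  # (T, K) node marginals
```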
You mean the speed-up compared to using the gradient identity? At first I tried only calculating the prefix sum and using back-propagation to get the marginals of the...
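For reference, the gradient identity mentioned here: the node marginals of a CRF equal the gradient of log Z with respect to the unary log-potentials, so one forward pass plus back-propagation recovers them. A minimal sketch under the same illustrative shapes as above:

```python
import torch

def crf_marginals_via_grad(emissions, transitions):
    """Sketch of the gradient identity: d log Z / d emissions[t, k] = p(y_t = k)."""
    emissions = emissions.detach().requires_grad_(True)
    T, K = emissions.shape

    # Forward pass only, to obtain log Z.
    alpha = emissions[0]
    for t in range(1, T):
        alpha = emissions[t] + torch.logsumexp(
            alpha.unsqueeze(1) + transitions, dim=0)
    log_z = torch.logsumexp(alpha, dim=0)

    # Back-propagation through log Z yields the (T, K) node marginals.
    (marginals,) = torch.autograd.grad(log_z, emissions)
    return marginals
```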
I reinstalled clatexmath, and the full log is here (still with the same error, it seems):
```
==> Downloading https://formulae.brew.sh/api/formula.json
######################################################################## 100.0%
==> Fetching dependencies for sp1ritcs/tap/notekit: sp1ritCS/tap/clatexmath and zlib...
```