BiaffineDParser
Function parameters are in reverse order in the Biaffine class
In Model.py, the forward function of class ParserModel contains:
arc_logit = self.arc_biaffine(x_arc_dep, x_arc_head)
and in Layer.py, the forward function of class Biaffine contains:
biaffine = torch.transpose(torch.bmm(affine, input2), 1, 2)
This means the final result is affine * input2,
where affine is computed from input1 (x_arc_dep) and input2 is x_arc_head.
But in the original paper, the formulation is s^(arc) = H^(arc_head) * U^(1) * H^(arc_dep) + H^(arc_head) * u^(2)
It seems that the order of H(head) and H(dep) is reversed in the code.
Thanks for pointing this out. However, it does not matter, since the two are symmetric.
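To make the symmetry argument concrete for the bilinear term, here is a minimal NumPy sketch (not the repository's code). It assumes hypothetical shapes n and d and random stand-ins for H^(arc_head), H^(arc_dep) and U^(1). Swapping the two inputs gives the same scores as the paper's order if the weight is transposed, and since the weight is learned, the model can absorb that transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4                            # hypothetical sentence length and hidden size
H_head = rng.standard_normal((n, d))   # stand-in for H^(arc_head)
H_dep = rng.standard_normal((n, d))    # stand-in for H^(arc_dep)
U = rng.standard_normal((d, d))        # stand-in for the learned weight U^(1)

# Paper order: entry [j, i] of H_head @ U @ H_dep.T is h_head_j^T U h_dep_i.
# Transposing makes rows index dependents and columns index heads.
paper_order = (H_head @ U @ H_dep.T).T

# Swapped order with the transposed weight: entry [i, j] is h_dep_i^T U^T h_head_j.
swapped_order = H_dep @ U.T @ H_head.T

same_scores = np.allclose(paper_order, swapped_order)
print(same_scores)
```

This only covers the bilinear term; which side the bias term attaches to still depends on how the layer is configured.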
OK. I also have a question about that formulation. The original formulation has two weight terms, U1 and U2, but in the code it seems you combined U1 and U2 by adding an all-ones vector (ones)?
You can check the differences carefully to see whether they are equivalent. In any case, implementation details may differ slightly; performance is what matters. Do not get bogged down in trivia.
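One way to run that check numerically is sketched below in NumPy. It is an illustration, not the code in Layer.py. It assumes the ones are concatenated as an extra feature column on the dependent representations and that U2 is stacked as an extra row of U1; the names U1, u2, W and the shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4                            # hypothetical sentence length and hidden size
H_dep = rng.standard_normal((n, d))
H_head = rng.standard_normal((n, d))
U1 = rng.standard_normal((d, d))       # bilinear weight
u2 = rng.standard_normal(d)            # linear (bias-like) weight on the head

# Two-term form: scores[i, j] = h_dep_i^T U1 h_head_j + u2 . h_head_j
two_term = H_dep @ U1 @ H_head.T + (H_head @ u2)[None, :]

# Combined form: append a column of ones to H_dep and stack u2 under U1,
# so a single matrix product produces both terms.
H_dep_aug = np.concatenate([H_dep, np.ones((n, 1))], axis=1)   # shape (n, d + 1)
W = np.vstack([U1, u2[None, :]])                               # shape (d + 1, d)
combined = H_dep_aug @ W @ H_head.T

equivalent = np.allclose(two_term, combined)
print(equivalent)
```

The appended 1 multiplies the extra row of W, which is exactly u2, so the single product reproduces the separate linear term.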
Thank you for your advice. I now understand how the code corresponds to the formulas. But I have another question: I found that the softmax2d function in MST.py subtracts the maximum value.
y -= np.max(y, axis=1, keepdims=True)
I don't know why the maximum value should be subtracted. What is the special meaning of the maximum value?
It prevents numerical overflow. Take [100000, 1000002] as the input: how would you compute the softmax in practice?
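A minimal NumPy sketch of the difference, using the input above (the function names are hypothetical and this is not the code in MST.py). Subtracting the row maximum m does not change the result, because exp(y_i - m) / sum_j exp(y_j - m) equals exp(y_i) / sum_j exp(y_j): the common factor exp(-m) cancels. It does keep every exponent at or below zero, so exp cannot overflow.

```python
import numpy as np

def naive_softmax(y):
    """Row-wise softmax without the max shift; overflows for large inputs."""
    y = np.asarray(y, dtype=np.float64)
    with np.errstate(over="ignore", invalid="ignore"):
        e = np.exp(y)                       # exp(100000) overflows to inf
        return e / np.sum(e, axis=1, keepdims=True)

def stable_softmax(y):
    """Row-wise softmax with the row maximum subtracted first."""
    y = np.asarray(y, dtype=np.float64)
    shifted = y - np.max(y, axis=1, keepdims=True)   # largest entry becomes 0
    e = np.exp(shifted)                              # every value is in [0, 1]
    return e / np.sum(e, axis=1, keepdims=True)

scores = np.array([[100000.0, 1000002.0]])
naive = naive_softmax(scores)     # inf / inf gives nan in every position
stable = stable_softmax(scores)   # finite probabilities that sum to 1
print(naive)
print(stable)
```

On inputs small enough not to overflow, both functions return the same values, which is why the shift is safe to apply unconditionally.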