BiaffineDParser

Function parameters are in reverse order in biaffine class

Open TimeLessLing opened this issue 5 years ago • 5 comments

In Model.py, the forward function of class ParserModel calls `arc_logit = self.arc_biaffine(x_arc_dep, x_arc_head)`, and in Layer.py, the forward function of class Biaffine computes `biaffine = torch.transpose(torch.bmm(affine, input2), 1, 2)`. This means the final result is affine * input2, where affine is computed from input1 (which is x_arc_dep) and input2 is x_arc_head. In the original paper, however, the formulation is s^(arc) = H^(arc_head) * U^(1) * H^(arc_dep) + H^(arc_head) * u^(2). It seems that H^(arc_head) and H^(arc_dep) are in reverse order in the code.

TimeLessLing, Nov 02 '19 03:11

Thanks for pointing this out. However, it does not matter, as the two are symmetrical.

zhangmeishan, Nov 02 '19 03:11

> Thanks for pointing this out. However, it does not matter, as the two are symmetrical.

OK. I also have a question about that formulation. The original formulation has two separate weights, U^(1) and u^(2), but in the code it seems you combined them into one weight by appending an all-ones vector (`ones`) to the input. Is that right?

TimeLessLing, Nov 02 '19 03:11

You can compare the two carefully to check whether they are equivalent. In any case, implementation details may differ slightly; performance is what matters. Do not get bogged down in trivial details.

zhangmeishan, Nov 02 '19 03:11
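The equivalence check suggested above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code from Layer.py: it assumes the all-ones entry is appended to the dependent vector and that the combined weight is `W = [U1 | u2]`. The helper names `two_term_score` and `combined_score` are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in M]


def two_term_score(h_head, h_dep, U1, u2):
    """Paper-style score: h_head^T U1 h_dep + h_head^T u2."""
    return dot(h_head, matvec(U1, h_dep)) + dot(h_head, u2)


def combined_score(h_head, h_dep, W):
    """Single-weight score: h_head^T W [h_dep; 1], with W = [U1 | u2] (assumed layout)."""
    return dot(h_head, matvec(W, h_dep + [1.0]))


random.seed(0)
d = 4  # illustrative hidden size
U1 = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
u2 = [random.uniform(-1, 1) for _ in range(d)]
# Append u2 as an extra column of U1 to build the combined weight.
W = [row + [bias] for row, bias in zip(U1, u2)]

h_head = [random.uniform(-1, 1) for _ in range(d)]
h_dep = [random.uniform(-1, 1) for _ in range(d)]

score_a = two_term_score(h_head, h_dep, U1, u2)
score_b = combined_score(h_head, h_dep, W)
max_diff = abs(score_a - score_b)
print(max_diff)  # expected to be zero up to floating-point rounding
```

The trailing 1 multiplies the extra column of `W`, so `W [h_dep; 1] = U1 h_dep + u2`, and taking the inner product with `h_head` recovers both terms of the original formula.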

> You can compare the two carefully to check whether they are equivalent. In any case, implementation details may differ slightly; performance is what matters. Do not get bogged down in trivial details.

Thank you for your advice. I now understand how the code corresponds to the formulas. But I have another question: the `softmax2d` function in MST.py subtracts the maximum value, `y -= np.max(y, axis=1, keepdims=True)`. I don't understand why the maximum should be subtracted. Does the maximum value have some special meaning?

TimeLessLing, Nov 25 '19 02:11

It prevents numeric overflow. Suppose the input is [100000, 1000002]: how would you compute the softmax in practice?

zhangmeishan, Nov 25 '19 04:11
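The overflow point can be made concrete with a small sketch. Softmax is unchanged when a constant c is subtracted from every entry of a row, because the common factor exp(-c) cancels between numerator and denominator; choosing c as the row maximum keeps every exponent at or below zero. The code below is an illustration in plain Python rather than the NumPy code in MST.py, and the names `softmax_rows` and `naive_softmax_row` are hypothetical.

```python
import math


def softmax_rows(rows):
    """Row-wise softmax that subtracts each row's maximum before exponentiating."""
    result = []
    for row in rows:
        row_max = max(row)
        # Every exponent is <= 0, so exp() returns a value in [0, 1] and cannot overflow.
        exps = [math.exp(x - row_max) for x in row]
        total = sum(exps)  # at least 1.0, since the maximum entry contributes exp(0)
        result.append([e / total for e in exps])
    return result


def naive_softmax_row(row):
    """Softmax without the shift; math.exp raises OverflowError for large inputs."""
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


scores = [[100000.0, 1000002.0]]  # the values from the reply above

stable = softmax_rows(scores)

try:
    naive_softmax_row(scores[0])
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

print(stable, naive_overflowed)
```

With NumPy arrays the unshifted version would typically produce `inf` and then `nan` instead of raising an exception, but the cause is the same: exp of a large positive number is not representable in floating point.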