[QUESTION] Assigning weights to sentences in depparse training

Open soras opened this issue 3 years ago • 3 comments

Question about training stanza depparse (on tokenized and pretagged document): is there a way to assign weights to input sentences, so that some sentences will appear more important than others? A naive approach would be to duplicate sentences, but I wonder if there is another way to do it?
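The naive duplication approach mentioned above could be sketched roughly as follows, assuming the training data is a CoNLL-U file with sentences separated by blank lines and that the weights are small non-negative integers. The helper name `duplicate_by_weight` is hypothetical and not part of Stanza.

```python
import re


def duplicate_by_weight(conllu_text, weights):
    """Repeat the i-th CoNLL-U sentence block weights[i] times.

    Hypothetical helper for illustration only; it is not part of Stanza.
    """
    # Sentence blocks are separated by one or more blank lines.
    blocks = [b.strip("\n") for b in re.split(r"\n\s*\n", conllu_text.strip()) if b.strip()]
    if len(blocks) != len(weights):
        raise ValueError("need exactly one weight per sentence")
    repeated = []
    for block, weight in zip(blocks, weights):
        if weight < 0:
            raise ValueError("weights must be non-negative integers")
        repeated.extend([block] * weight)
    # Every sentence, including the last one, is followed by a blank line.
    return "".join(block + "\n\n" for block in repeated)


sample = (
    "# text = Dogs bark\n"
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = Cats sleep\n"
    "1\tCats\tcat\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleep\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

# The first sentence appears three times, the second once.
weighted_corpus = duplicate_by_weight(sample, [3, 1])
```

This only supports integer weights and makes the training file larger, which is the main reason to look for a loss-level alternative.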

soras, Mar 14 '23 14:03

It would be possible to add that feature. The loss gets calculated by iterating over the sentences, using the model to calculate the loss when the gold values are available:

https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/parser.py#L209
https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/depparse/trainer.py#L53

The model calculates individual pieces of the loss here:

https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/depparse/model.py#L190
https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/depparse/model.py#L196
https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/depparse/model.py#L209

You could weight it by whatever you like, as long as you pass the sentence weights in to model.forward(). I would say we're not going to add that feature any time soon, but if you're inspired to make a PR which adds it, we can talk about what it would look like.
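To illustrate the general idea of weighting the loss per sentence, here is a minimal PyTorch sketch. It is not Stanza's actual code: the function name `weighted_sentence_loss`, the tensor shapes, and the padding index are assumptions made for the example. In a real parser the same weighting would presumably have to be applied to each piece of the loss that the model computes.

```python
import torch
import torch.nn.functional as F


def weighted_sentence_loss(logits, gold, sentence_weights, pad_index=-100):
    """Cross-entropy loss in which each sentence carries its own weight.

    Assumed shapes (illustrative, not Stanza's):
      logits:           (batch, seq_len, num_classes)
      gold:             (batch, seq_len), padded with pad_index
      sentence_weights: (batch,) float tensor
    """
    batch, seq_len, num_classes = logits.shape
    # Per-token loss; padded positions contribute exactly 0.
    token_loss = F.cross_entropy(
        logits.reshape(-1, num_classes),
        gold.reshape(-1),
        ignore_index=pad_index,
        reduction="none",
    ).reshape(batch, seq_len)
    # Sum over tokens, then scale each sentence by its weight.
    per_sentence = token_loss.sum(dim=1)
    weighted_total = (per_sentence * sentence_weights).sum()
    # Normalize by the weighted number of real (non-padding) tokens.
    token_counts = (gold != pad_index).sum(dim=1)
    denom = (token_counts * sentence_weights).sum().clamp(min=1e-8)
    return weighted_total / denom


torch.manual_seed(0)
logits = torch.randn(2, 4, 5)
gold = torch.tensor([[1, 2, -100, -100], [0, 3, 4, 1]])
# The first sentence counts twice as much as the second.
loss = weighted_sentence_loss(logits, gold, torch.tensor([2.0, 1.0]))
```

With this normalization, an integer weight of k gives the same loss as duplicating the sentence k times in the batch, while fractional weights also become possible.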

AngledLuffa, Mar 14 '23 15:03

Thank you for the quick and thorough reply! Unfortunately, adding this feature is out of my scope for the time being, so I am closing this issue now.

soras, Mar 15 '23 12:03

That's fair. We can leave this open, and hopefully one day one of us (or you) can add it as a feature.

AngledLuffa, Mar 16 '23 00:03