[QUESTION] Assigning weights to sentences in depparse training
Question about training the stanza depparse model (on a tokenized and pretagged document): is there a way to assign weights to input sentences, so that some sentences count as more important than others? A naive approach would be to duplicate sentences, but I wonder if there is another way to do it.
It would be possible to add that feature. The loss gets calculated by iterating over the sentences, using the model to calculate the loss when the gold values are available:
https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/parser.py#L209
https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/depparse/trainer.py#L53
The model calculates individual pieces of the loss here:
https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/depparse/model.py#L190
https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/depparse/model.py#L196
https://github.com/stanfordnlp/stanza/blob/a85cce6816f40fa03ab06f6497c7d65ba1244a33/stanza/models/depparse/model.py#L209
You could weight it however you like, as long as you pass the sentence weights in to model.forward(). We're not going to add that feature any time soon, but if you're inspired to make a PR which adds it, we can talk about what it would look like.
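To make the idea concrete, here is a rough sketch of the arithmetic only. It is not stanza code: the function names, the toy probabilities, and the `weights` argument are all hypothetical, and it uses plain Python rather than the tensors the real model works with. It scales each sentence's summed token loss by that sentence's weight, and shows that an integer weight gives the same result as duplicating the sentence.

```python
import math


def token_nll(probs, gold):
    """Negative log-likelihood of the gold class for one token."""
    return -math.log(probs[gold])


def weighted_batch_loss(batch, weights):
    """Sum per-sentence losses, each scaled by its sentence weight.

    batch:   list of sentences; each sentence is a list of (probs, gold) pairs
    weights: one non-negative float per sentence (hypothetical extra input)
    """
    if len(batch) != len(weights):
        raise ValueError("need exactly one weight per sentence")
    total = 0.0
    for sentence, weight in zip(batch, weights):
        sentence_loss = sum(token_nll(p, g) for p, g in sentence)
        total += weight * sentence_loss
    return total


# Two toy sentences with made-up prediction probabilities and gold indices.
sent_a = [([0.7, 0.2, 0.1], 0), ([0.1, 0.8, 0.1], 1)]
sent_b = [([0.3, 0.3, 0.4], 2)]

# Weighting sent_a by 2.0 ...
weighted = weighted_batch_loss([sent_a, sent_b], [2.0, 1.0])
# ... matches the naive approach of including it twice with weight 1.0.
duplicated = weighted_batch_loss([sent_a, sent_a, sent_b], [1.0, 1.0, 1.0])
```

In a PyTorch model the equivalent would presumably mean keeping the per-token loss unreduced (for example `reduction="none"` on the cross-entropy), multiplying by a weight broadcast from each sentence to its tokens, and only then summing. Whether to also divide by the sum of the weights is a design choice left open here.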
Thank you for the quick and thorough reply! Unfortunately, adding this feature is out of my scope for the time being, so I am closing this issue now.
That's fair. We can leave this open, and hopefully one day one of us (or you) can add it as a feature.