bugbug

Figure out how to handle additional comments in the component model

Open: marco-c opened this issue 6 years ago • 3 comments

At the moment, the model rolls each bug back to its state at the time it was filed. For some bugs, though, more information becomes available later (e.g. the reporter might add more comments, or someone could ask the reporter for more information), so it might be useful to use this additional information too. When a bug has only a single comment, the additional comments could be treated as missing data.

marco-c, Mar 21 '19 15:03

@marco-c I can think of two options:

  1. We pass a condition (e.g. a flag) and, depending on it, change these lines in bug_snapshot.py itself: https://github.com/mozilla/bugbug/blob/e2810c24023be4d530f114118c597a93d0fafe47/bugbug/bug_snapshot.py#L840-L844

  2. We add an intermediate step to the pipeline, after the BugExtractor is called but before the ColumnTransformer, where we change the "comments" column of the DataFrame.
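Option 2 could look roughly like the following minimal sketch. `CommentLimiter` is a hypothetical class, not part of bugbug; it only follows scikit-learn's `fit`/`transform` convention so that it could sit between `BugExtractor` and `ColumnTransformer`. For simplicity it assumes the previous step yields one dict per bug whose `"comments"` entry is a list of comment texts; the real output of the extractor may be shaped differently (with a pandas DataFrame, the same logic would be applied to the `"comments"` column).

```python
from typing import Any, Dict, List, Optional


class CommentLimiter:
    """Hypothetical pipeline step that truncates each bug's comments.

    Assumption: each input row is a dict with a "comments" key holding a
    list of comment texts. This is an illustration, not bugbug's data layout.
    """

    def __init__(self, max_comments: Optional[int] = 1) -> None:
        # None keeps every comment; 1 keeps only the first comment.
        self.max_comments = max_comments

    def fit(self, X: List[Dict[str, Any]], y: Any = None) -> "CommentLimiter":
        # Stateless: there is nothing to learn from the data.
        return self

    def transform(self, X: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        limited = []
        for bug in X:
            row = dict(bug)  # shallow copy so the caller's data is not mutated
            if self.max_comments is not None:
                row["comments"] = bug["comments"][: self.max_comments]
            limited.append(row)
        return limited


bugs = [{"title": "Crash on startup", "comments": ["STR: open app", "Also on Linux"]}]
print(CommentLimiter(max_comments=1).transform(bugs)[0]["comments"])
# prints ['STR: open app']
```

In a real pipeline the class would more idiomatically subclass scikit-learn's `BaseEstimator` and `TransformerMixin`, which provide `get_params`/`set_params` and `fit_transform`.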

chidauri, Aug 20 '19 13:08

The problem we have to solve here is this:

  1. At training time, we have the bugs with their full history, so we can train either on the first comment only or on all the comments;
  2. At operation time, we have the bugs with their history up to a certain point (the rest of the history is still in the future), so we can only use the comments that have been posted so far.
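The gap between the two settings can be made concrete with a small helper that rebuilds the operation-time view of a bug from its full history by keeping only the comments posted up to a given moment. This is a hypothetical sketch, not how bug_snapshot.py performs its rollback: it assumes each comment is a dict with Bugzilla-style `creation_time` (ISO 8601, UTC) and `text` fields, and the helper names are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, List


def parse_bugzilla_time(value: str) -> datetime:
    # Assumed timestamp format, e.g. "2024-01-01T10:00:00Z" (UTC).
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def comments_known_at(comments: List[Dict[str, str]], when: datetime) -> List[str]:
    """Return the texts of the comments that already existed at time `when`."""
    return [
        c["text"]
        for c in comments
        if parse_bugzilla_time(c["creation_time"]) <= when
    ]


history = [
    {"creation_time": "2024-01-01T10:00:00Z", "text": "Steps to reproduce..."},
    {"creation_time": "2024-01-02T09:00:00Z", "text": "Can you attach a crash report?"},
    {"creation_time": "2024-01-03T10:30:00Z", "text": "Here is the crash report."},
]

# Training time: the full history is available.
end_of_time = datetime.max.replace(tzinfo=timezone.utc)
all_comments = comments_known_at(history, end_of_time)
# Operation time: only what was posted before "now" (here, midday on the second day).
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(len(all_comments), len(comments_known_at(history, now)))
# prints 3 2
```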

If we train on all comments, the model will likely perform poorly in operation, where we often have only a few comments (or even just one).

When a person files a bug, they enter the title and the first comment. Other comments may be added later. Our model should work well when only the title and the first comment are available, but it should also use additional comments to improve its results when they are available.

Maybe we should perform some kind of data augmentation step, so that we train both on every bug with only its first comment and on every bug with its additional comments as well.
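A minimal sketch of that augmentation, assuming each bug is a dict with a `title` and an ordered list of comment texts (the field names and the `augment_with_prefixes` helper are hypothetical): every bug contributes one training example per comment prefix, so the model sees the title with the first comment only as well as progressively longer histories, all with the same component label.

```python
from typing import Any, Dict, List, Optional, Tuple


def augment_with_prefixes(
    bugs: List[Dict[str, Any]],
    labels: List[str],
    max_comments: Optional[int] = None,
) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Expand each bug into copies holding its first 1, 2, ..., n comments."""
    examples: List[Dict[str, Any]] = []
    targets: List[str] = []
    for bug, label in zip(bugs, labels):
        n = len(bug["comments"])
        if max_comments is not None:
            n = min(n, max_comments)
        # Emit at least one example per bug, even if it has no comments.
        for k in range(1, max(n, 1) + 1):
            examples.append({"title": bug["title"], "comments": bug["comments"][:k]})
            targets.append(label)  # the component label is the same for every prefix
    return examples, targets


bugs = [
    {"title": "Crash on startup", "comments": ["STR: open app", "Also on Linux"]},
    {"title": "Typo in menu", "comments": ["The menu says 'Flie'"]},
]
X, y = augment_with_prefixes(bugs, ["Graphics", "Menus"])
print(len(X), y)
# prints 3 ['Graphics', 'Graphics', 'Menus']
```

Two side effects are worth keeping in mind: bugs with many comments become over-represented unless `max_comments` caps the expansion (or examples are reweighted), and all copies of the same bug should land in the same train/test split to avoid leakage.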

marco-c, Sep 02 '19 22:09

Another option could be to train two separate models, one on the first comment only and one on the additional comments too. At operation time, we run the first model: if its confidence is high, we apply the decision; if it is low, we wait for more comments and run the second model.
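The two-model idea could be sketched as a confidence cascade. Both models are assumed to expose a scikit-learn-style `predict_proba` (one row of class probabilities per input) and a `classes_` attribute; `FixedProbaModel`, `classify` and the 0.9 threshold are hypothetical, chosen only to illustrate the control flow. Gating the second model with the same threshold is also an assumption.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


class FixedProbaModel:
    """Stand-in for a trained classifier; always returns the same probabilities."""

    def __init__(self, classes: List[str], proba: List[float]) -> None:
        self.classes_ = classes
        self._proba = proba

    def predict_proba(self, X: Sequence[Dict[str, Any]]) -> List[List[float]]:
        return [list(self._proba) for _ in X]


def classify(
    bug: Dict[str, Any], first_model: Any, full_model: Any, threshold: float = 0.9
) -> Optional[Tuple[str, str]]:
    """Return (component, model used), or None to wait for more comments."""
    if len(bug["comments"]) <= 1:
        model, name = first_model, "first-comment"
    else:
        model, name = full_model, "all-comments"
    proba = model.predict_proba([bug])[0]
    best = max(range(len(proba)), key=lambda i: proba[i])
    if proba[best] >= threshold:
        return model.classes_[best], name
    return None  # low confidence: wait for more comments


classes = ["Graphics", "Networking"]
first = FixedProbaModel(classes, [0.6, 0.4])
full = FixedProbaModel(classes, [0.05, 0.95])
bug = {"title": "Page loads slowly", "comments": ["It takes 10s to load"]}
print(classify(bug, first, full))  # None: low confidence, so wait
bug["comments"].append("Only happens behind a proxy")
print(classify(bug, first, full))  # ('Networking', 'all-comments')
```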

marco-c, Jul 09 '24 14:07