grimoirelab-elk
[Gitter] Misclassifying Pull Requests and Issues
Here is a Gitter enriched item for a message that links to a pull request:
{
"_index" : "gitter_enriched_raw",
"_type" : "items",
"_id" : "a9a4a861b3011c2bafdef977c72b205419449b4d",
"_score" : 5.3138795,
"_source" : {
"metadata__updated_on" : "2016-04-30T00:34:26.399000+00:00",
"metadata__timestamp" : "2022-01-30T03:29:44.263053+00:00",
"offset" : null,
"origin" : "https://gitter.im/shuup/shuup",
"tag" : "https://gitter.im/shuup/shuup",
"uuid" : "a9a4a861b3011c2bafdef977c72b205419449b4d",
"unread" : 0,
"text_analyzed" : "Looks like those issues should be fixed with this bugfix: https://github.com/shoopio/shoop/pull/441",
"readBy" : 15,
"issues" : [
{
"repo" : "shoopio/shoop",
"number" : "441"
}
],
"id" : "5723fd92e10a59c061074eed",
"url_hostname" : [ ],
"tz" : 0,
"fromUser_id" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
"fromUser_uuid" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
"fromUser_name" : "Shawn Her Many Horses",
"fromUser_user_name" : "",
"fromUser_domain" : null,
"fromUser_gender" : "Unknown",
"fromUser_gender_acc" : 0,
"fromUser_org_name" : "Unknown",
"fromUser_bot" : false,
"fromUser_multi_org_names" : [
"Unknown"
],
"author_id" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
"author_uuid" : "83fe61561124b9a496e97c89c2f48d3ff8319eac",
"author_name" : "Shawn Her Many Horses",
"author_user_name" : "",
"author_domain" : null,
"author_gender" : "Unknown",
"author_gender_acc" : 0,
"author_org_name" : "Unknown",
"author_bot" : false,
"author_multi_org_names" : [
"Unknown"
],
"project" : "shuup/shuup",
"project_1" : "shuup/shuup",
"grimoire_creation_date" : "2016-04-30T00:34:26.399000+00:00",
"is_gitter_message" : 1,
"repository_labels" : [ ],
"metadata__filter_raw" : null,
"metadata__gelk_version" : "0.99.0",
"metadata__gelk_backend_name" : "GitterEnrich",
"metadata__enriched_on" : "2022-01-30T04:54:27.841150+00:00"
}
},
There should be an is_pull key, according to: https://github.com/chaoss/grimoirelab-elk/blob/b626512cd2768287bd1e52e98c7773073be21fc1/grimoire_elk/enriched/gitter.py#L166
Maybe the regex isn't working? https://github.com/chaoss/grimoirelab-elk/blob/b626512cd2768287bd1e52e98c7773073be21fc1/grimoire_elk/enriched/gitter.py#L64
Here's a case with an issue:
{
"_index" : "gitter_enriched_raw",
"_type" : "items",
"_id" : "963132bc57c2bf58a906aca2dc1f91fdeb65f76a",
"_score" : 5.2737937,
"_source" : {
"metadata__updated_on" : "2017-01-27T08:45:38.335000+00:00",
"metadata__timestamp" : "2022-01-30T03:29:41.738953+00:00",
"offset" : null,
"origin" : "https://gitter.im/shuup/shuup",
"tag" : "https://gitter.im/shuup/shuup",
"uuid" : "963132bc57c2bf58a906aca2dc1f91fdeb65f76a",
"unread" : 0,
"text_analyzed" : "https://github.com/shuup/shuup/issues/361 -> i tried this",
"readBy" : 17,
"issues" : [
{
"repo" : "shuup/shuup",
"number" : "361"
}
],
"id" : "588b08b25309d6b3587415c3",
"url_hostname" : [ ],
"tz" : 8,
"fromUser_id" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
"fromUser_uuid" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
"fromUser_name" : "aoy12",
"fromUser_user_name" : "",
"fromUser_domain" : null,
"fromUser_gender" : "Unknown",
"fromUser_gender_acc" : 0,
"fromUser_org_name" : "Unknown",
"fromUser_bot" : false,
"fromUser_multi_org_names" : [
"Unknown"
],
"author_id" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
"author_uuid" : "dfd2be02ddb641b7634d4bf5b9aabf6527b6ecf4",
"author_name" : "aoy12",
"author_user_name" : "",
"author_domain" : null,
"author_gender" : "Unknown",
"author_gender_acc" : 0,
"author_org_name" : "Unknown",
"author_bot" : false,
"author_multi_org_names" : [
"Unknown"
],
"project" : "shuup/shuup",
"project_1" : "shuup/shuup",
"grimoire_creation_date" : "2017-01-27T08:45:38.335000+00:00",
"is_gitter_message" : 1,
"repository_labels" : [ ],
"metadata__filter_raw" : null,
"metadata__gelk_version" : "0.99.0",
"metadata__gelk_backend_name" : "GitterEnrich",
"metadata__enriched_on" : "2022-01-30T04:54:15.833892+00:00"
}
}
There should be an is_issue key, according to: https://github.com/chaoss/grimoirelab-elk/blob/b626512cd2768287bd1e52e98c7773073be21fc1/grimoire_elk/enriched/gitter.py#L163
It's probably a regex issue again?
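To make the expectation concrete, here is a minimal sketch of a URL-based check run against the two text_analyzed values above. The pattern GITHUB_LINK and the helper classify_links are hypothetical names made up for this illustration. This is not the regex used in grimoire_elk/enriched/gitter.py, only what a working match could look like for these two messages.

```python
import re

# Hypothetical pattern for illustration only; NOT the one in gitter.py.
# It captures the repository, the link kind ("pull" or "issues") and the number.
GITHUB_LINK = re.compile(
    r"https?://github\.com/(?P<repo>[\w.-]+/[\w.-]+)/(?P<kind>pull|issues)/(?P<number>\d+)"
)


def classify_links(text):
    """Return (kind, repo, number) tuples for every GitHub link found in text."""
    found = []
    for match in GITHUB_LINK.finditer(text):
        kind = "pull_request" if match.group("kind") == "pull" else "issue"
        found.append((kind, match.group("repo"), match.group("number")))
    return found


# The two messages quoted in this report.
pr_text = ("Looks like those issues should be fixed with this bugfix: "
           "https://github.com/shoopio/shoop/pull/441")
issue_text = "https://github.com/shuup/shuup/issues/361 -> i tried this"

print(classify_links(pr_text))
print(classify_links(issue_text))
```

If a check along these lines matches both messages, the missing keys would point at the pattern in the enricher or at the field it is applied to, rather than at the message text itself.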
Sometimes the pull request or issue is referenced in a span tag in the message's html field, e.g.:
"data" : {
"id" : "5723fd92e10a59c061074eed",
"text" : "Looks like those issues should be fixed with this bugfix: https://github.com/shoopio/shoop/pull/441",
"html" : """Looks like those issues should be fixed with this bugfix: <span data-link-type="issue" data-issue="441" data-issue-repo="shoopio/shoop" class="issue">shoopio/shoop#441</span>""",
"sent" : "2016-04-30T00:34:26.399Z",
"unread" : false,
"readBy" : 15,
"urls" : [ ],
"mentions" : [ ],
"issues" : [
{
"repo" : "shoopio/shoop",
"number" : "441"
}
],
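Even when the link points to a pull request, the span tag marks it as an "issue" (data-link-type="issue", class="issue"), and the entries in the issues list carry only repo and number.
So GitHub will need to be queried to classify each reference as either a pull request or an issue.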
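One possible way to do that lookup is sketched below, assuming the GitHub REST API endpoint GET /repos/{owner}/{repo}/issues/{number}, which also serves pull requests and includes a pull_request key in the response when the number belongs to one. The function names fetch_issue, is_pull_request and classify are hypothetical and not part of grimoirelab-elk. Rate limiting, error handling and caching are left out.

```python
import json
import urllib.request


def is_pull_request(issue_payload):
    """True when a GitHub REST 'issue' object describes a pull request."""
    return "pull_request" in issue_payload


def fetch_issue(repo, number, token=None):
    """Fetch one issue or pull request from the GitHub REST API (network call)."""
    url = f"https://api.github.com/repos/{repo}/issues/{number}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    if token:
        # Optional token; unauthenticated requests have a much lower rate limit.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def classify(entry, fetch=fetch_issue):
    """Classify one entry of the Gitter 'issues' list, e.g. {"repo": ..., "number": ...}.

    The fetch argument is injectable so the logic can be tried without network access.
    """
    payload = fetch(entry["repo"], entry["number"])
    return "pull_request" if is_pull_request(payload) else "issue"


# Example call (performs a real HTTP request, so it is left commented out):
# classify({"repo": "shoopio/shoop", "number": "441"})
```

The result could then be used to set is_pull or is_issue on the enriched item. Since one request per reference is expensive, caching results per (repo, number) pair would probably be needed in practice.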