
BUG: tool_source metadata in message table incorrect for issue messages

Open cdolfi opened this issue 1 year ago • 2 comments

Issue comments are being tagged as PR comments. Example on the augur repository:

Also, the result of SELECT DISTINCT tool_source FROM message m WHERE repo_id = 36113 (36113 is https://github.com/chaoss/augur in our db) is:

Pr comment task and Pr review comment task

cdolfi avatar Oct 10 '23 14:10 cdolfi

Message data is mapped correctly; it is just a metadata issue.

cdolfi avatar Oct 10 '23 14:10 cdolfi

@cdolfi: I dug into this and wanted to ask your preference for resolution. Because the GitHub API for messages is essentially the same for issues and PRs, we have a single message task, attached to the PR task, that collects both. This takes significantly fewer API calls than fetching them separately, because the issues API gives us all the messages for both issues and PRs. For whatever reason, the same logic does not apply to the GitHub API for messages connected to a pull request review.
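For anyone following along, here is a minimal sketch of how the two kinds of comments can be told apart after they arrive from the same endpoint. It assumes each comment's html_url contains either /issues/ or /pull/ as the third path segment. The helper name is_issue_comment_url and the sample URLs are made up for illustration; this is not Augur's is_issue_message.

```python
from urllib.parse import urlparse


def is_issue_comment_url(html_url: str) -> bool:
    """Return True when a comment's html_url points at an issue, not a PR.

    Assumes the URL path looks like /<owner>/<repo>/issues/<n>
    or /<owner>/<repo>/pull/<n>.
    """
    parts = urlparse(html_url).path.strip("/").split("/")
    # parts is expected to look like ["chaoss", "augur", "issues", "1"]
    return len(parts) >= 3 and parts[2] == "issues"


# Illustrative URLs only; the numbers are placeholders.
issue_url = "https://github.com/chaoss/augur/issues/1#issuecomment-1"
pr_url = "https://github.com/chaoss/augur/pull/2#issuecomment-2"

print(is_issue_comment_url(issue_url))
print(is_issue_comment_url(pr_url))
```

Parsing the path instead of doing a plain substring check avoids false matches on the #issuecomment-... fragment, which appears on both issue and PR comment URLs.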

This is the code:

def process_messages(messages, task_name, repo_id, logger, augur_db):

    tool_source = "Pr comment task"
    tool_version = "2.0"
    data_source = "Github API"

I am going to do a test with this subsequent code that searches for a matching PR or Issue, and see if I can update "tool_source" in a meaningful way from there.

        if is_issue_message(message["html_url"]):

            try:
                issue_id = issue_url_to_id_map[message["issue_url"]]
                related_pr_or_issue_found = True
            except KeyError:
                logger.info(f"{task_name}: Could not find related issue")
                logger.info(f"{task_name}: We were searching for: {message['id']}")
                logger.info(f"{task_name}: Skipping")
                continue

            issue_message_ref_data = extract_needed_issue_message_ref_data(message, issue_id, repo_id, tool_source, tool_version, data_source)

            message_ref_mapping_data[message["id"]] = {
                "msg_ref_data": issue_message_ref_data,
                "is_issue": True
            }

        else:

            try:
                pull_request_id = pr_issue_url_to_id_map[message["issue_url"]]
                related_pr_or_issue_found = True
            except KeyError:
                logger.info(f"{task_name}: Could not find related pr")
                logger.info(f"{task_name}: We were searching for: {message['issue_url']}")
                logger.info(f"{task_name}: Skipping")
                continue

            pr_message_ref_data = extract_needed_pr_message_ref_data(message, pull_request_id, repo_id, tool_source, tool_version, data_source)


            message_ref_mapping_data[message["id"]] = {
                "msg_ref_data": pr_message_ref_data,
                "is_issue": False
            }

I suspect that because the tool source is set in the header, this will not actually work, but instead create a bigger problem, where basically "the last metadata wins" for each repository batch ... but I'm gonna check.
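One possible way around the "last metadata wins" concern would be to choose the label per message inside the loop instead of once in the header. A rough sketch, not tested against Augur: the "Issue comment task" label, the tool_source_for helper, and the sample messages are all hypothetical, and it assumes html_url contains /issues/ for issue comments and /pull/ for PR comments.

```python
# Only "Pr comment task" appears in the current code; the issue label is hypothetical.
PR_TOOL_SOURCE = "Pr comment task"
ISSUE_TOOL_SOURCE = "Issue comment task"


def tool_source_for(message: dict) -> str:
    """Pick tool_source for one message based on its html_url (assumed URL shape)."""
    if "/issues/" in message["html_url"]:
        return ISSUE_TOOL_SOURCE
    return PR_TOOL_SOURCE


# Made-up sample messages, shaped like the fields used in the loop above.
messages = [
    {"id": 1, "html_url": "https://github.com/chaoss/augur/issues/10#issuecomment-1"},
    {"id": 2, "html_url": "https://github.com/chaoss/augur/pull/11#issuecomment-2"},
]

# Each message carries its own label, so nothing is shared across the batch.
labels = {m["id"]: tool_source_for(m) for m in messages}
print(labels)
```

Whether this fits depends on how the message rows themselves are built further down in process_messages, which is not shown here.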

sgoggins avatar Nov 03 '23 21:11 sgoggins