Scarcity on dependabot compatibility scores

Open hrz6976 opened this issue 3 years ago • 4 comments

Hi there :) I don't know if this is the right place to start a discussion, but I had no luck with GitHub support.

I'm an open-source software researcher analyzing dependabot PRs. Nearly all (~99%) PRs in our dataset have their compatibility score labelled as "unknown", which seems counter-intuitive to me. Since npm package updates are the most common case in our dependabot PR dataset (120503/186697, 64.5%), I did a quick validation on npm packages:

  • sample the 20 most depended-upon npm packages (lodash, react-dom, vue, axios, etc.) (data source)

  • fetch all major/minor/patch versions released after 2020/01/01 from the npm registry

  • fetch the compatibility score for each version pair (i.e., package P may update from version V1 to version V2)

Here's the code snippet used to extract the compatibility score from the SVG badge (full code and data here):

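import requests
from xml.dom import minidom

def get_dependabot_compatibility_score_ver(package_manager, dependency_name, oldver, newver):
    # Fetch the compatibility-score badge (an SVG) for one version pair.
    # Passing the query via `params` URL-encodes scoped names such as "@babel/core".
    url = "https://dependabot-badges.githubapp.com/badges/compatibility_score"
    params = {"dependency-name": dependency_name, "package-manager": package_manager,
              "previous-version": oldver, "new-version": newver}
    svg_text = requests.get(url, params=params).text
    doc = minidom.parseString(svg_text)
    # The badge carries two kinds of text: the "compatibility" caption and the score.
    for ele in doc.getElementsByTagName("text"):
        label = ele.firstChild.nodeValue if ele.firstChild else None
        if label and label != "compatibility":
            return label
    return None  # no score text found in the badge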
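
The version-pair enumeration from the bullets above is not shown in the snippet. A minimal sketch of it follows; this is not the code used in the study. It assumes `time_map` is the `time` object from an npm registry package document (version string mapped to an ISO publish timestamp), that pre-release versions are skipped, and that every ordered pair V1 < V2 counts as a candidate update. The function names are made up for illustration.

```python
from datetime import datetime, timezone
from itertools import combinations
import re

# Plain X.Y.Z only; pre-releases such as "1.0.0-beta.1" are skipped (assumption).
STABLE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

def versions_released_after(time_map, cutoff):
    """Return stable versions published after `cutoff`, sorted by version number."""
    selected = []
    for version, published in time_map.items():
        match = STABLE.match(version)
        if match is None:  # also skips the "created" and "modified" keys
            continue
        # npm timestamps look like "2020-02-20T18:00:00.000Z"
        when = datetime.fromisoformat(published.replace("Z", "+00:00"))
        if when > cutoff:
            key = tuple(int(part) for part in match.groups())
            selected.append((key, version))
    return [version for _, version in sorted(selected)]

def version_pairs(versions):
    """All (old, new) pairs where `old` precedes `new` in the sorted list."""
    return list(combinations(versions, 2))
```

Each resulting pair can then be passed as `oldver` and `newver` to the badge function above. Whether the original study used all pairs or only adjacent versions is not stated here, so treat the all-pairs choice as an assumption.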

The results show a sparse data distribution: 1604/1629 (98.5%) version pairs are labelled as "unknown". This raises a few questions:
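
For reference, the share quoted above is a simple tally over the labels returned for each version pair. A small sketch, assuming the labels are collected in a list of strings (the helper name is hypothetical):

```python
from collections import Counter

def unknown_share(labels):
    """Fraction of badge labels equal to "unknown" (case-insensitive)."""
    if not labels:
        raise ValueError("no labels collected")
    counts = Counter(label.strip().lower() for label in labels)
    return counts["unknown"] / len(labels)
```

With 1604 "unknown" labels out of 1629, this gives roughly 0.985.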

  • Is something wrong with our methodology, or is the scarcity of compatibility scores real?

  • How is the score calculated? Is dependabot filtering out certain projects, or leaving the score "unknown" until a certain number of projects have merged the update PR?

  • What is the possible cause of this sparse data distribution? (I'm not quite familiar with testing, so no idea about this)

Any ideas about this are welcome!

hrz6976 Nov 13 '21 14:11

Hi @12f23eddde, this is something we've noticed internally recently and are tracking. I don't have anything to share in terms of improving it at the moment, but I'm fairly certain it is a problem on our side and not your methodology.

brrygrdn Nov 26 '21 13:11

Same issue here, and if you click on the icon/graphic, it no longer loads the page that shows you the previous scores.

tonydehnke Jan 17 '22 08:01

Dependabot requires at least 5 candidate updates for a valid compatibility score badge to be shown (at least that was the case back in the summer of 2021 - see https://github.com/dependabot/dependabot-core/issues/4001/#issuecomment-870399478).

Dependabot's site (which has been taken down since the acquisition by GitHub) also used to explain that Dependabot only includes results from PRs for dependency updates that have a CI pipeline configured (e.g., GitHub Actions or TravisCI). Also, the PR didn't necessarily have to be merged in order to count towards the compatibility score (e.g., if a PR for a dependency update failed the client's CI pipeline, and the client decided to close the PR without merging, that dependency update would still count towards the compatibility score).

Of course, this is only what it used to say on the dependabot.com website. Things might have changed since then.
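
To make the rules above concrete, here is a rough sketch of how such a score could be derived. It is not Dependabot's actual implementation. It assumes the score is the percentage of counted PRs whose CI passed, that the threshold of 5 applies to PRs with CI configured, and that merged status is ignored; `UpdatePR` and `compatibility_score` are hypothetical names.

```python
from dataclasses import dataclass

# Threshold quoted in this thread; not verified against Dependabot's code.
MIN_CANDIDATES = 5

@dataclass
class UpdatePR:
    """Hypothetical record of one PR for a given version pair."""
    has_ci: bool     # repository has a CI pipeline configured
    ci_passed: bool  # CI result on the update PR
    merged: bool     # recorded only to show it does not affect the score

def compatibility_score(prs):
    # Only PRs with CI configured count as candidates.
    candidates = [pr for pr in prs if pr.has_ci]
    if len(candidates) < MIN_CANDIDATES:
        return "unknown"
    passed = sum(pr.ci_passed for pr in candidates)
    return f"{round(100 * passed / len(candidates))}%"
```

Under these assumptions, a version pair seen in only four CI-enabled repositories stays "unknown" no matter how many PRs without CI exist, which would be one plausible contributor to the sparsity reported above.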

brombaut Mar 08 '22 16:03

I don't think this feature is functioning properly as of late. Never seen anything other than unknown. This was useful, now just broken?

tonglil Sep 20 '22 01:09

This is still on our radar. Many thanks to @Nishnha, who recently tweaked the DB query used to calculate the compatibility score. Our metrics showed a noticeable reduction in the number of Unknown results.

However, the metrics also show that we still return Unknown more often than we'd like, so we're still tracking improving it further when we have more time down the road.

jeffwidman Nov 15 '22 17:11

@Nishnha made some further improvements here after @malcolmtaylor noticed the database wasn't using an index like we expected. Since then, we've seen a massive reduction in the number of query timeouts for these badges. 🎉

That won't solve all cases of missing / unknown badges because, as noted above, we do require a minimum number of candidate PRs before we show the badge (I think it's 5 but haven't double-checked the code)... but it should help with a bunch of them.

I'm going to close this for now, but if you happen to notice a PR bumping a popular lib where you expected we'd have enough candidate PRs to generate a badge but still see unknown, please feel free to file an issue and we can double-check what's going on.

jeffwidman Jan 03 '23 21:01

I feel there's still an issue here 🤔 We use Dependabot across all of our Ruby repositories, and we still get "Unknown" compatibility scores on nearly all of our dependency upgrades (with rare exceptions). For example:

Screenshot 2023-11-29 at 11 31 56

Screenshot 2023-11-29 at 11 32 10

Drowze Nov 29 '23 14:11