GitHub Occurrences Badge

Open cloewen8 opened this issue 4 years ago • 9 comments

:clipboard: Description

A badge for GitHub that counts the occurrences of a sequence in a file.

For example, given /github/occurrences/badges/shields/README.md/badge, the badge would show 22 (the word "badge" occurs in README.md 22 times).

Ideally, the sequence should be a regular expression (supporting escape sequences, word boundaries, and character classes).
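A minimal sketch of the counting step, assuming the file text is already in memory. `countOccurrences` is a hypothetical helper, not existing Shields code; it accepts either a plain string (matched literally) or a regular expression.

```javascript
// Hypothetical helper (not Shields code): count non-overlapping matches.
function countOccurrences(text, pattern) {
  let regex
  if (typeof pattern === 'string') {
    // Treat a plain string literally by escaping regex metacharacters.
    regex = new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g')
  } else {
    // Make sure the global flag is set so every match is counted.
    const flags = pattern.flags.includes('g')
      ? pattern.flags
      : `${pattern.flags}g`
    regex = new RegExp(pattern.source, flags)
  }
  // String.prototype.match returns null when nothing matches.
  return (text.match(regex) || []).length
}
```

For instance, `countOccurrences(source, /\bgoto\b/)` would count whole-word uses of goto, while `countOccurrences(source, 'goto')` would also count it inside longer identifiers.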

:link: Data

The data required for this can be retrieved from https://raw.githubusercontent.com/:user/:repo/:branch/:path. It only requires authentication for private repositories. Unfortunately I don't know of any official documentation for this endpoint, only that it is the destination when pressing "Raw" on a file on GitHub.

The file would then need additional processing to count the matches, and that processing would need to be limited.
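As a rough sketch of both points, assuming the URL shape quoted above: `buildRawUrl` and `assertWithinLimit` are hypothetical names, and the 1 MiB cap is an arbitrary assumption, shown only as one simple way to limit processing.

```javascript
// Hypothetical helpers; the URL shape is the one quoted above.
function encodeSegments(value) {
  // Encode each path segment but keep the slashes between them.
  return value.split('/').map(encodeURIComponent).join('/')
}

function buildRawUrl({ user, repo, branch, path }) {
  return [
    'https://raw.githubusercontent.com',
    encodeURIComponent(user),
    encodeURIComponent(repo),
    encodeSegments(branch),
    encodeSegments(path),
  ].join('/')
}

// Assumed limit: refuse files above a fixed size instead of scanning them.
const MAX_BYTES = 1024 * 1024

function assertWithinLimit(text, maxBytes = MAX_BYTES) {
  const size = Buffer.byteLength(text, 'utf8')
  if (size > maxBytes) {
    throw new Error(`File too large: ${size} bytes (limit ${maxBytes})`)
  }
  return text
}
```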

:microphone: Motivation

I personally want to use it to count the number of facts in a text file (each fact is on its own line). A badge for counting lines would work, but being able to count anything opens the door for a lot more opportunity:

  • How many times is "as you know" mentioned in a story?
  • How many times, if at all, is goto used?
  • How many code blocks are present?
  • How many references to a shutdown API exist?

cloewen8, Sep 24 '19 05:09

Hi, thanks for your request. We have something similar which searches for files within a repo that match a specific pattern, using the GitHub Search API, however it’s not able to do this.

I like the idea of a dynamic text badge, and can see doing lines or string matching, however I feel like for a lot of things you’d want a regex (and not sure we should run arbitrary regexes, since they can be crafted in a way that they use a large amount of compute resource).

Can you share a link to the file? Sometimes seeing the specific case really solidifies why a feature should exist. It might also surface a creative way to use what is already there!

paulmelnikow, Sep 24 '19 12:09

I absolutely agree that arbitrary regex (or any user submitted code) should not be blindly trusted! For computation, a timeout can be used. I know regex101.com uses this strategy.
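One way to sketch such a timeout in Node.js is the `timeout` option of the built-in `vm` module, which aborts synchronous code that runs too long. `countWithTimeout` and the 100 ms default are assumptions for illustration; this says nothing about how regex101.com does it, and whether it is safe enough for a production server would need real testing.

```javascript
const vm = require('vm')

// Hypothetical helper: count regex matches, aborting if evaluation
// takes longer than timeoutMs. The regex is built and run inside a
// separate context so the vm timeout covers the matching itself.
function countWithTimeout(text, source, timeoutMs = 100) {
  const sandbox = { text, source, count: 0 }
  vm.runInNewContext(
    'count = (text.match(new RegExp(source, "g")) || []).length',
    sandbox,
    { timeout: timeoutMs } // throws an Error if the limit is exceeded
  )
  return sandbox.count
}
```

A caller would wrap this in try/catch and render an error badge when the timeout fires or the pattern is invalid.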

Here is where I want to use it: https://github.com/cloewen8/dolphin-fact/blob/master/README.md Currently, I'm using /github/size, which isn't very helpful but is better than nothing. I want to use it as a form of progress counter, to get people interested as the project grows.

cloewen8, Sep 24 '19 20:09

Huh, since it's a list, what would you think about using YAML instead? That way you could use the Dynamic YAML badge and a JSONPath expression.

All you'd have to do is prefix each line with a -, and then in your app, you could either strip off the leading - or use a proper YAML parser (we use js-yaml which is great).
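A minimal sketch of the "strip off the leading -" option, without a YAML parser. `parseSimpleList` is a hypothetical name, and it assumes a flat list with one plain item per line; anything more involved (quoting, nested items, entries containing `: `) is better left to a real parser such as js-yaml.

```javascript
// Hypothetical helper: read a flat "- item" list without a YAML parser.
function parseSimpleList(text) {
  return text
    .split(/\r?\n/)
    .filter(line => line.startsWith('-')) // ignore blank or other lines
    .map(line => line.slice(1).trim()) // drop the dash and padding
}
```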

paulmelnikow, Sep 24 '19 22:09

YAML is definitely an option for my use case. I wouldn't consider it optimal over a simple text file or CSV file though. Would creating this or a similar badge be an option? Depending on what is required, I would be willing to just create it.

cloewen8, Sep 25 '19 00:09

I'd be 👍 on adding a badge to count lines either in an arbitrary URL or in a file on GitHub. Would you be interested in working on that?

The GitHub version is a little more complicated because, to support auth, we use the Contents API.

This is the helper function that fetches file contents from GitHub repos. It parses JSON but could be refactored to obtain the contents as text.
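For reference, a file response from the Contents API carries the file body base64-encoded in a `content` field, with `encoding` set to `"base64"`. A minimal sketch of turning that into text follows; `decodeContents` is a hypothetical name, not the helper linked below.

```javascript
// Hypothetical helper: decode a Contents API file response to text.
// Only the two fields used here are assumed to be present.
function decodeContents(response) {
  if (response.encoding !== 'base64') {
    throw new Error(`Unexpected encoding: ${response.encoding}`)
  }
  // Buffer's base64 decoder ignores the line breaks GitHub inserts.
  return Buffer.from(response.content, 'base64').toString('utf8')
}
```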

https://github.com/badges/shields/blob/90f8ffce9e8340c2444cfc473531887256ebe568/services/github/github-common-fetch.js#L27-L60

Here's the existing GitHub badge for the package.json version, which is the closest badge we have to GitHub file line count.

https://github.com/badges/shields/blob/90f8ffce9e8340c2444cfc473531887256ebe568/services/github/github-package-json.service.js#L24-L79

The "any URL" version (which could be used with a raw.githubusercontent.com URL) would be simpler. The osslifecycle badge could be adapted pretty readily for this:

https://github.com/badges/shields/blob/90f8ffce9e8340c2444cfc473531887256ebe568/services/osslifecycle/osslifecycle.service.js#L1-L107
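Once the text has been fetched, the line count itself is small. This is only a sketch: `countLines` and its `skipBlank` option are hypothetical, and whether blank lines should count is a design choice the thread does not settle.

```javascript
// Hypothetical helper: count lines in already-fetched text.
function countLines(text, { skipBlank = true } = {}) {
  const lines = text.split(/\r?\n/)
  // A trailing newline leaves one empty final element; drop it.
  if (lines[lines.length - 1] === '') lines.pop()
  return skipBlank
    ? lines.filter(line => line.trim() !== '').length
    : lines.length
}
```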

And here's our tutorial: https://github.com/badges/shields/blob/master/doc/TUTORIAL.md

paulmelnikow, Sep 25 '19 01:09

Confess I'm still not sure I'm following after reading through a couple times, but curious whether this is a case that would be better suited to our Endpoint Badge?

calebcartwright, Mar 10 '22 02:03

I feel like this is general-purpose enough to be its own badge. That certainly is an option, but would require hosting the endpoint, which may be too much setup for users.

> Would you be interested in working on that?

Assuming this is still a wanted feature, I could definitely implement it now.

cloewen8, Mar 10 '22 02:03

> but would require hosting the endpoint, which may be too much setup for users.

This is a common and understandable intuition, but one which I tend to think is an incorrect assumption. With services like Runkit (linked in our Endpoint docs) there's 0 hosting concerns and 0 costs, users can quite literally just chuck a bit of code up there and be off and running.

While there's certainly a case to be made that your goal is something others might be interested in (though worth mentioning that we've not had any other requests nor has our community been upvoting/requesting this particular ask), I'm not convinced any implementation would actually be sufficiently general purpose. I also have some reservations about the notion of processing any arbitrary file on our prod servers as an implementation mechanism to achieve the goal.

I'm not inherently opposed to having this as a native badge, so I'd be happy to have my skepticism and concerns proven wrong if you're feeling sufficiently motivated to submit a PR! However, I do think the Endpoint is both the easiest and fastest approach, and is also something we could reference in places like Awesome Badges to highlight the pattern in case any future users are interested in something similar.
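To give a sense of how little code the Endpoint route needs, here is a sketch of the JSON body such an endpoint would return for a count. The Endpoint badge expects `schemaVersion: 1` plus `label` and `message`, with `color` optional; `toEndpointResponse` and the colour choice are hypothetical, and the hosting wrapper (RunKit or otherwise) is left out.

```javascript
// Hypothetical helper: build the JSON body for a Shields Endpoint badge.
function toEndpointResponse(label, count) {
  return {
    schemaVersion: 1,
    label,
    message: String(count), // the message is sent as a string
    color: count > 0 ? 'blue' : 'lightgrey',
  }
}
```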

calebcartwright, Mar 10 '22 03:03

I no longer need this; I've forgotten why I needed it to begin with. My motivation for implementing it is its simplicity and the initial responses. If it would be more trouble than it's worth, I'd be happy to use the Endpoint badge with RunKit instead when needed (can't believe I ever missed this, great service).

cloewen8, Mar 10 '22 03:03