github-readme-stats

Top Languages card results are incorrect

Open JmKanmo opened this issue 2 years ago • 9 comments

I wrote this in my README.md file:

Top Langs

But I don't write Python code... The language I use most is Java.

So I tested it on the site below:
http://ionicabizau.github.io/github-profile-languages/?user=JmKanmo

and the result is like it. bb

Why is this?

JmKanmo avatar Jun 03 '22 09:06 JmKanmo

Having a similar issue where it seems more like mine just hasn't updated in weeks

MSBivens avatar Jun 05 '22 00:06 MSBivens

same here

Zordon1337 avatar Jun 07 '22 14:06 Zordon1337

Same

nicolasfara avatar Jun 09 '22 07:06 nicolasfara

Same :( my contribution to private repositories is not being accounted for, even though I added the param to show them.

amandie-ct avatar Jul 05 '22 04:07 amandie-ct

@JmKanmo Thanks for your issue. We are aware of the inaccuracy of the language card. Limitations of the current GraphQL implementation cause it (see #1803 and https://github.com/anuraghazra/github-readme-stats/pull/1122#issuecomment-1152066225 for more information).

Currently, the GraphQL API does not allow us to fetch language results for individual users. It only returns language results for repositories. As a result, the language card does not show the correct statistics. I created a feature request asking GitHub to improve this behaviour. You can show your support at https://github.com/github-community/community/discussions/18230. If enough people show their support, we might be able to improve the language results of github-readme-stats in the future. Additionally, we currently fetch only the first 100 repositories, which also causes the language card to be incorrect (see https://github.com/anuraghazra/github-readme-stats/issues/1852).

I also checked https://ionicabizau.github.io/github-profile-languages to see if it produces better results. That tool uses the GitHub REST API to fetch all repositories of a user to get the language results. Since this also includes forks, the results given by that tool will be worse for most user accounts.
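To make the two effects above concrete, here is a minimal Python sketch of turning per-repository language results into per-user totals. It is not the project's actual code: the repository dictionaries, their field names (`is_fork`, `languages`), and the helper `aggregate_languages` are hypothetical, and the sketch assumes the data has already been fetched. It also assumes forks are filtered out before the 100-repository cap is applied.

```python
from collections import defaultdict

# Cap mentioned in the thread: only the first 100 repositories are fetched.
MAX_REPOS = 100


def aggregate_languages(repos, include_forks=False, max_repos=MAX_REPOS):
    """Sum language byte counts and repository counts across repositories.

    Each repo is a dict such as
    {"name": str, "is_fork": bool, "languages": {language: byte_count}}.
    This shape is hypothetical and only serves the illustration.
    """
    # Assumption: forks are dropped first, then the cap is applied.
    candidates = [r for r in repos if include_forks or not r["is_fork"]]
    totals = defaultdict(lambda: {"bytes": 0, "repos": 0})
    for repo in candidates[:max_repos]:
        for language, size in repo["languages"].items():
            totals[language]["bytes"] += size
            totals[language]["repos"] += 1
    return dict(totals)


# Made-up sample data: one forked Python library dwarfs the user's own code.
repos = [
    {"name": "java-app", "is_fork": False, "languages": {"Java": 50_000, "HTML": 5_000}},
    {"name": "forked-lib", "is_fork": True, "languages": {"Python": 400_000}},
    {"name": "py-study", "is_fork": False, "languages": {"Python": 20_000}},
]

own_only = aggregate_languages(repos)
with_forks = aggregate_languages(repos, include_forks=True)
print(own_only["Python"]["bytes"], with_forks["Python"]["bytes"])
```

With this sample data, counting forks moves Python from well behind Java to far ahead of it, even though the user wrote none of the forked code. Anything beyond the cap is simply never counted.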

You can also show your support for #1732, which slightly improves the language card behaviour by allowing users to scale their language results.

rickstaa avatar Jul 06 '22 12:07 rickstaa

To summarize, the following things can be done to improve the language card:

  • Fetch all repositories instead of only the first 100 (see #1852).
  • Give users the ability to modify the language card calculation (see #1600).

rickstaa avatar Jul 07 '22 09:07 rickstaa

Hey @JmKanmo

Are you using the default language weight?

The top languages card shows the percentage based on the size of the repositories by default.

Also: By default, the language card shows language results only from public repositories. To include languages used in private repositories, you should deploy your own instance using your own GitHub API token.

Algorithm

It uses the following algorithm to calculate the languages percentages on the language card:

ranking_index = (byte_count ^ size_weight) * (repo_count ^ count_weight)

By default, only the byte count is used for determining the language percentages shown on the language card (i.e. size_weight=1 and count_weight=0). You can, however, use the &size_weight= and &count_weight= options to weight the language usage calculation. The values must be non-negative real numbers. More details about the algorithm can be found here.

  • &size_weight=1&count_weight=0 - (default) Orders by byte count.
  • &size_weight=0.5&count_weight=0.5 - (recommended) Uses both byte and repo count for ranking.
  • &size_weight=0&count_weight=1 - Orders by repo count.
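The formula and the three weight settings above can be tried out with a short Python sketch. It is only an illustration: the function names and the sample numbers are made up, and it assumes that the percentage on the card is each language's share of the summed ranking index.

```python
def ranking_index(byte_count, repo_count, size_weight=1.0, count_weight=0.0):
    """ranking_index = (byte_count ^ size_weight) * (repo_count ^ count_weight)."""
    return (byte_count ** size_weight) * (repo_count ** count_weight)


def language_shares(stats, size_weight=1.0, count_weight=0.0):
    """Return {language: percentage}, highest share first.

    stats maps language -> (byte_count, repo_count).
    Assumption: a percentage is the language's share of the summed index.
    """
    indices = {
        language: ranking_index(size, count, size_weight, count_weight)
        for language, (size, count) in stats.items()
    }
    total = sum(indices.values())
    if total == 0:
        return {}
    ordered = sorted(indices.items(), key=lambda item: item[1], reverse=True)
    return {language: 100.0 * value / total for language, value in ordered}


# Made-up numbers: one huge notebook repository versus nine small Python ones.
stats = {
    "Jupyter Notebook": (9_000_000, 1),
    "Python": (1_000_000, 9),
}

for size_weight, count_weight in [(1, 0), (0.5, 0.5), (0, 1)]:
    print(size_weight, count_weight, language_shares(stats, size_weight, count_weight))
```

With these sample numbers, the default weights rank the notebook language first by a wide margin, the repo-count-only setting reverses the order, and the 0.5/0.5 setting lands in between.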

In my case, I am using &size_weight=0&count_weight=1 because I usually code in Python, but I have one repository with a huge Google Colab notebook that takes up 60% of my card if I use the default settings.

fsantamaria1 avatar Aug 29 '23 14:08 fsantamaria1

Thank you all for your kind replies. I read the answers and then updated my README.md as follows:

[![Top Langs](https://github-readme-stats.vercel.app/api/top-langs/?username=JmKanmo&size_weight=0.5&count_weight=0.5&layout=compact&theme=dark)](https://github.com/JmKanmo/JmKanmo) <br> <br>

I do have Python repositories, such as the ones below.
https://github.com/JmKanmo/UserManagerWebsite https://github.com/JmKanmo/PetService_Web https://github.com/JmKanmo/PythonBasicStudy

If the percentages are calculated from byte count, the Python repositories probably contribute more code than I expected. At this level, I don't think it's that bad. Thank you for your detailed answer and guide.

ds

JmKanmo avatar Aug 30 '23 17:08 JmKanmo

@JmKanmo, there are still some issues with the language algorithm that require attention:

  1. Currently, the algorithm only considers the first 100 repositories of a user, as documented in this GitHub issue (https://github.com/anuraghazra/github-readme-stats/issues/1852).
  2. Forks and organization repositories are not included in the language calculation, as reported in these two issues (https://github.com/anuraghazra/github-readme-stats/issues/1 and https://github.com/anuraghazra/github-readme-stats/issues/3109).
  3. The algorithm relies on the languages found in a repository rather than the languages a user has actively used, which is discussed in this issue (https://github.com/anuraghazra/github-readme-stats/issues/1801#issuecomment-1176153879).

Addressing the first two points could be accomplished by releasing a GitHub Action, as suggested in this issue (https://github.com/anuraghazra/github-readme-stats/issues/2179). However, the last point requires intervention from GitHub itself, so for now you can only express your support for this improvement in the GitHub Community Discussions (https://github.com/orgs/community/discussions/18230). Let's keep this issue open to monitor progress on resolving these problems. 🚀

rickstaa avatar Sep 11 '23 20:09 rickstaa