github-readme-stats icon indicating copy to clipboard operation
github-readme-stats copied to clipboard

New top language algorithm implementation

Open kitswas opened this issue 3 years ago • 11 comments

Old Card

Old card

New Card

New card

The tests have been updated.

I am not very familiar with Javascript so I made only minimal changes.
Some refactoring might be required.

Implements #1600

kitswas avatar May 03 '22 10:05 kitswas

@kitswas is attempting to deploy a commit to the github readme stats Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] avatar May 03 '22 10:05 vercel[bot]

There was an error in my earlier implementation.

See the differences below:

  • Has error - Error
  • Current - Current

Preview Deployment URL - https://github-readme-stats-l34h9pi63-kitswas.vercel.app

kitswas avatar May 05 '22 01:05 kitswas

@rickstaa whats your thoughts on this?

anuraghazra avatar May 05 '22 14:05 anuraghazra

@anuraghazra, I agree with @kitswas that the language counting algorithm could be improved. Due to some pressing deadlines, I did not yet have time to go deep into the proposed calculation. I will therefore give you my first thoughts.

From my first impression, the y variable (i.e. the number of repositories) influences the percentages. By running an optimisation on the p and q factors, we might get better results, but I'm afraid that judging the accuracy of the result might be very subjective. Currently, I am inclined to go with the old behaviour since it is similar to what GitHub uses in their repo language overview (i.e. https://github.com/github/linguist/). I, however, need to dive in deeper to give a detailed conclusion.

rickstaa avatar May 07 '22 08:05 rickstaa

From my first impression, the y variable (i.e. the number of repositories) influences the percentages. By running an optimisation on the p and q factors, we might get better results, but I'm afraid that judging the accuracy of the result might be very subjective. Currently, I am inclined to go with the old behaviour since it is similar to what GitHub uses in their repo language overview (i.e. https://github.com/github/linguist/). I, however, need to dive in deeper to give a detailed conclusion.

The influence of y is felt only when it is greater than 1, i.e. when a language appears in multiple repositories. The numbers p and q could be optimised. Allowing the users to specify these might be a possibility. The lower the exponents, the more stable the percentages are.

kitswas avatar May 07 '22 09:05 kitswas

Without diving deeper into the formula. If we decide to implement this option the following would make sense to me:

  • Rewrite the k = x^p * y^q in a way that allows us to disable the new behaviour by default.
  • By default disabling the influence of y.
  • Giving users the ability to change the p and q values.
  • Documenting this new behaviour in the README.

rickstaa avatar May 07 '22 10:05 rickstaa

Please be aware that I did not do any research on metrics that could give a better 'used languages' representation. I, therefore, can only comment on the metric you proposed in #1732 and #1600. It could be that there are better statistical indexes for solving the problem. Nevertheless, I don't think we need to overthink this since your metric does what it is supposed to do.

rickstaa avatar May 07 '22 10:05 rickstaa

The new commits -

  • [x] Rewrite the program in a way that disables the new behaviour by default.
  • [x] By default disable the influence of y.
  • [x] Give users the ability to change the p and q values
  • [x] Document this new behaviour in the README.

The preview deployment URL is https://github-readme-stats-7vlsc5n0x-kitswas.vercel.app

This will prevent users from seeing their lang-cards suddenly change percentages if and when this is deployed.

kitswas avatar May 07 '22 12:05 kitswas

@kitswas, thanks a lot for creating the new commits. It shows you are devoted to improving this project! I think people like you make a strong community found behind this project! 🚀

I, however, have to say I'm still not 100% on board with this. When I created https://github.com/anuraghazra/github-readme-stats/pull/1732#issuecomment-1120180189 I overlooked the case where p is 0 and q is 1. I don't think this version makes sense with the conception people might have with the language card. Namely, that is similar to the one shown by GitHub. I, however, understand that I'm also biased since I'm happy with the old behaviour and, therefore, likely would not use #1732 myself. I changed #1600 to a proposal so we can get more feedback from the community.

@anuraghazra, what do you think? Another way to implement this would be to create a switching variable rank_method, allowing users to switch from the old behaviour to the new behaviour. In that case, we can fix the p and q to more optimal values. However, this will not make everybody happy and wants more control over the language card and gives us more optimisation work. I, therefore, think there are two paths: keep the old behaviour since it is more consistent with what people expect to see or add the behaviour where people can change the p and q values.

rickstaa avatar May 08 '22 09:05 rickstaa

@kitswas For now, #1732 looks good. Let's wait on what @anuraghazra says. I will review this once more if we decide to merge it.

rickstaa avatar May 08 '22 09:05 rickstaa

Resolved conflicts and rebased this branch onto anuraghazra/github-readme-stats/master.

kitswas avatar Sep 30 '22 15:09 kitswas

Pulled latest changes from anuraghazra/github-readme-stats/master.

kitswas avatar Apr 19 '23 16:04 kitswas

Pulled latest changes from anuraghazra/github-readme-stats/master.

Thanks, I made some small changes (see 9ce9bde1fdfd40420a27cfbb7a682f96beb307be) but I think this PR is good to be merged now 👍🏻.

rickstaa avatar Apr 24 '23 13:04 rickstaa

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.01 :tada:

Comparison is base (4d1d83d) 97.31% compared to head (44b6ada) 97.32%.

:exclamation: Current head 44b6ada differs from pull request most recent head 7aa3546. Consider uploading reports for the commit 7aa3546 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1732      +/-   ##
==========================================
+ Coverage   97.31%   97.32%   +0.01%     
==========================================
  Files          24       24              
  Lines        4249     4267      +18     
  Branches      385      387       +2     
==========================================
+ Hits         4135     4153      +18     
  Misses        112      112              
  Partials        2        2              
Impacted Files Coverage Δ
api/top-langs.js 95.65% <100.00%> (+0.19%) :arrow_up:
src/fetchers/top-languages-fetcher.js 88.23% <100.00%> (+1.18%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Apr 24 '23 13:04 codecov[bot]

@rickstaa Thanks for your continuous support. I wonder if we should change the names p and q to size_weight and count_weight, respectively. p and q are not exactly intuitive.

kitswas avatar Apr 24 '23 17:04 kitswas

Users can also list by commit count and by size separately. This effectively removes ambiguity and confusion. For reference, here's how mine looks:

Top Langs By Commits

Top Langs By Size

kitswas avatar Apr 24 '23 17:04 kitswas

count_weight

Sounds like a plan! If you commit these changes, both in the code and the README, I will merge the pull request :).

rickstaa avatar Apr 24 '23 18:04 rickstaa

@rickstaa Updated.

kitswas avatar Apr 25 '23 04:04 kitswas

Great! Merged into the main. Thanks for your contribution.

rickstaa avatar Apr 25 '23 06:04 rickstaa