New top language algorithm implementation
Old Card
New Card
The tests have been updated.
I am not very familiar with Javascript so I made only minimal changes.
Some refactoring might be required.
Implements #1600
@kitswas is attempting to deploy a commit to the github readme stats Team on Vercel.
A member of the Team first needs to authorize it.
There was an error in my earlier implementation.
See the differences below:
Preview Deployment URL - https://github-readme-stats-l34h9pi63-kitswas.vercel.app
@rickstaa whats your thoughts on this?
@anuraghazra, I agree with @kitswas that the language counting algorithm could be improved. Due to some pressing deadlines, I did not yet have time to go deep into the proposed calculation. I will therefore give you my first thoughts.
From my first impression, the y variable (i.e. the number of repositories) influences the percentages. By running an optimisation on the p and q factors, we might get better results, but I'm afraid that judging the accuracy of the result might be very subjective. Currently, I am inclined to go with the old behaviour since it is similar to what GitHub uses in their repo language overview (i.e. https://github.com/github/linguist/). I, however, need to dive in deeper to give a detailed conclusion.
From my first impression, the
yvariable (i.e. the number of repositories) influences the percentages. By running an optimisation on thepandqfactors, we might get better results, but I'm afraid that judging the accuracy of the result might be very subjective. Currently, I am inclined to go with the old behaviour since it is similar to what GitHub uses in their repo language overview (i.e. https://github.com/github/linguist/). I, however, need to dive in deeper to give a detailed conclusion.
The influence of y is felt only when it is greater than 1, i.e. when a language appears in multiple repositories.
The numbers p and q could be optimised. Allowing the users to specify these might be a possibility.
The lower the exponents, the more stable the percentages are.
Without diving deeper into the formula. If we decide to implement this option the following would make sense to me:
- Rewrite the
k = x^p * y^qin a way that allows us to disable the new behaviour by default. - By default disabling the influence of
y. - Giving users the ability to change the
pandqvalues. - Documenting this new behaviour in the README.
Please be aware that I did not do any research on metrics that could give a better 'used languages' representation. I, therefore, can only comment on the metric you proposed in #1732 and #1600. It could be that there are better statistical indexes for solving the problem. Nevertheless, I don't think we need to overthink this since your metric does what it is supposed to do.
The new commits -
- [x] Rewrite the program in a way that disables the new behaviour by default.
- [x] By default disable the influence of
y. - [x] Give users the ability to change the
pandqvalues - [x] Document this new behaviour in the README.
The preview deployment URL is https://github-readme-stats-7vlsc5n0x-kitswas.vercel.app
This will prevent users from seeing their lang-cards suddenly change percentages if and when this is deployed.
@kitswas, thanks a lot for creating the new commits. It shows you are devoted to improving this project! I think people like you make a strong community found behind this project! 🚀
I, however, have to say I'm still not 100% on board with this. When I created https://github.com/anuraghazra/github-readme-stats/pull/1732#issuecomment-1120180189 I overlooked the case where p is 0 and q is 1. I don't think this version makes sense with the conception people might have with the language card. Namely, that is similar to the one shown by GitHub. I, however, understand that I'm also biased since I'm happy with the old behaviour and, therefore, likely would not use #1732 myself. I changed #1600 to a proposal so we can get more feedback from the community.
@anuraghazra, what do you think? Another way to implement this would be to create a switching variable rank_method, allowing users to switch from the old behaviour to the new behaviour. In that case, we can fix the p and q to more optimal values. However, this will not make everybody happy and wants more control over the language card and gives us more optimisation work. I, therefore, think there are two paths: keep the old behaviour since it is more consistent with what people expect to see or add the behaviour where people can change the p and q values.
@kitswas For now, #1732 looks good. Let's wait on what @anuraghazra says. I will review this once more if we decide to merge it.
Resolved conflicts and rebased this branch onto anuraghazra/github-readme-stats/master.
Pulled latest changes from anuraghazra/github-readme-stats/master.
Pulled latest changes from anuraghazra/github-readme-stats/master.
Thanks, I made some small changes (see 9ce9bde1fdfd40420a27cfbb7a682f96beb307be) but I think this PR is good to be merged now 👍🏻.
Codecov Report
Patch coverage: 100.00% and project coverage change: +0.01 :tada:
Comparison is base (
4d1d83d) 97.31% compared to head (44b6ada) 97.32%.
:exclamation: Current head 44b6ada differs from pull request most recent head 7aa3546. Consider uploading reports for the commit 7aa3546 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## master #1732 +/- ##
==========================================
+ Coverage 97.31% 97.32% +0.01%
==========================================
Files 24 24
Lines 4249 4267 +18
Branches 385 387 +2
==========================================
+ Hits 4135 4153 +18
Misses 112 112
Partials 2 2
| Impacted Files | Coverage Δ | |
|---|---|---|
| api/top-langs.js | 95.65% <100.00%> (+0.19%) |
:arrow_up: |
| src/fetchers/top-languages-fetcher.js | 88.23% <100.00%> (+1.18%) |
:arrow_up: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
@rickstaa Thanks for your continuous support.
I wonder if we should change the names p and q to size_weight and count_weight, respectively.
p and q are not exactly intuitive.
Users can also list by commit count and by size separately. This effectively removes ambiguity and confusion. For reference, here's how mine looks:
count_weight
Sounds like a plan! If you commit these changes, both in the code and the README, I will merge the pull request :).
@rickstaa Updated.
Great! Merged into the main. Thanks for your contribution.