
Discrepancies between GitHub's `languages.csv` and the githut data source

Open · danielzayas opened this issue 9 months ago · 1 comment

Context

https://danielzayas.github.io/language_trends is a small page powered by the public https://github.com/danielzayas/language_trends repo. The languages.csv data through 2024 Q3 was sourced from https://github.com/github/innovationgraph/blob/main/data/languages.csv on 2025-03-23 around 11pm PT.

language_trends

Data

languages.csv data through 2024 Q3 was sourced from https://github.com/github/innovationgraph/blob/main/data/languages.csv on 2025-03-23 around 11pm PT.

I also created a new issue, https://github.com/github/innovationgraph/issues/47, to ask for more recent data (2024 Q4, 2025 Q1, etc.).

Acknowledgements

  1. GitHub for publishing the CSV. Y'all should really improve the data visualization on your https://innovationgraph.github.com/global-metrics/programming-languages page, though.
  2. @madnight for creating a beautiful UI under the AGPL 3.0 license at https://github.com/madnight/githut, but sadly the last quarter in the data source is 2024 Q1.

Question

Filtering https://madnight.github.io/githut/#/pushes/2024/1, which is powered by https://github.com/madnight/githut, for "PUSHES" through 2024 Q1 tells a very different story about language trends. For example, in 2024 Q1 GitHub's languages.csv has JavaScript at 18% of pushes, whereas @madnight's data source has JavaScript at 11% of pushes. Why the large discrepancy, @madnight?

[Screenshot 2025-03-23 at 10:53:35 PM]
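For reference, a quarter's share of pushes for a language is its push count divided by the total across all languages in that quarter, so a gap like 18% vs 11% can come from the numerator, the denominator, or both. A minimal sketch, assuming a simplified table of hypothetical (language, year, quarter, pushes) rows; neither the column layout nor the numbers are taken from languages.csv or the githut dataset:

```python
from collections import defaultdict

# Hypothetical rows: (language, year, quarter, pushes). Illustrative only;
# this is NOT the real schema or data of languages.csv or githut.
ROWS = [
    ("JavaScript", 2024, 1, 180),
    ("Python", 2024, 1, 170),
    ("TypeScript", 2024, 1, 150),
    ("Other", 2024, 1, 500),
]


def push_shares(rows, year, quarter):
    """Return {language: percent of all pushes} for one quarter."""
    totals = defaultdict(int)
    for language, y, q, pushes in rows:
        if (y, q) == (year, quarter):
            totals[language] += pushes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    # Each language's share is relative to the quarter's grand total.
    return {lang: 100.0 * n / grand_total for lang, n in totals.items()}


shares = push_shares(ROWS, 2024, 1)
print(f"JavaScript: {shares['JavaScript']:.1f}%")
```

Any change in which events are counted (for example, excluding some accounts) changes the denominator and therefore every language's percentage, not only the language being compared.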

danielzayas · Mar 24 '25 06:03

> Why the large discrepancy, @madnight?

@danielzayas my dataset is not based on the CSVs you have linked but on Google BigQuery (the public GitHub dataset). In addition, I apply a bot filter, https://github.com/madnight/githut/blob/master/scripts/query.js#L62:

```sql
-- excerpt: ${tables} is interpolated by query.js; ")" closes a subquery aliased a
FROM ${tables} WHERE NOT LOWER(actor.login) LIKE "%bot%") a
```

This drops every event whose actor login contains "bot" (case-insensitive), so bots such as dependabot, which generate a large number of pushes, are not included in my graphs.
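The effect of such a filter on percentages can be illustrated with a small sketch. It mirrors the predicate `LOWER(actor.login) LIKE "%bot%"` as a case-insensitive substring test; the event list, the `is_bot` and `share` helpers, and the logins are hypothetical and only illustrate the mechanism, not the real data:

```python
# Hypothetical push events: (actor_login, language). Illustrative only.
EVENTS = [
    ("dependabot[bot]", "JavaScript"),
    ("dependabot[bot]", "JavaScript"),
    ("dependabot[bot]", "JavaScript"),
    ("alice", "JavaScript"),
    ("bob", "Python"),
    ("carol", "Python"),
]


def is_bot(login):
    # Mirrors LOWER(actor.login) LIKE "%bot%": any login containing
    # "bot", in any letter case, is treated as a bot.
    return "bot" in login.lower()


def share(events, language):
    """Percent of events attributed to `language`."""
    if not events:
        return 0.0
    hits = sum(1 for _, lang in events if lang == language)
    return 100.0 * hits / len(events)


human_events = [e for e in EVENTS if not is_bot(e[0])]
print(f"JavaScript, all pushes:   {share(EVENTS, 'JavaScript'):.1f}%")
print(f"JavaScript, bots removed: {share(human_events, 'JavaScript'):.1f}%")
```

Because the test is a plain substring match, it would also drop a human login that happens to contain "bot" (a hypothetical "abbott", for example) and keep any bot whose login does not contain it.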

madnight · Mar 24 '25 08:03