Drastic Drop Off in Events After 2025-05-23
I was working with the dataset and noticed that after 2025-05-23, the size of the data and the number of events per day are drastically lower than they were the previous month.
Is this drop off genuine, an issue with github, or something else?
Hi @cskor
I checked the 0:00 AM json files for each day starting 2024 and confirm the drop in event count.
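For anyone who wants to reproduce the per-file counts, here is a minimal sketch. It assumes the hourly archive has already been downloaded as a gzipped file with one JSON event per line, and that each event carries a `type` field; the function name `count_event_types` is made up for illustration.

```python
import gzip
import json
from collections import Counter


def count_event_types(path):
    """Count events per type in one gzipped, newline-delimited JSON file."""
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            event = json.loads(line)
            # "unknown" is a placeholder for records without a type field
            counts[event.get("type", "unknown")] += 1
    return counts
```

The total number of events in the file is then `sum(counts.values())`.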
Here's a plot of the number of events per day (between 0:00 and 0:59 AM):
The most frequent events are Push events, and it's quite improbable that this drop is a natural product of user habits:
Most other event types have a similar drop in frequency. Calculating the Pearson correlation (for time window 2025-03 to 2025-08) between all-event count and events-by-type counts gives:
| event type | correlation |
|---|---|
| all events | 1 |
| Push | 0.96566 |
| Public | 0.884655 |
| Fork | 0.85795 |
| Member | 0.841076 |
| Gollum | 0.830705 |
| Watch | 0.803054 |
| Delete | 0.794861 |
| Create | 0.769217 |
| IssueComment | 0.757183 |
| Release | 0.741124 |
| file_bytes | 0.723839 |
| PullRequestReview | 0.622 |
| Issues | 0.583528 |
| CommitComment | 0.542668 |
| PullRequest | 0.529304 |
| PullRequestReviewComment | 0.336829 |
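A rough sketch of how such a table could be computed, assuming you already have one list of daily counts per event type covering the same days. The helper names and the toy numbers are invented for illustration and are not taken from gharchive.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant series has zero variance and would divide by zero here.
    return cov / math.sqrt(var_x * var_y)


def correlations_with_total(counts_by_type):
    """Correlate each type's daily counts with the all-event daily counts."""
    days = len(next(iter(counts_by_type.values())))
    total = [sum(series[d] for series in counts_by_type.values())
             for d in range(days)]
    return {name: pearson(series, total)
            for name, series in counts_by_type.items()}


# Made-up toy data: one series drops sharply, the other stays flat.
toy = {
    "Push": [100, 120, 60, 50],
    "PullRequestReviewComment": [10, 9, 11, 10],
}
result = correlations_with_total(toy)
```

In the toy data the type that drops together with the total ends up with the higher correlation, which is the same pattern the table shows.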
And indeed, the PullRequestReviewComment frequency is the least affected:
It's quite mysterious..
For comparison, here's the plot of the file size of each 0:00 json file. The drop is not as pronounced but clearly visible:
Does anybody have any idea what has happened?
Checking the event IDs reveals a general problem. Assuming that the event ID increases consecutively, then there are a lot of events missing in the gharchive. The following table shows, for each day's 0:00 file,
- the number of events contained in the file
- the number of missing events, according to the event IDs
- the number of gaps between event IDs (e.g. if the event IDs were 1000, 1001, 1003, 1004, 1010, that would be 5 events and 2 gaps)
- and some statistics about the gap sizes
| date | events | missing events | gaps | gap_min | gap_max | gap_median | gap_mean |
|---|---|---|---|---|---|---|---|
| 2025-05-01 00:00:00 | 195836 | 884771 | 156829 | 2 | 364 | 5 | 6.64164 |
| 2025-05-02 00:00:00 | 188895 | 752239 | 148355 | 2 | 269 | 4 | 6.07054 |
| 2025-05-03 00:00:00 | 158714 | 550915 | 116038 | 2 | 1425 | 4 | 5.74772 |
| 2025-05-04 00:00:00 | 160056 | 339527 | 106159 | 2 | 687 | 3 | 4.1983 |
| 2025-05-05 00:00:00 | 185746 | 550467 | 135778 | 2 | 288 | 4 | 5.05418 |
| 2025-05-06 00:00:00 | 210130 | 847538 | 163885 | 2 | 1027 | 4 | 6.17155 |
| 2025-05-07 00:00:00 | 205822 | 921961 | 164479 | 2 | 509 | 5 | 6.60535 |
| 2025-05-08 00:00:00 | 202408 | 886676 | 161288 | 2 | 454 | 5 | 6.49748 |
| 2025-05-09 00:00:00 | 197907 | 853474 | 156706 | 2 | 3465 | 5 | 6.44635 |
| 2025-05-10 00:00:00 | 181928 | 545850 | 131607 | 2 | 2263 | 4 | 5.14758 |
| 2025-05-11 00:00:00 | 160884 | 343227 | 106570 | 2 | 347 | 3 | 4.22068 |
| 2025-05-12 00:00:00 | 186491 | 678571 | 143199 | 2 | 1987 | 4 | 5.73866 |
| 2025-05-13 00:00:00 | 205596 | 923651 | 164296 | 2 | 2331 | 5 | 6.62188 |
| 2025-05-14 00:00:00 | 199618 | 905315 | 159720 | 2 | 2614 | 5 | 6.66814 |
| 2025-05-15 00:00:00 | 202628 | 922655 | 161352 | 2 | 1326 | 5 | 6.71828 |
| 2025-05-16 00:00:00 | 195380 | 898317 | 156977 | 2 | 3378 | 5 | 6.72261 |
| 2025-05-17 00:00:00 | 175883 | 574874 | 130902 | 2 | 328 | 4 | 5.39164 |
| 2025-05-18 00:00:00 | 164459 | 397948 | 111128 | 2 | 318 | 3 | 4.581 |
| 2025-05-19 00:00:00 | 189520 | 693660 | 145375 | 2 | 3968 | 4 | 5.77153 |
| 2025-05-20 00:00:00 | 201295 | 908386 | 161475 | 2 | 409 | 5 | 6.62556 |
| 2025-05-21 00:00:00 | 202491 | 960081 | 163093 | 2 | 3204 | 5 | 6.88671 |
| 2025-05-22 00:00:00 | 201915 | 946826 | 162858 | 2 | 2661 | 5 | 6.81382 |
| 2025-05-23 00:00:00 | 200297 | 936153 | 161015 | 2 | 4534 | 5 | 6.81408 |
| 2025-05-24 00:00:00 | 136470 | 649038 | 101954 | 2 | 1955 | 4 | 7.366 |
| 2025-05-25 00:00:00 | 126762 | 409312 | 88292 | 2 | 1146 | 3 | 5.6359 |
| 2025-05-26 00:00:00 | 141729 | 737433 | 109301 | 2 | 1883 | 4 | 7.74682 |
| 2025-05-27 00:00:00 | 145819 | 812968 | 113030 | 2 | 2423 | 4 | 8.19251 |
| 2025-05-28 00:00:00 | 128147 | 1066013 | 103511 | 2 | 4717 | 5 | 11.2986 |
| 2025-05-29 00:00:00 | 141831 | 1005926 | 114972 | 2 | 3236 | 5 | 9.74932 |
| 2025-05-30 00:00:00 | 139802 | 1012838 | 112481 | 2 | 5156 | 5 | 10.0045 |
| 2025-05-31 00:00:00 | 136427 | 665128 | 101807 | 2 | 1737 | 4 | 7.53323 |
| 2025-06-01 00:00:00 | 126670 | 558300 | 93552 | 2 | 1881 | 4 | 6.96781 |
| 2025-06-02 00:00:00 | 142442 | 854724 | 111757 | 2 | 5245 | 5 | 8.64807 |
| 2025-06-03 00:00:00 | 146974 | 982364 | 118939 | 2 | 3289 | 5 | 9.2594 |
| 2025-06-04 00:00:00 | 145280 | 1090679 | 118793 | 2 | 6180 | 5 | 10.1813 |
| 2025-06-05 00:00:00 | 141977 | 1059661 | 115996 | 2 | 4009 | 5 | 10.1353 |
| 2025-06-06 00:00:00 | 138954 | 1042006 | 113946 | 2 | 3280 | 5 | 10.1447 |
| 2025-06-07 00:00:00 | 133768 | 703670 | 101729 | 2 | 1611 | 4 | 7.91711 |
| 2025-06-08 00:00:00 | 124182 | 477034 | 87950 | 2 | 1619 | 4 | 6.42393 |
| 2025-06-09 00:00:00 | 137566 | 736635 | 106225 | 2 | 1562 | 4 | 7.93468 |
| 2025-06-10 00:00:00 | 147513 | 1085730 | 119825 | 2 | 3249 | 5 | 10.061 |
| 2025-06-11 00:00:00 | 141886 | 1055326 | 115766 | 2 | 3135 | 5 | 10.116 |
| 2025-06-12 00:00:00 | 149995 | 1086791 | 121983 | 2 | 5647 | 5 | 9.90937 |
| 2025-06-13 00:00:00 | 141327 | 1024520 | 115822 | 2 | 2970 | 5 | 9.84565 |
| 2025-06-14 00:00:00 | 134705 | 707686 | 103309 | 2 | 2411 | 4 | 7.8502 |
| 2025-06-15 00:00:00 | 130544 | 502204 | 95118 | 2 | 1790 | 4 | 6.27981 |
| 2025-06-16 00:00:00 | 141516 | 810298 | 111282 | 2 | 2017 | 4 | 8.28149 |
| 2025-06-17 00:00:00 | 148966 | 1143279 | 122101 | 2 | 3219 | 5 | 10.3634 |
| 2025-06-18 00:00:00 | 145205 | 1108496 | 119672 | 2 | 2914 | 5 | 10.2628 |
| 2025-06-19 00:00:00 | 142647 | 1063957 | 116932 | 2 | 3700 | 5 | 10.0989 |
| 2025-06-20 00:00:00 | 144915 | 878110 | 115151 | 2 | 2588 | 5 | 8.62573 |
| 2025-06-21 00:00:00 | 135653 | 650914 | 103376 | 2 | 1854 | 4 | 7.29658 |
| 2025-06-22 00:00:00 | 127888 | 446221 | 91186 | 2 | 1668 | 4 | 5.89354 |
| 2025-06-23 00:00:00 | 139195 | 788089 | 109329 | 2 | 1922 | 4 | 8.20843 |
| 2025-06-24 00:00:00 | 145776 | 1153355 | 119367 | 2 | 3057 | 5 | 10.6623 |
| 2025-06-25 00:00:00 | 145915 | 1081156 | 119898 | 2 | 2150 | 5 | 10.0173 |
| 2025-06-26 00:00:00 | 146200 | 1056579 | 119999 | 2 | 3293 | 5 | 9.80491 |
| 2025-06-27 00:00:00 | 144965 | 1043376 | 117723 | 2 | 8101 | 5 | 9.86298 |
| 2025-06-28 00:00:00 | 133919 | 674600 | 102176 | 2 | 2250 | 4 | 7.60234 |
| 2025-06-29 00:00:00 | 125954 | 462651 | 89119 | 2 | 3331 | 4 | 6.1914 |
| 2025-06-30 00:00:00 | 139577 | 803805 | 110802 | 2 | 1990 | 5 | 8.25444 |
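The columns above can be reproduced with something like the following sketch. It assumes that a gap's size is the difference between two neighbouring IDs; that reading is an inference from the table, where gap_min is always 2 and gap_mean is roughly missing/gaps + 1. The function name `gap_stats` is hypothetical, and duplicate IDs are dropped before counting, which is also an assumption.

```python
from statistics import mean, median


def gap_stats(event_ids):
    """Summarise holes in a set of event IDs, assuming IDs are consecutive."""
    ids = sorted({int(i) for i in event_ids})  # IDs may arrive as strings
    diffs = [b - a for a, b in zip(ids, ids[1:])]
    gaps = [d for d in diffs if d > 1]  # a difference of 1 means no gap
    stats = {
        "events": len(ids),
        "missing": sum(d - 1 for d in gaps),
        "gaps": len(gaps),
    }
    if gaps:
        stats.update(
            gap_min=min(gaps),
            gap_max=max(gaps),
            gap_median=median(gaps),
            gap_mean=mean(gaps),
        )
    # Ratio of missing events to recorded events
    stats["missing_ratio"] = stats["missing"] / stats["events"] if ids else 0.0
    return stats


# The example from the list above: 5 events, 2 gaps, 6 missing IDs.
example = gap_stats([1000, 1001, 1003, 1004, 1010])
```

For the example IDs, the missing ones are 1002 and 1005 through 1009, so the missing-to-recorded ratio comes out at 6 / 5.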
Here's a plot of the ratio of missing IDs (missing events divided by events):
Whatever the reason is, it seems gharchive is storing only a fraction of the actual events happening on github and this got much worse in late May.
Just checked events API myself and it delivers those ID gaps as expected. It's possible that all missing events are private and therefore not listed but that does not explain the sudden change in the timeline..
Hehe, I cannot stop.. I did the counting of missing IDs for every Wednesday 0:00, from 2015-01-07 to 2025-09-17.
Here's the plot of the count of recorded events in each json file:
And here's the plot including the number of missing event IDs and the number of gaps between event IDs:
Interestingly, there seems to be another strange discontinuity in August 2017, where the number of missing IDs is reduced to less than half.
Looking at the plot of the missing-ratio (the number of missing events divided by the number of recorded events), I begin to imagine that GitHub switched something in their backend in August 2017 and then switched it back in May 2025:
Major real-world events like Microsoft buying GitHub in 2018 (thelinuxcode.com @ 2025-01-09), or free access to private repos in 2019 (github.blog @ 2019-01-07) do not seem to be related to the plots' discontinuities. At least at first glance..
The GitHub events API lists the SponsorshipEvent, which I have never seen in any gharchive record. That might explain some missing event IDs, but sponsorship only started in mid 2019 (github.blog @ 2019-05-23).
Here is another hint: https://github.com/igrigorik/gharchive.org/issues/171 There were actual missed events because of API rate limits, but the date of fix is around February 2019. I don't actually see that in the timelines.
Hi guys, the situation seems to get worse this week... Any clue?
I believe the github event stream has been broken since the outage on the 8th: the authenticated endpoint seems to be caching the 3 pages of events for a whole 10 minutes instead of the normal 1-2 seconds. The unauthenticated endpoint, ironically, still refreshes every 10 seconds. I reached out to github support, but they said they hadn't heard of any other customers having this issue, so I just replied with this thread. Hopefully it's just some switch they forgot to turn back on after the event last week.
By the way, the problem that caused the issue for this ticket is fairly straightforward -
The crawler.rb script only fetches the first page of events: (https://github.com/igrigorik/gharchive.org/blob/master/crawler/crawler.rb#L49)
The 3 pages of events change together. If you only poll page 1, there is a high probability that at busy times of the day you miss events, because by the time you poll they have already moved to page 2 or page 3.
Here's the pseudocode I used to query github events:
ALGORITHM PollPages
INPUT:
    PAGES ← [1, 2, 3]
    SLEEP_MS ← 250
STATE:
    etag[1..3]      // last known ETags per page
    lastKnownId     // last processed id

// One-time init
FOR p IN PAGES:
    etag[p] ← FETCH_ETAG(p)    // e.g., HEAD or initial GET

LOOP FOREVER:
    // Step 2a: probe page 1 with conditional GET
    r1 ← GET(page=1, If-None-Match=etag[1])
    IF r1.status = NOT_MODIFIED THEN
        SLEEP(SLEEP_MS)
        CONTINUE LOOP
    ENDIF

    // Step 2b: page 1 changed → update and fetch 2 & 3 in parallel
    IF r1.status = OK THEN
        etag[1] ← r1.etag
    ENDIF
    PARALLEL:
        r2 ← GET(page=2, If-None-Match=etag[2])
        r3 ← GET(page=3, If-None-Match=etag[3])
    END PARALLEL
    IF r2.status = OK THEN etag[2] ← r2.etag ENDIF
    IF r3.status = OK THEN etag[3] ← r3.etag ENDIF

    // Step 2c: merge and filter
    items ← CONCAT(ITEMS(r1), ITEMS(r2), ITEMS(r3))
    newItems ← FILTER(items, item.id > lastKnownId)
    PROCESS(newItems)    // emit or handle as needed

    // Step 2d: advance watermark
    IF newItems ≠ ∅ THEN
        lastKnownId ← MAX(item.id FOR item IN newItems)
    ENDIF
    SLEEP(SLEEP_MS)
END LOOP
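One iteration of that loop could look roughly like this in Python. This is only a sketch: the HTTP layer is left out on purpose, so `fetch` is a caller-supplied, hypothetical function that takes a page number and the last ETag and returns `(status, etag, items)`. `poll_once` and `fake_fetch` are made-up names, and IDs are converted with `int()` in case they arrive as strings.

```python
from concurrent.futures import ThreadPoolExecutor


def poll_once(fetch, etags, last_known_id):
    """Run one loop iteration; return (new_items, new_last_known_id).

    etags is a dict page -> last seen ETag and is updated in place.
    """
    status1, etag1, items1 = fetch(1, etags.get(1))
    if status1 != 200:  # 304 Not Modified (or an error): nothing to do
        return [], last_known_id
    etags[1] = etag1

    # Page 1 changed, so fetch pages 2 and 3 in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(fetch, p, etags.get(p)) for p in (2, 3)]
        responses = [f.result() for f in futures]

    items = list(items1)
    for page, (status, etag, page_items) in zip((2, 3), responses):
        if status == 200:
            etags[page] = etag
            items.extend(page_items)

    # Keep only events newer than the watermark, de-duplicated by ID.
    fresh = {}
    for item in items:
        event_id = int(item["id"])
        if event_id > last_known_id:
            fresh[event_id] = item
    new_items = [fresh[k] for k in sorted(fresh)]
    if new_items:
        last_known_id = max(fresh)
    return new_items, last_known_id


def fake_fetch(page, etag):
    """Stand-in for the real API call, serving three static pages."""
    pages = {
        1: [{"id": "105"}, {"id": "104"}],
        2: [{"id": "103"}, {"id": "102"}],
        3: [{"id": "101"}],
    }
    current = f"v1-p{page}"
    if etag == current:
        return 304, etag, []
    return 200, current, pages[page]


etags = {}
first, watermark = poll_once(fake_fetch, etags, last_known_id=102)
second, watermark = poll_once(fake_fetch, etags, watermark)
```

In a real crawler you would call `poll_once` in a loop with a short sleep in between and replace `fake_fetch` with a function that sends the `If-None-Match` header.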
Rate limits for authenticated accounts are 5,000 requests per hour per the docs. Right now, the cached pages don't change fast enough for you to hit that; I only use about 4,000 requests per hour since NotModified responses don't count toward the limit. If you want to be safe, add a simple token-bucket rate limiter to make sure you never exceed the allowed rate.
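A token bucket along those lines might look like the sketch below. The class name, the default of 4,500 requests per hour and the burst capacity are all assumptions for illustration. Because the bucket allows short bursts on top of the average rate, the configured rate should stay below the real limit.

```python
import time


class TokenBucket:
    """Simple token bucket: one token per request, refilled continuously."""

    def __init__(self, rate_per_hour=4500, capacity=50, clock=time.monotonic):
        self.rate = rate_per_hour / 3600.0  # tokens added per second
        self.capacity = float(capacity)     # maximum burst size
        self.tokens = float(capacity)       # start with a full bucket
        self.clock = clock                  # injectable for testing
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self):
        """Take one token if available; return whether a request may go out."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def seconds_until_ready(self):
        """How long to wait until the next token is available."""
        self._refill()
        return max(0.0, (1.0 - self.tokens) / self.rate)
```

Before each request, call `try_acquire()`; if it returns `False`, sleep for `seconds_until_ready()` and try again.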
During slow periods, you might only need to pull page 1. You can adjust the pseudocode so that if the last known ID is already present on page 1, you skip querying pages 2 and 3 to save on some bandwidth for unnecessary requests.
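The check for that shortcut could be as small as the sketch below. The function name is hypothetical. It tests for any ID on page 1 that is at or below the watermark rather than for an exact match, which is a slight variation meant to cope with IDs that are not consecutive.

```python
def page1_covers_backlog(page1_items, last_known_id):
    """Return True if page 1 already reaches back to the last processed event.

    When this is True, pages 2 and 3 cannot contain anything new.
    """
    if last_known_id is None:
        return False  # nothing processed yet, so fetch all pages
    return any(int(item["id"]) <= last_known_id for item in page1_items)
```

If it returns `True`, the requests for pages 2 and 3 can be skipped for that iteration.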
I filed a separate issue in #312 for the 100x drop-off since 2025-10-09, not sure which issue to use to track this, so chiming in here as well 😃
Indeed, @filmaj
$ du -sh 2025/*/*
1.6G 2025/01/01
1.9G 2025/01/02
2.0G 2025/01/03
1.4G 2025/01/04
1.4G 2025/01/05
2.6G 2025/01/06
2.2G 2025/01/07
2.5G 2025/01/08
2.3G 2025/01/09
2.4G 2025/01/10
1.6G 2025/01/11
1.5G 2025/01/12
2.9G 2025/01/13
2.5G 2025/01/14
2.5G 2025/01/15
2.5G 2025/01/16
2.4G 2025/01/17
1.5G 2025/01/18
1.5G 2025/01/19
2.8G 2025/01/20
2.6G 2025/01/21
2.6G 2025/01/22
2.5G 2025/01/23
2.4G 2025/01/24
1.5G 2025/01/25
1.6G 2025/01/26
3.0G 2025/01/27
2.5G 2025/01/28
2.4G 2025/01/29
2.3G 2025/01/30
2.2G 2025/01/31
1.8G 2025/02/01
1.5G 2025/02/02
2.9G 2025/02/03
2.6G 2025/02/04
2.6G 2025/02/05
2.5G 2025/02/06
2.5G 2025/02/07
1.7G 2025/02/08
1.7G 2025/02/09
3.1G 2025/02/10
2.8G 2025/02/11
2.7G 2025/02/12
2.6G 2025/02/13
2.5G 2025/02/14
1.7G 2025/02/15
1.6G 2025/02/16
3.0G 2025/02/17
2.6G 2025/02/18
2.7G 2025/02/19
2.7G 2025/02/20
2.5G 2025/02/21
1.9G 2025/02/22
1.6G 2025/02/23
3.1G 2025/02/24
2.7G 2025/02/25
2.6G 2025/02/26
2.7G 2025/02/27
2.5G 2025/02/28
2.0G 2025/03/01
1.7G 2025/03/02
3.2G 2025/03/03
2.7G 2025/03/04
2.6G 2025/03/05
2.6G 2025/03/06
2.6G 2025/03/07
1.7G 2025/03/08
1.7G 2025/03/09
3.2G 2025/03/10
2.8G 2025/03/11
2.8G 2025/03/12
2.7G 2025/03/13
2.5G 2025/03/14
1.7G 2025/03/15
1.7G 2025/03/16
3.1G 2025/03/17
2.7G 2025/03/18
2.6G 2025/03/19
2.7G 2025/03/20
2.6G 2025/03/21
1.7G 2025/03/22
1.7G 2025/03/23
3.2G 2025/03/24
2.9G 2025/03/25
2.7G 2025/03/26
2.7G 2025/03/27
2.5G 2025/03/28
1.7G 2025/03/29
1.7G 2025/03/30
3.1G 2025/03/31
3.1G 2025/04/01
2.8G 2025/04/02
2.7G 2025/04/03
2.5G 2025/04/04
1.7G 2025/04/05
1.7G 2025/04/06
3.2G 2025/04/07
2.8G 2025/04/08
2.7G 2025/04/09
2.7G 2025/04/10
2.5G 2025/04/11
1.6G 2025/04/12
1.6G 2025/04/13
3.0G 2025/04/14
2.6G 2025/04/15
2.5G 2025/04/16
2.4G 2025/04/17
2.2G 2025/04/18
1.6G 2025/04/19
1.6G 2025/04/20
2.7G 2025/04/21
2.7G 2025/04/22
2.6G 2025/04/23
2.8G 2025/04/24
2.5G 2025/04/25
1.7G 2025/04/26
1.7G 2025/04/27
3.0G 2025/04/28
2.7G 2025/04/29
2.7G 2025/04/30
2.6G 2025/05/01
2.2G 2025/05/02
1.7G 2025/05/03
1.7G 2025/05/04
3.1G 2025/05/05
2.8G 2025/05/06
2.7G 2025/05/07
2.7G 2025/05/08
2.5G 2025/05/09
1.7G 2025/05/10
1.7G 2025/05/11
3.1G 2025/05/12
2.8G 2025/05/13
2.7G 2025/05/14
2.7G 2025/05/15
2.6G 2025/05/16
1.8G 2025/05/17
1.7G 2025/05/18
3.1G 2025/05/19
2.7G 2025/05/20
2.8G 2025/05/21
2.7G 2025/05/22
2.4G 2025/05/23
1.3G 2025/05/24
1.3G 2025/05/25
1.8G 2025/05/26
1.7G 2025/05/27
1.7G 2025/05/28
1.7G 2025/05/29
1.8G 2025/05/30
1.3G 2025/05/31
1.5G 2025/06/01
2.1G 2025/06/02
1.8G 2025/06/03
1.9G 2025/06/04
1.8G 2025/06/05
1.8G 2025/06/06
1.4G 2025/06/07
1.3G 2025/06/08
2.0G 2025/06/09
2.0G 2025/06/10
1.9G 2025/06/11
1.7G 2025/06/12
1.8G 2025/06/13
1.6G 2025/06/14
1.7G 2025/06/15
2.2G 2025/06/16
1.9G 2025/06/17
1.9G 2025/06/18
1.9G 2025/06/19
1.7G 2025/06/20
1.4G 2025/06/21
1.3G 2025/06/22
2.2G 2025/06/23
2.7G 2025/06/24
2.1G 2025/06/25
1.8G 2025/06/26
1.9G 2025/06/27
1.4G 2025/06/28
1.3G 2025/06/29
2.0G 2025/06/30
2.1G 2025/07/01
1.9G 2025/07/02
1.8G 2025/07/03
1.7G 2025/07/04
1.3G 2025/07/05
1.3G 2025/07/06
2.0G 2025/07/07
1.9G 2025/07/08
1.8G 2025/07/09
1.8G 2025/07/10
1.8G 2025/07/11
1.4G 2025/07/12
1.3G 2025/07/13
2.0G 2025/07/14
2.0G 2025/07/15
1.9G 2025/07/16
2.0G 2025/07/17
2.0G 2025/07/18
1.4G 2025/07/19
1.4G 2025/07/20
2.1G 2025/07/21
2.1G 2025/07/22
1.9G 2025/07/23
1.9G 2025/07/24
1.8G 2025/07/25
1.4G 2025/07/26
1.4G 2025/07/27
2.1G 2025/07/28
2.0G 2025/07/29
1.9G 2025/07/30
1.9G 2025/07/31
2.1G 2025/08/01
1.4G 2025/08/02
1.4G 2025/08/03
2.1G 2025/08/04
2.1G 2025/08/05
2.0G 2025/08/06
2.1G 2025/08/07
2.1G 2025/08/08
1.6G 2025/08/09
1.7G 2025/08/10
2.4G 2025/08/11
2.3G 2025/08/12
2.1G 2025/08/13
2.0G 2025/08/14
1.9G 2025/08/15
1.5G 2025/08/16
1.6G 2025/08/17
2.3G 2025/08/18
2.1G 2025/08/19
2.0G 2025/08/20
2.1G 2025/08/21
2.0G 2025/08/22
1.4G 2025/08/23
1.5G 2025/08/24
2.2G 2025/08/25
2.1G 2025/08/26
2.1G 2025/08/27
2.0G 2025/08/28
1.9G 2025/08/29
1.5G 2025/08/30
1.5G 2025/08/31
2.2G 2025/09/01
2.2G 2025/09/02
2.0G 2025/09/03
2.1G 2025/09/04
1.9G 2025/09/05
1.5G 2025/09/06
1.5G 2025/09/07
1.7G 2025/09/08
2.1G 2025/09/09
2.2G 2025/09/10
1.9G 2025/09/11
1.9G 2025/09/12
1.6G 2025/09/13
1.5G 2025/09/14
2.2G 2025/09/15
2.1G 2025/09/16
2.0G 2025/09/17
1.9G 2025/09/18
1.9G 2025/09/19
1.4G 2025/09/20
1.5G 2025/09/21
2.4G 2025/09/22
2.1G 2025/09/23
2.0G 2025/09/24
1.9G 2025/09/25
1.9G 2025/09/26
1.5G 2025/09/27
1.4G 2025/09/28
2.1G 2025/09/29
2.0G 2025/09/30
2.3G 2025/10/01
2.0G 2025/10/02
1.9G 2025/10/03
1.5G 2025/10/04
1.5G 2025/10/05
2.4G 2025/10/06
2.0G 2025/10/07
1.5G 2025/10/08
5.1M 2025/10/09
4.7M 2025/10/10
3.2M 2025/10/11
3.3M 2025/10/12
2.1M 2025/10/13
I discovered yesterday that the number of star events has started to recover, but it's still inconsistent with the official data. Has anyone encountered the same situation?