Missing events in data files and missing data files

Open lukaszgryglicki opened this issue 3 years ago • 0 comments

Hi, we found that many PR events are missing, either because entire data files are missing or because events are absent from data files that do exist.

See examples:

  • https://github.com/cloudfoundry/cloud_controller_ng/pull/1197 - this PR is missing. It should be present in the 2018-08-01 files, but across all 24 hourly files for 2018-08-01 only these PRs are present: 1184, 1185, 1186, 1194.
  • https://github.com/cloudfoundry/cloud_controller_ng/pull/1438 - missing in 2019-09-12. This may be due to the data files missing from 2019-09-12-8 till 2019-09-13-5 (22 hours, i.e. 22 json.gz files in the format http://data.gharchive.org/2019-09-1X-XX.json.gz). The only PR present for this repo is 1430, in 2019-09-12.
  • https://github.com/cloudfoundry/cloud_controller_ng/pull/1688 - missing in 2020-06-10 - missing data files from 2020-06-10-12 till 2020-06-10-21 (10 hours); only these PRs are present: 1643, 1686.
  • https://github.com/cloudfoundry/cloud_controller_ng/pull/1800 - missing in 2020-08-21 - missing data files from 2020-08-21-9 till 2020-08-23-15 (55 hours); no PRs for this repo are reported at all.
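
To double-check the hour counts for the gaps above, here is a minimal Python sketch that counts the hourly files in a range with both endpoints included. The helper names `parse_hour` and `hours_in_range` are made up for this illustration and are not part of GH Archive or of the script mentioned in this issue. It assumes the `YYYY-MM-DD-H` stamp format used in the file names, with an hour that is not zero-padded.

```python
from datetime import datetime, timedelta


def parse_hour(stamp: str) -> datetime:
    """Parse an hour stamp such as '2019-09-12-8' (hypothetical helper)."""
    # Split off the trailing hour; the rest is a plain calendar date.
    date_part, hour_part = stamp.rsplit("-", 1)
    return datetime.strptime(date_part, "%Y-%m-%d") + timedelta(hours=int(hour_part))


def hours_in_range(first: str, last: str) -> int:
    """Number of hourly files from `first` through `last`, both included."""
    delta = parse_hour(last) - parse_hour(first)
    # The difference excludes the starting hour, so add 1 for an inclusive count.
    return int(delta.total_seconds() // 3600) + 1
```

With both endpoints counted, `hours_in_range("2019-09-12-8", "2019-09-13-5")` gives 22, `hours_in_range("2020-06-10-12", "2020-06-10-21")` gives 10, and `hours_in_range("2020-08-21-9", "2020-08-23-15")` gives 55.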

Then another repo:

  • 162 - 2019-08-26: PRs present: 158, 159 (missing PR event for 162).
  • 165 - 2019-08-27: PRs present: 154 (missing event for 165).
  • 166 - 2019-08-27: PRs present: 154 (missing event for 166).
  • 179 - 2019-08-28: PRs present: 154, 183 (missing event for 179).
  • 252 - 2019-09-12: PRs present: 230 (missing data files 2019-09-12-8 - 2019-09-13-5 - 22 hours).
  • 255 - 2019-09-13: PRs present: 227, 240 (missing data files 2019-09-12-8 - 2019-09-13-5 - 22 hours).
  • 647 - 2019-10-31: PRs present: 615, 621, 638, 643, 644, 651, 652, 654 (missing event for 647).
  • 1907 - 2020-06-10: PRs present: 1906 (missing data files 2020-06-10-12 - 2020-06-10-21 - 10 hours).
  • 2130 - 2020-08-22: no PRs (missing data files from 2020-08-21-9 till 2020-08-23-15 - 55 hours).

This is detected by this script; an example call is: ./scripts/gha_prs.sh hyperledger/aries-framework-go 2019-10-31. It reports the PRs present for a given repo and day (24 GHA json.gz files).
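
For anyone who wants to reproduce the check without the shell script, here is a rough Python sketch of the same idea. It is not the script's actual implementation. It assumes each hourly file is gzip-compressed, newline-delimited JSON, and that PRs appear as `PullRequestEvent` records with `repo.name` and `payload.number` fields (the GitHub Events API layout). The function names `pr_numbers_in_lines` and `prs_for_day` are made up for this example, and any HTTP error is treated as a missing file.

```python
import gzip
import json
import urllib.error
import urllib.request

BASE_URL = "http://data.gharchive.org"  # host as quoted in this issue


def pr_numbers_in_lines(lines, repo):
    """Collect PR numbers from PullRequestEvent records belonging to `repo`."""
    numbers = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Assumption: only PullRequestEvent counts as a "PR event" here.
        if event.get("type") != "PullRequestEvent":
            continue
        if event.get("repo", {}).get("name") != repo:
            continue
        number = event.get("payload", {}).get("number")
        if number is not None:
            numbers.add(number)
    return numbers


def prs_for_day(repo, day):
    """Return (sorted PR numbers, hours with no file) for `day` as 'YYYY-MM-DD'."""
    found, missing_hours = set(), []
    for hour in range(24):
        url = f"{BASE_URL}/{day}-{hour}.json.gz"  # hour is not zero-padded
        try:
            with urllib.request.urlopen(url) as resp:
                # Whole file is read into memory; fine for a sketch.
                text = gzip.decompress(resp.read()).decode("utf-8")
        except urllib.error.HTTPError:
            # Treat any HTTP error (e.g. 404) as a missing data file.
            missing_hours.append(hour)
            continue
        found |= pr_numbers_in_lines(text.splitlines(), repo)
    return sorted(found), missing_hours
```

Calling `prs_for_day("hyperledger/aries-framework-go", "2019-10-31")` downloads all 24 hourly files for that day, so it is slow, but it separates the two failure modes described in this issue: hours with no file at all, and files that exist but lack the expected PR event.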

cc @igrigorik @1010sachin

lukaszgryglicki, Feb 03 '21 10:02