Ilya Grigorik

296 comments by Ilya Grigorik

> @igrigorik We could consider updating the crawler to fetch these related events whenever it encounters an issue, to attempt to preserve the historical data around issues better, but I...

This one has me stumped. :worried: @annafil can you think of any reason why this may happen? For example, any reason why the events API may report a different email...

> @igrigorik just in case, do you know offhand the versions of the dependencies you're using on the server that runs the crawler?

```
igrigorik@worker:~/githubarchive.org/crawler$ cat Gemfile.lock
GIT
remote: git://github.com/eventmachine/eventmachine.git
...
```

![image](https://user-images.githubusercontent.com/10652/37695570-981f5554-2c8d-11e8-8baa-fadd611567df.png)

---

First off, impressive detective work here; big thanks to you both. I think I know what's going on here.

- We started hashing emails on May 27th,...

@annafil if you're up for it, that would be a great help! I believe you should already have access to the GCP project and the GCE instance where the crawler...

Assuming the data is there, backfilling would be a non-trivial exercise. Not impossible, but non-trivial. We ran a large transform/refactor ~1.5 years back (see https://github.com/igrigorik/githubarchive.org/issues/112) and https://github.com/arfon/gh_archive_parser/; perhaps that...

Hmm, I didn't realize that was split into a different endpoint. Doh! To answer your question: yes, it would be great to track gists. In terms of code, if the actual...

Grr, it's frustrating that they're now under a separate stream and schema. This means we can't merge them with the other archives and will need separate storage and BigQuery pipelines. @briandoll any...

> I can't host a streaming archive like that myself because I don't have a machine with 100% uptime or likely even the needed amount of storage, despite it being...

@za3k no progress on this end; I've been stretched thin with other projects.