371 comments by Rob Brackett

@neiljp Yeah, we sorta have a more defined way to do this now. You can add your work as a module in the https://github.com/edgi-govdata-archiving/web-monitoring-processing repo, in the `web_monitoring` folder. There’s...

Hey, @neiljp, just checking in. Any updates or anything I can help with here?

Well, this is still pretty critical. It would be lovely to get some help from someone on this, but it does need to get done.

Hey @cYph3r1337, that would be great. These days, all the diff-related code lives in the [web-monitoring-processing repo](https://github.com/edgi-govdata-archiving/web-monitoring-processing/) in the [`web_monitoring/diff` directory](https://github.com/edgi-govdata-archiving/web-monitoring-processing/tree/master/web_monitoring/diff). You can then make your differ accessible via HTTP...

Ugh, this is important documentation that I need to do!

@titaniumbones are you talking about differences in the actual systems themselves or in the data we are storing and making public? If the latter, that is documented here (in the `source_metadata` item):...

> There’s no pagefreezer info there yet because we do not have a consistent format to document for it yet. Same for IA, too.

Ah, sorry! Didn’t realize this was coming out of another discussion. 👍

@janakrajchadha in terms of the raw data we can get out of Versionista, that’s never been documented:

- Partially because it’s always changing. We are scraping, so occasionally access to some...

> source_metadata is what we get from the source itself

It’s close, but not exactly the same. `source_metadata` doesn’t include fields that are already represented in the `page` and `version`...
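
To make that distinction concrete, here is a rough sketch of the idea: whatever a source reports that is already stored on the page or version record gets left out of `source_metadata`. The field names and the `build_source_metadata` helper are made up for illustration; the actual schema is not shown in this thread.

```python
# Hypothetical set of fields assumed to already live on the page/version
# records. The real list is not given in this thread.
PROMOTED_FIELDS = {"url", "title", "capture_time"}


def build_source_metadata(raw_record, promoted_fields=PROMOTED_FIELDS):
    """Return only the source fields not already stored on page/version."""
    return {
        key: value
        for key, value in raw_record.items()
        if key not in promoted_fields
    }


# A made-up raw record as it might come back from a source.
raw = {
    "url": "https://example.gov/page",
    "title": "Example page",
    "capture_time": "2017-05-01T00:00:00Z",
    "account": "example-account",  # source-specific detail
    "has_content": True,           # source-specific detail
}

source_metadata = build_source_metadata(raw)
print(source_metadata)
```

So `source_metadata` here ends up holding only the source-specific leftovers, which is why it is close to, but not the same as, "what we get from the source itself."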