Rob Brackett
@neiljp Yeah, we sorta have a more defined way to do this now. You can add your work as a module in the https://github.com/edgi-govdata-archiving/web-monitoring-processing repo, in the `web_monitoring` folder. There’s...
Hey, @neiljp, just checking in. Any updates or anything I can help with here?
Well, this is still pretty critical. It would be lovely to get some help from someone on this, but it does need to get done.
Hey @cYph3r1337, that would be great. These days, all the diff-related code lives in the [web-monitoring-processing repo](https://github.com/edgi-govdata-archiving/web-monitoring-processing/) in the [`web_monitoring/diff` directory](https://github.com/edgi-govdata-archiving/web-monitoring-processing/tree/master/web_monitoring/diff). You can then make your differ accessible via HTTP...
Ugh, this is important documentation that I need to do!
@titaniumbones are you talking about differences in the actual systems themselves or in the data we are storing and making public? If the latter, that is documented here (in the `source_metadata` item):...
> There’s no pagefreezer info there yet because we do not have a consistent format to document for it yet. Same for IA, too.
Ah, sorry! Didn’t realize this was coming out of another discussion. 👍
@janakrajchadha in terms of the raw data we can get out of Versionista, that’s never been documented:
- Partially because it’s always changing—we are scraping, so occasionally access to some...
> source_metadata is what we get from the source itself

It’s close, but not exactly the same. `source_metadata` doesn’t include fields that are already represented in the `page` and `version`...