Janak Raj Chadha
@danielballan @titaniumbones Could either of you point me to where something similar has been done for Versionista (if it exists)?
> @titaniumbones are you talking about differences in the actual systems themselves or the data we are storing and making public? If the latter, that is documented here (in the source_metadata...
@Mr0grog Oh, that was a little confusing earlier because of the term `source_metadata` being used for different things. Thanks for the clarification! I was just adding the information in the...
@titaniumbones @mhucka @suchthis @danielballan @Mr0grog I've documented the data format of the different sources and I've also added a table for differences between them. A few fields don't have a...
@danielballan Should this be closed, or should we keep it open? I still have to add a little more information to the document.
### PageFreezer
- `Data`:
- `Depth`:
- `TaskId`:
- `Url0`:
- `Url1`:
- `UrlType`:
- `Writeflag`:

### Versionista
- `diffWithPreviousDate`:
- `diffWithFirstDate`:
Thanks a lot @Mr0grog! The date fields are ambiguous.

> hash is missing (I think it got accidentally combined with filePath above)

Yeah, I probably mixed this up as the hash...
The `hash` and `path` of the diff are in a single object. The version `hash` and `filePath` aren't. I think I confused those two. Apologies.
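To make the distinction concrete, here is a rough sketch of how I picture it. Only the field names `hash`, `path`, and `filePath` come from this thread; the `diff` key, the placeholder values, and the `split_fields` helper are assumptions made up for illustration and may not match the real Versionista output.

```python
# Hypothetical record shape: the version's `hash` and `filePath` are
# separate top-level fields, while the diff's `hash` and `path` sit
# together in a single nested object. The "diff" key name is an assumption.
version_record = {
    "hash": "<version-hash>",
    "filePath": "<version-file-path>",
    "diff": {
        "hash": "<diff-hash>",
        "path": "<diff-path>",
    },
}


def split_fields(record):
    """Return (version_hash, version_file_path, diff_hash, diff_path).

    Missing fields come back as None instead of raising KeyError.
    """
    diff = record.get("diff") or {}
    return (
        record.get("hash"),
        record.get("filePath"),
        diff.get("hash"),
        diff.get("path"),
    )


version_hash, file_path, diff_hash, diff_path = split_fields(version_record)
```

Keeping the two `hash` values under different parents is what makes it easy to tell them apart when documenting the fields.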
@Mr0grog I think Internet Archive and Versionista have been well documented here. There are a few fields missing in the PageFreezer part and I was hoping that we could bring...
@daas-ankur-shukla - Well, this adds a whole new dimension to the problem. This may be extremely helpful in the process of creating a training dataset.