
Viewer does not handle truncated job ID collisions

Open JustAnotherArchivist opened this issue 6 years ago • 0 comments

Example: https://archive.fart.website/archivebot/viewer/job/82p8b. Because of this collision, a search for "leoschmid.blogspot.com" does not return any results.

There should be many, many more examples of this. According to the birthday paradox, the collision probability reaches 50 % at about 9155 jobs, and we're at over 115k completed jobs by now.
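A rough sketch of that estimate, assuming the truncated job IDs are 5 characters drawn uniformly from a 36-symbol alphabet (inferred from the example ID `82p8b`, not confirmed against the ArchiveBot source). The function names are made up for illustration, and the formulas are the standard birthday approximations:

```python
import math

# Assumption: truncated job IDs are 5 base-36 characters, uniformly distributed.
ID_SPACE = 36 ** 5


def collision_probability(n_jobs, id_space=ID_SPACE):
    """Approximate probability that at least two of n_jobs share a truncated ID."""
    return 1.0 - math.exp(-n_jobs * (n_jobs - 1) / (2 * id_space))


def jobs_for_probability(p, id_space=ID_SPACE):
    """Approximate number of jobs at which the collision probability reaches p."""
    return math.sqrt(2 * id_space * math.log(1 / (1 - p)))


def expected_colliding_pairs(n_jobs, id_space=ID_SPACE):
    """Expected number of job pairs that share a truncated ID."""
    return n_jobs * (n_jobs - 1) / (2 * id_space)


if __name__ == "__main__":
    print(f"jobs for 50 % collision chance: {jobs_for_probability(0.5):.0f}")
    print(f"P(collision) at 115k jobs:      {collision_probability(115_000):.6f}")
    print(f"expected colliding pairs:       {expected_colliding_pairs(115_000):.1f}")
```

Under these assumptions, the 50 % point lands close to the 9155 figure above, and at 115k jobs at least one collision is practically certain.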

Possible fixes:

  • Associate multiple URLs with a (truncated) job ID.
  • Replace the truncated job IDs with something more unique. Since the full job ID is not (easily) accessible, the best way without having to reprocess the entire archive would probably be to use YYYYMMDD-HHMMSS-<truncatedjobid> as the ID.
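Both options could look roughly like the following. This is only a sketch: `composite_job_id`, `add_job`, the in-memory index, the timestamps, and the second URL are hypothetical and not taken from the viewer's code, and it assumes a start timestamp is available for every job.

```python
from collections import defaultdict
from datetime import datetime


def composite_job_id(started, truncated_id):
    """Build an ID of the form YYYYMMDD-HHMMSS-<truncatedjobid>."""
    return f"{started.strftime('%Y%m%d-%H%M%S')}-{truncated_id}"


def add_job(index, truncated_id, url, started):
    """Option 1: keep every job that shares a truncated ID instead of overwriting."""
    index[truncated_id].append(
        {"url": url, "id": composite_job_id(started, truncated_id)}
    )


# Hypothetical data: two jobs that collide on the truncated ID "82p8b".
index = defaultdict(list)
add_job(index, "82p8b", "leoschmid.blogspot.com", datetime(2019, 4, 30, 8, 15, 0))
add_job(index, "82p8b", "example.com", datetime(2019, 5, 1, 11, 5, 0))

for job in index["82p8b"]:
    print(job["id"], job["url"])
```

The composite ID only stays unique as long as two jobs with the same truncated ID never start within the same second, which seems unlikely enough at this scale.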

JustAnotherArchivist, May 01 '19 11:05