
Support load of multiple WARC files

Open ivbeg opened this issue 2 years ago • 1 comments

Some crawlers create multiple WARC files, which matters when WARC files have to be uploaded to storage that limits the size of a single file. I have a lot of archived websites, each split into 5-50 WARC files of 5 GB each. Is it possible to add to ReplayWeb.page the ability to open more than one file at once?

ivbeg • Mar 27 '22 06:03

There is a preliminary implementation for loading a JSON 'manifest' which contains a list of WACZ files. Currently, support is planned only for multiple WACZ files, because WACZ supports random access, which makes it easier to load many at once. With a list of WARCs, each file would need to be loaded in full to index it, but maybe that should still be supported. Will update here when there is more progress on this!

ikreymer • Apr 08 '22 07:04
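
To illustrate why a plain WARC has to be read end to end before it can be replayed, here is a minimal sketch of a sequential indexer. It is not ReplayWeb.page's implementation. It assumes uncompressed WARC records (real crawls are usually `.warc.gz`, which would need per-record gzip handling), and the function name `index_warc` and the shape of the index entries are hypothetical, chosen only for this example.

```python
import io


def index_warc(stream):
    """Scan an uncompressed WARC stream from start to end and return
    one entry per record: its byte offset, length, type, and target URI.

    Hypothetical helper for illustration only.
    """
    entries = []
    while True:
        offset = stream.tell()
        line = stream.readline()
        if not line:
            break  # end of file
        if line.strip() == b"":
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line at offset {offset}")

        # Read the named header fields up to the empty line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.partition(b":")
            headers[name.strip().lower()] = value.strip()

        # Skip the content block without reading it into memory.
        content_length = int(headers[b"content-length"])
        stream.seek(content_length, io.SEEK_CUR)

        entries.append({
            "offset": offset,
            "length": stream.tell() - offset,
            "type": headers.get(b"warc-type", b"").decode(),
            "uri": headers.get(b"warc-target-uri", b"").decode(),
        })
    return entries


def make_record(uri, body):
    """Build a tiny uncompressed WARC record for the demo below."""
    head = (
        b"WARC/1.0\r\n"
        b"WARC-Type: response\r\n"
        b"WARC-Target-URI: " + uri.encode() + b"\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"\r\n"
    )
    return head + body + b"\r\n\r\n"


if __name__ == "__main__":
    first = make_record("http://example.com/a", b"hello")
    second = make_record("http://example.com/b", b"world!")
    for entry in index_warc(io.BytesIO(first + second)):
        print(entry)
```

Even though the content blocks are skipped with `seek`, every record header still has to be visited in order, so the cost grows with the number of records in each file. A format that ships its index alongside the data avoids this pass, which is the random-access advantage mentioned for WACZ above.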