
Can't see contents of WACZ file with ReplayWeb.page local

Open • ivbeg opened this issue 2 years ago • 1 comment

I've created a WACZ file from a warc.gz file with the latest py-wacz package, 0.4.5.

Original file: https://cdn1.ruarxive.org/public/webcollect2022/ngo2022/cafrussia.ru/cafrussia.ru.warc.gz (179MB)
Produced WACZ file: https://cdn1.ruarxive.org/public/webcollect2022/ngo2022/cafrussia.ru/cafrussia.ru.wacz (179MB)

I open the WACZ file with ReplayWeb.page and I don't see its contents. [image]

There are no issues opening the original warc.gz file.

Environment
OS: Windows 10
Product version: ReplayWeb.page release 1.5.10, installed from the binary at https://github.com/webrecorder/replayweb.page/releases/tag/v1.5.10

ivbeg, Mar 27 '22 06:03

It seems that there are two issues. The random-access search isn't actually activating correctly until a URL is entered; I will take a look at that. If you enter https://cafrussia.ru into the search, it will find the URL, but it should be showing a list automatically.

There is also no page metadata, so it defaults to the search view. One way to fix this would be to add the page directly when creating the WACZ file:

wacz create -f cafrussia.ru.warc.gz --url https://www.cafrussia.ru/

Alternatively, run with --detect-pages, which will then detect all the pages automatically (more experimental).

But the lack of initial search results appears to be a bug.

ikreymer, Apr 08 '22 07:04