[Feature]: Use a unique WARC filename pattern, similar to the one in Browsertrix

Open tuehlarsen opened this issue 1 year ago • 1 comments

Context

When you extract the WARC file from a WACZ, it always has the name data.warc.gz. It should instead be named in a similarly unique way as in Browsertrix.
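Until this is changed, a possible workaround is to rename the WARC while extracting it. The sketch below is only an illustration: it relies on a WACZ being a ZIP file with its WARCs under archive/, and the function name `extract_warcs` and the `<WACZ stem>-<original name>` scheme are assumptions made here, not the pattern Browsertrix actually uses.

```python
import zipfile
from pathlib import Path


def extract_warcs(wacz_path, out_dir):
    """Copy every WARC under archive/ out of a WACZ, prefixed with the WACZ's stem.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    wacz_path = Path(wacz_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(wacz_path) as zf:
        for name in zf.namelist():
            # Skip everything that is not a WARC inside the archive/ directory.
            if not name.startswith("archive/"):
                continue
            if not name.endswith((".warc", ".warc.gz")):
                continue
            # Assumed naming scheme: "<WACZ stem>-<original WARC name>".
            target = out_dir / f"{wacz_path.stem}-{Path(name).name}"
            target.write_bytes(zf.read(name))
            written.append(target)
    return written
```

For example, `extract_warcs("my-crawl.wacz", "warcs")` would write `warcs/my-crawl-data.warc.gz`, so WARCs taken from different WACZ files no longer collide as long as the WACZ filenames themselves differ.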

What change would you like to see?

see above

Requirements

No response

Todo

No response

Jun 26 '24 13:06 tuehlarsen

The same issue still applies.

I tried to complement Browsertrix Cloud collections with archiveweb.page crawls that were downloaded and then uploaded, and I get files named data.warc instead of the original WARC names or files containing parts of the original name. (I checked this both when downloading initially from archiveweb.page and when downloading from Browsertrix Cloud as multi-WACZ.)

May 12 '25 13:05 Klindten