py-wacz

Results 16 py-wacz issues

The README, as far as I can see, focuses on the command line. I have a dozen WACZ files created to archive Facebook posts of political leaders during...

question

I'm trying to create a wacz from a warc.gz file. I want it to detect pages and create a full text index. This is my command: `python3 -m wacz create...

When I try to validate a .wacz file using PowerShell in Windows 10 it fails to discover the .warc.gz file in the archive folder and therefore fails to validate the...

The documentation via the `wacz --help` command is far too brief. Mention all options so there is no need to always come back to this repository to consult documentation.

enhancement

This proposed new `py-wacz` command allows you to generate a CDXJ file where the filenames and offsets refer to the WACZ itself rather than the WARC files within. The idea...

I'm not sure if this is a feature request or just a request for clarification, but I'm looking for a canonical way to generate a WACZ file from multiple WARC...

When creating WACZ files from WARCs, if a WARC file name ends with `.warc` but the file is gzipped, rename the file to `.warc.gz`, so that it aligns with...
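The check this request describes could be sketched roughly as follows. This is only an illustration, not py-wacz's implementation: the helper names `is_gzipped` and `normalized_warc_name` are hypothetical, and the sketch assumes that the two-byte gzip magic number at the start of the file is enough to decide whether a file is gzipped.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def normalized_warc_name(path):
    """Hypothetical helper: return the name the file should carry.

    A file named ".warc" whose content is gzipped gets ".gz" appended;
    every other path is returned unchanged.
    """
    if path.endswith(".warc") and is_gzipped(path):
        return path + ".gz"
    return path
```

The helper only computes the target name; whether to rename the file on disk or only the entry written into the WACZ is left to the caller.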

It would be helpful to have an iterator that walks through all the WARC records of all the WARC files in a WACZ file, treating it externally like a regular...
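As a rough sketch of what such an iterator might build on, the following uses only the standard library to walk the WARC files stored in a WACZ. It assumes the WACZ is a ZIP file whose WARCs live under `archive/`; the function name `iter_warc_streams` is hypothetical and not part of py-wacz.

```python
import zipfile

WARC_SUFFIXES = (".warc", ".warc.gz")


def iter_warc_streams(wacz):
    """Yield (member_name, binary_stream) for each WARC under archive/.

    `wacz` may be a path or a seekable binary file object. Each stream
    is only valid until the generator advances to the next member.
    """
    with zipfile.ZipFile(wacz) as zf:
        for name in sorted(zf.namelist()):
            if name.startswith("archive/") and name.endswith(WARC_SUFFIXES):
                with zf.open(name) as stream:
                    yield name, stream
```

Each yielded stream could then be handed to a WARC record parser, for example warcio's `ArchiveIterator` if warcio is installed, to obtain the individual records.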

In the README under the `Validate` header, the instructions state that a WACZ file can be validated with `wacz validate myfile.wacz`. Trying this in the latest release or from the...

A browsertrix crawl usually contains all the information required to create a WACZ; in particular, the text and pages metadata are already present. Is it possible to use that data...