warc2html

Converts WARC files to static HTML

6 warc2html issues, sorted by most recently updated

This is a great utility! It would be even greater if you offered a flag that allowed the extraction to skip specific records if they throw an error. In my...

If I have a WARC split into numbered files (`-00000.warc.gz`, `-00001.warc.gz`, etc.), how can I load these into this tool? I'm fairly ignorant of the WARC format, sorry if this...

Hello, I got an exception from a WARC from archive.org:

```
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: index -1, length 0
    at java.base/java.lang.String.checkIndex(String.java:4563)
    at java.base/java.lang.AbstractStringBuilder.charAt(AbstractStringBuilder.java:351)
    at java.base/java.lang.StringBuilder.charAt(StringBuilder.java:91)
    at org.netpreserve.urlcanon.SemanticPreciseCanonicalizer.removeLeadingTrailingAndDuplicateChars(SemanticPreciseCanonicalizer.java:90)
    at org.netpreserve.urlcanon.AggressiveCanonicalizer.removeRedundantAmpersandsFromQuery(AggressiveCanonicalizer.java:100)
...
```

Bump dependency: jwarc 0.20.0 is more lenient toward parsing errors.

Hi, can we please get packages for common operating systems, to make warc2html easier to install?

* macOS (Homebrew)
* Debian/Ubuntu (PPA)
* RHEL (yum repo)
* ...