warc2html
Converts WARC files to static HTML
This is a great utility! It would be even greater if you offered a flag that allowed the extraction to skip specific records if they throw an error. In my...
If I have a WARC split into numbered files (`-00000.warc.gz`, `-00001.warc.gz`, etc.), how can I load these into this tool? I'm fairly ignorant of the WARC format, sorry if this...
Hello, I got an exception from a WARC from archive.org:
```
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: index -1, length 0
	at java.base/java.lang.String.checkIndex(String.java:4563)
	at java.base/java.lang.AbstractStringBuilder.charAt(AbstractStringBuilder.java:351)
	at java.base/java.lang.StringBuilder.charAt(StringBuilder.java:91)
	at org.netpreserve.urlcanon.SemanticPreciseCanonicalizer.removeLeadingTrailingAndDuplicateChars(SemanticPreciseCanonicalizer.java:90)
	at org.netpreserve.urlcanon.AggressiveCanonicalizer.removeRedundantAmpersandsFromQuery(AggressiveCanonicalizer.java:100)
	...
```
Bump dependency: jwarc 0.20.0 is more lenient with parsing errors.
Hi, can we please get some common OS packages for warc2html, to make it easier to install?
* macOS (Homebrew)
* Debian/Ubuntu (PPA)
* RHEL (yum repo)
*...
Suggested by Ilya