Mat Kelly

Results: 550 comments by Mat Kelly

I attempted to update to the latest Heritrix and to use the JDK 11 zulu16.30.19-ca-jdk16.0.1-macosx_aarch64 but, again, the files are too large for GitHub. There should be a way to remotely...

As a related aside, the indexing procedure in OpenWayback uses the filename of the WARC as part of the basis of the CDX by default. Newer versions of Heritrix compress using...

A new version was released on August 3, 2021: https://github.com/internetarchive/heritrix3/releases/tag/3.4.0-20210803

An alternative to bundling Heritrix might be to use the submodule approach and provide a reference in WAIL to the latest Heritrix and tweak as needed for WAIL or pull...

The submodule approach would work with Heritrix, but the issue is not the size of Heritrix but the size of the JDK that needs to be bundled for Heritrix to...

This is a detail of packaging for release.

- [ ] Add instructions for development for pulling in a recent version of Java.
- [ ] Release a version that...

In the `heritrix-2022` branch I have added the latest Heritrix binary and a more recent JDK as required. The JDK does NOT include a "modules" file at 120+ MB, which...

Attempting to replicate the current state of the `heritrix-2022` branch produces an issue with architecture when run on x86_64 (Mac Intel). Original prototyping was on arm64 (Apple Silicon). To remedy...
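A minimal sketch of how such an architecture check might look in bash, assuming `uname -m` reports `arm64` on Apple Silicon and `x86_64` on Intel Macs. The function name `jdk_arch_label` and the labels it returns are hypothetical and are not taken from the WAIL repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a machine name (as reported by `uname -m`)
# to an architecture label that could select a platform-specific JDK.
# The labels "aarch64" and "x64" are illustrative assumptions.
jdk_arch_label() {
  case "$1" in
    arm64|aarch64) echo "aarch64" ;;
    x86_64|amd64)  echo "x64" ;;
    *)
      # Unknown machine types are reported on stderr and rejected.
      echo "unsupported architecture: $1" >&2
      return 1
      ;;
  esac
}
```

A caller could use it as `label="$(jdk_arch_label "$(uname -m)")"`. Because the `java` binary is itself architecture-specific, the same label would likely need to select the whole JDK and not only one file inside it.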

After detecting architecture and serving the respective `modules` file, an x86_64 machine still complains about the architecture based on the `java` binary.

```
zsh: bad CPU type in executable: /Applications/WAIL.app/bundledApps/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home//bin/java...
```

- [ ] Add checksum verification for JVM download.

Also, some bash logic for detecting platform and pulling in the respective platform's modules file is present in the `heritrix-2022-javafetch` branch.
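The checksum item could be approached along these lines. This is only a sketch, assuming the expected SHA-256 digest is obtained separately from a trusted source; the function name `verify_sha256` is hypothetical and does not come from the `heritrix-2022-javafetch` branch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the SHA-256 digest of a file
# matches the expected lowercase hex digest passed as the second argument.
verify_sha256() {
  file="$1"
  expected="$2"
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | awk '{print $1}')"
  else
    # macOS ships `shasum` rather than `sha256sum`.
    actual="$(shasum -a 256 "$file" | awk '{print $1}')"
  fi
  # An empty digest (e.g. missing file) never counts as a match.
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A fetch script might call `verify_sha256 "$downloaded_file" "$expected_digest" || exit 1` before unpacking anything, where both variables are placeholders for values the script would supply.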