Kristinn Sigurðsson
Invoking that ToeThread's [kill()](https://github.com/internetarchive/heritrix3/blob/master/engine/src/main/java/org/archive/crawler/framework/ToeThread.java#L348) method should abort it fairly cleanly. **Edit:** Best invoked via `killThread(int threadNumber, boolean replace)`. `InterruptibleCharSequence` should respond to the interrupt by raising a RuntimeException, ending the...
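For illustration, the general technique of a character sequence that reacts to a thread interrupt can be sketched as below. This is only a minimal sketch under assumptions: the class name `InterruptibleText` is hypothetical and this is not the actual Heritrix `InterruptibleCharSequence` source. The idea is that a long-running regex match reads its input through `charAt()`, so checking the interrupt flag there gives the interrupt a chance to break the match.

```java
/**
 * Hypothetical sketch of an interrupt-aware CharSequence wrapper.
 * Not the Heritrix implementation; names are illustrative only.
 */
class InterruptibleText implements CharSequence {
    private final CharSequence inner;

    InterruptibleText(CharSequence inner) {
        this.inner = inner;
    }

    @Override
    public char charAt(int index) {
        // Check (without clearing) the interrupt flag on every character read,
        // so code that scans this sequence fails fast once interrupted.
        if (Thread.currentThread().isInterrupted()) {
            throw new RuntimeException("interrupted while reading character data");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Keep the wrapper so sub-sequences stay interruptible too.
        return new InterruptibleText(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}
```

Passing such a wrapper to `Pattern.matcher(...)` instead of a plain `String` means a match running in an interrupted thread throws instead of running to completion.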
Hmm, looking into this a bit closer, this may actually be a bug in the webarchive-commons [`org.archive.io.GenerationFileHandler.publish()`](https://github.com/iipc/webarchive-commons/blob/master/src/main/java/org/archive/io/GenerationFileHandler.java#L162)

```
((Preformatter)f).preformat(record);
super.publish(record);
```

Seems that both lines ultimately invoke `NonFatalErrorFormatter.format()` but `publish()`...
Also, I've confirmed that this bug was introduced between 3.0.0 and 3.1.0-RC1. It first shows up in our 2011-02 crawl which is when we switched from 3.0.0 to 3.1.0-RC1.
For years I've run a variant of ExtractorJS that lets me filter out the links it discovers using a set of regular expressions. These are applied **before** the links are...
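A minimal sketch of that kind of regex-based filtering, assuming a candidate link is dropped as soon as any configured pattern matches it. The class `LinkFilter` and its methods are hypothetical names for illustration and are not part of ExtractorJS or Heritrix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: rejects candidate links that match any configured regex. */
class LinkFilter {
    private final List<Pattern> rejectPatterns = new ArrayList<>();

    LinkFilter(List<String> regexes) {
        // Compile once up front; the patterns are reused for every candidate.
        for (String regex : regexes) {
            rejectPatterns.add(Pattern.compile(regex));
        }
    }

    /** Returns true if no reject pattern is found anywhere in the link. */
    boolean accepts(String link) {
        for (Pattern p : rejectPatterns) {
            if (p.matcher(link).find()) {
                return false;
            }
        }
        return true;
    }

    /** Returns the accepted candidates, preserving their original order. */
    List<String> filter(List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String link : candidates) {
            if (accepts(link)) {
                kept.add(link);
            }
        }
        return kept;
    }
}
```

For example, a filter built with the patterns `^[0-9.]+$` and `\s` would drop speculative "links" such as `1.2.3` or `not a link` while keeping `js/app.js`.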
Wouldn't it make more sense to display an interstitial page explaining the issue and offering a link to the partial content? If we just present the partial content without any...
No, at least not when viewing them as embedded resources. Only when they are accessed directly.
Since BDB is mostly used by new users, having it be a separate plug-in kind of defeats the purpose.
@ikreymer In the context of this issue 'session persistence' is persisting a session across server reboots. I don't see how that is relevant to the (very real but seemingly unrelated)...
Somewhat relevant to this discussion: see issue #35.
I'm OK with the current draft. The only point I'd question is

> All classes and methods should have Javadoc comments.

That is a bit on the optimistic side. Perhaps...