Aécio Santos
@1130695 `boolean_operator` in this case determines how the nested classifiers are combined. You can nest any classifier, such as `title_regex`, `url_regex`, `weka`, and even the `regex` classifier, which accepts another `boolean_operator`...
This is not supported yet. Currently, server mode can start only two crawl modes (*deep crawl* and *focused crawl*), which use the `ache.yml` file configured during start-up,...
Correct: if you change these files and rebuild, it should work (you can rebuild the Docker image as well). Regarding multiple `startCrawl`s in the same port/process, it is not possible at...
Thanks @jpmantuano ! I'll review it as soon as possible.
I agree, it would be a nice feature. Can you give more details? Would you like to see the cached HTML source code, or maybe the cached HTML rendered in...
Is there any advantage of compiling and releasing the artifacts for the upcoming version `1.3` using Java 11 (as opposed to keeping it Java 8)? If not, what about merging...
+1 to moving this PR to 1.4. Then, in the release notes for 1.3 we can announce that 1.3 is planned to be the last release that supports Java 8.
Do you mean storing the WARC record of each URL in a single file? No. But you could try to set the maximum size (in bytes) for each file using:...
No, it is not planned, but it could be included.
Hey @anudeepti2004 ! Nobody is working on this yet. A thread makes sense. What I had in mind is: whenever the file rotation happens (the current file is closed, and...