Viktor
The first result links to a broken URL, which looks like mangled URL encoding. The problem is likely on the Marginalia end, since Wikipedia is sideloaded. The link points to https://en.wikipedia.org/wiki/Tf%2013idf
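For illustration, here is a minimal Python sketch of one way such a URL could come about. It assumes the intended article title contains an en dash (U+2013), and the `mangled_quote` helper is purely hypothetical: it is a guess at the kind of bug involved, not code taken from Marginalia.

```python
from urllib.parse import quote, unquote

# Assumed article title: "Tf", an en dash (U+2013), then "idf".
title = "Tf\u2013idf"

# Correct percent-encoding escapes each UTF-8 byte of the en dash.
correct = quote(title)


def mangled_quote(text: str) -> str:
    """Hypothetical bug: write the code point as hex after a single '%'."""
    return "".join(ch if ch.isascii() else f"%{ord(ch):X}" for ch in text)


mangled = mangled_quote(title)

print(correct)           # properly encoded path segment
print(mangled)           # same shape as the broken link in this report
print(unquote(mangled))  # what a server would decode the broken link to
```

If this guess is right, the server reads `%20` as a space followed by the literal characters `13`, which would explain why the link does not resolve to the article.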
Seems like a construction error. Compare `ls -la` vs `du -h`. This appears to be from within the btree index. It doesn't actually consume any space due to how the...
On two separate occasions now, the crawler has concluded that it's not permitted to crawl www.cambridgeclarion.org, reporting it as blocked by robots.txt. The webmaster hasn't changed robots.txt, and re-crawling...
The front-end parts of the service are currently rendered with the unmaintained Spark framework, using Mustache templates, which are more than a bit awkward and fairly slow to render as...
This query does not yield the expected results. If you drop the 'the' prefix, it gets better as it selects a more appropriate n-gram that gets matched with the title....
The crawler is currently looking at a bundled file `ip-banned-cidr.txt` to IP-block certain websites. This is a historical artifact that, if we even keep it, should be a configurable datafile...
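As a rough sketch of what a configurable CIDR block list could look like, the Python snippet below parses a list and checks addresses against it. The file format (one CIDR range per line, `#` for comments), the function names, and the example ranges are all assumptions made for illustration; none of them come from the actual `ip-banned-cidr.txt` or the crawler code.

```python
import ipaddress


def parse_cidr_list(text):
    """Parse one CIDR range per line, ignoring blanks and '#' comments."""
    networks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def is_banned(ip, networks):
    """Return True if the address falls inside any listed range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Placeholder data: reserved documentation ranges, not real ban entries.
EXAMPLE = """
# hypothetical contents of a configurable ban list
192.0.2.0/24
198.51.100.0/24
2001:db8::/32
"""

banned = parse_cidr_list(EXAMPLE)
print(is_banned("192.0.2.17", banned))
print(is_banned("203.0.113.5", banned))
```

Loading the ranges from a path supplied in configuration, rather than from a bundled resource, would let operators maintain or empty the list without rebuilding the crawler.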
This will help expand the link graph between blogs.
Seems to mostly give irrelevant results. This may be down to a lack of relevant documents.
The Spark framework that is used in some HTTP services is no longer maintained, and needs to be replaced with something else. This is an experiment evaluating the merits of...
Probably should be implemented with [PDFBox](https://pdfbox.apache.org/).