Viktor

Results 73 issues of Viktor

This can be done by using a headless browser to fetch the root document and analyzing the rendered DOM. The existing junk-detection solution works decently well, but uses static HTML...

enhancement
nlnet25

Implement a basic "safe search" filter for removing NSFW results. A naive Bayes filter or something along those lines would probably go a long way; there are also "bad website" lists that...

enhancement
nlnet25
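
As a rough illustration of the naive Bayes idea mentioned in the safe-search issue above, here is a minimal sketch of a two-class multinomial classifier with Laplace smoothing. The class name `NaiveBayesFilter`, the labels `"safe"` and `"nsfw"`, and the tokenizer are all assumptions made for this example; nothing here reflects how the project implements or plans to implement the filter.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase alphanumeric runs only.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("safe", "nsfw")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        # Accumulate per-class token counts and document counts.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = max(1, len(set().union(*self.word_counts.values())))
        # Smoothed class prior, so an untrained class never yields log(0).
        docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (docs + len(self.LABELS)))
        # Smoothed per-token likelihoods, summed in log space.
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + vocab))
        return score

    def is_nsfw(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "nsfw") > self._log_score(tokens, "safe")
```

In use, the filter would be trained on labeled page text and then asked about unseen text, e.g. `f.train(page_text, "nsfw")` followed by `f.is_nsfw(other_text)`. A real deployment would need a much larger labeled corpus and probably a decision threshold rather than a plain comparison.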

The crawler currently avoids git forges, as crawling them is very resource-intensive for the remote server. A crawler specialization that knows to stay on the main branch and e.g....

enhancement
nlnet25
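
One way to picture the "stay on the main branch" part of the git forge issue above is a URL filter applied before fetching. The sketch below assumes GitHub-style paths of the form `/<owner>/<repo>/<view>/<ref>/...`; the function name `should_crawl`, the set of allowed views, and the branch names are hypothetical choices for illustration, and other forges lay out their URLs differently.

```python
from urllib.parse import urlsplit

# Assumption: only file and directory views are worth crawling.
BROWSE_VIEWS = {"blob", "tree"}
# Assumption: these names identify the main branch.
MAIN_BRANCHES = {"main", "master"}


def should_crawl(url):
    """Return True for repository landing pages and main-branch file views."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if len(parts) == 2:
        # /<owner>/<repo>: the repository landing page.
        return True
    if len(parts) >= 4 and parts[2] in BROWSE_VIEWS:
        # /<owner>/<repo>/<view>/<ref>/...: keep only the main branch.
        return parts[3] in MAIN_BRANCHES
    # Everything else (history, blame, other branches, user pages) is skipped.
    return False
```

Rejecting by default keeps the request volume on the remote server bounded, since every non-main ref and every history view multiplies the number of pages a repository exposes.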

Add the capability to index PDF files (only when they contain text data; OCR is out of scope).

enhancement
nlnet25