Viktor issues

Results 73 issues of


                                            Viktor

GT6a: Create job that exports samples of rendered DOM HTML

enhancement

nlnet25

GT6: Improve detection of ads and popovers

This can be done by using a headless browser to fetch the root document, and analyze the rendered DOM. The existing junk-detection solution works decently well, but uses static HTML...

enhancement

nlnet25

GT5b: Build GUI and API toggles for toggling the filter

enhancement

nlnet25

GT5a: Implement filtering logic

enhancement

nlnet25

GT5: NSFW filtering

Implement a basic "safe search" filter for removing NSFW results. A naive bayesian filter or something along those lines probably goes a long way, there are also "bad website"-lists that...

enhancement

nlnet25

GT4b: Create crawler specialization for github using their API

enhancement

nlnet25

GT4a: Create crawler specialization that permits limited indexing of generic git forges

nlnet25

GT4: Indexing git forges

The crawler currently avoids git forges as crawling them is very resource intensive for the remote server. A crawler specialization that understands to stay on the main branch and e.g....

enhancement

nlnet25

GT3: Index additional file formats

Add capability to index PDF files (when they have text data, OCR is out of scope).

enhancement

nlnet25

GT2b: Integrate common crawl data into the crawler, including some configuration of when and if to use CC

enhancement

nlnet25