Max Dallabetta
Currently, the `Precomputed` class uses heavily overloaded names, while the class name itself reveals little about its nature: ``` python class Precomputed: html: str doc: lxml.html.HtmlElement...
### Problem statement The publisher used in the documentation on [how to add a publisher](https://github.com/flairNLP/fundus/blob/master/docs/how_to_add_a_publisher.md) doesn't use a summary or subheadlines. I think the latest contributions kinda support my suspicion,...
The purpose of this PR is to unify the versioning mechanics of `ParserProxy` and the `BaseParser` in a new interface called `parse` within `ParserProxy`. ```python #prev extraction = parser(crawl_date).parse(html, ...)...
This PR deprecates the `get_value_by_key_path` class method of `LinkedDataMapping` and replaces all existing occurrences with `xpath_search`. To address efficiency, I provide an additional runtime comparison and a script to reproduce....
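For readers unfamiliar with key-path lookups, the kind of nested access that a name like `get_value_by_key_path` suggests can be sketched as follows. This is only an illustration under assumptions: the helper `get_by_key_path`, its signature, and the sample data are hypothetical and not taken from fundus, and the actual `LinkedDataMapping` methods may behave differently.

```python
from typing import Any, Iterable, Optional


def get_by_key_path(data: dict, key_path: Iterable[str], default: Optional[Any] = None) -> Any:
    """Hypothetical helper: walk nested dicts along key_path.

    Returns `default` as soon as a key is missing or a non-dict is reached.
    """
    current: Any = data
    for key in key_path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up linked-data-like structure, for illustration only
ld = {"NewsArticle": {"author": {"name": "Jane Doe"}}}

result_found = get_by_key_path(ld, ["NewsArticle", "author", "name"])
result_missing = get_by_key_path(ld, ["NewsArticle", "publisher", "name"])
```

Here `result_found` holds the nested author name, while `result_missing` falls back to the default because the `publisher` key does not exist.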
This PR introduces functionality to benchmark publishers using the CC-NEWS dataset. The benchmarking process involves retrieving HTML and articles at specified intervals (daily, weekly, monthly, etc.) from the CC-NEWS dataset,...