Added lxml_html_clean and updated pdfplumber dependencies
The lxml project has split its HTML cleaner out into a separate project, lxml_html_clean, which breaks the current build. This PR updates the required packages list in setup.py accordingly. Also, for reasons I haven't pinned down, the build didn't finish as expected until I updated the pdfplumber dependency to 0.11.
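For reference, here is a minimal sketch of the kind of change this makes to the requirements list in setup.py. The variable name `REQUIRED`, the bare `lxml` entry, and the `>=` specifier are assumptions for illustration and are not copied from the actual diff; only the addition of lxml_html_clean and the move to pdfplumber 0.11 come from the description above.

```python
# Hypothetical excerpt of the requirements list in setup.py.
# The name REQUIRED and the exact specifiers are illustrative assumptions.
REQUIRED = [
    "lxml",              # assumed to be an existing dependency
    "lxml_html_clean",   # new: the HTML cleaner now ships as its own package
    "pdfplumber>=0.11",  # bumped; the ">=" form is an assumption
]

if __name__ == "__main__":
    # Print the list so it can be compared against the real setup.py.
    for requirement in REQUIRED:
        print(requirement)
```

In the real file this list would be passed to `setup(install_requires=...)`; the other existing entries are omitted here.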
Thanks for this @koddas! Looks like some tests are failing for unrelated reasons; I'll take a look at those soon.
@GjjvdBurg No worries. Is there a way to run the tests locally? The readme isn't that informative on the matter :)
Well, this sucks... I managed to commit some changes to the wrong branch. It's totally fine if you reject the latest commit; it was supposed to go into a new PR once the current one had been approved.
Thanks for the changes @koddas! I'm happy to merge this with the addition of the DiVa provider, but the tests are currently failing because of formatting issues, and the test_diva_2 test fails as well (see comment). Formatting configuration can be found in the .pre-commit.yaml file. Don't worry about the other tests, I can fix those later after merging this in. Thanks for your help!
Thanks @koddas, merged!