Adrien Barbaresi
@JER-CE It was actually a bug: urllib.robotparser can try to load the URL indefinitely. Thanks for the detailed code snippet, it really helps. Now it's still not working (on my...
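One possible way to avoid the indefinite load described above is to fetch robots.txt separately with an explicit timeout and hand the text to the parser. This is only a minimal sketch, not the fix that was applied in the package: the function name `load_robots` and the 10-second default are assumptions made for illustration. It relies on the fact that `RobotFileParser.read()` opens the URL without a timeout, whereas `RobotFileParser.parse()` works on lines that are already in memory.

```python
import urllib.request
import urllib.robotparser


def load_robots(robots_url, timeout=10.0):
    """Fetch robots.txt with an explicit timeout, then parse it offline.

    Hypothetical helper: returns a RobotFileParser, or None if the file
    could not be retrieved within the timeout.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(robots_url)
    try:
        # urlopen() honours the timeout; RobotFileParser.read() would not set one
        with urllib.request.urlopen(robots_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        return None
    parser.parse(body.splitlines())
    return parser
```

A caller would then check `parser.can_fetch(user_agent, url)` as usual and decide how to treat a `None` result (for example, by assuming no restrictions or by skipping the host).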
Do you mean space before the code or space in general? Could you provide a concrete example of a code block?
Yes, spacing is not necessarily preserved in code blocks; this can be improved.
Hi @Yomguithereal, I didn't know that Python could come without LZMA, I thought it was a standard package and I used it because it compresses text better. I could switch...
I checked again, usually all the packages in the stdlib are available. In some cases compression libraries are missing with Python compiled from source, but it's inconsistent across systems, see...
It would also be difficult to test on GitHub Actions (the current CI/CD). We could also explain how to fix the problem in the docs. Let's leave the issue open...
@Yomguithereal The package is on track to remove the LZMA dependency. There will soon be no need for pickle files (justext data and XML-TEI schema).
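For code that has to run on interpreters built without LZMA, as discussed above, a guarded import with a fallback is one option. The sketch below is illustrative only and is not taken from the package; the helper name `pick_compressor` and the choice of gzip as the fallback are assumptions.

```python
import gzip


def pick_compressor():
    """Return (name, module): lzma if this interpreter provides it, else gzip.

    Hypothetical helper; both modules expose compress() and decompress().
    """
    try:
        # Raises ImportError when Python was compiled without liblzma
        import lzma
        return "lzma", lzma
    except ImportError:
        return "gzip", gzip


name, codec = pick_compressor()
payload = codec.compress(b"some text to store")
restored = codec.decompress(payload)
```

Data written with one codec cannot be read with the other, so the chosen name would have to be stored alongside the compressed file.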
Hi @obeone, indeed. The links were not my original focus and there are a few problems with link extraction.
Hi @felipehertzer, as with the other heuristics, it's tricky: the number 7 was chosen arbitrarily. Let's see if we can find a better way or another threshold.
Hi @ibestvina, this is a known issue. I'm not primarily working with these options and added them after feature requests, so the interactions between options can be patchy at times....