pieterhartel comments

Results: 13 comments of pieterhartel

anchor issue

@adbar wrote "The reason is that trailing titles at the bottom of articles are discarded during extraction". I don't think that this is the case. There is text following the...

abi-decode can't decode arguments to a constructor

I now see that this is the same as issue #7. Would it be possible to give this some priority? --pieter

.onion URLs not supported

Does this mean that with the current system scanning .onion sites is impossible?

Unable to reclaim box space in spline routing for edge... Something is probably seriously wrong.

Brilliant workaround, thanks!

Question about the title

Well, I tried to summarise what trafilatura is currently doing. Could you confirm or correct, please?

Question about the title

I have a dataset with over 10M home pages, of which 82K (less than 1%) contain a `@class="entry-title"`. WP is not popular in this dataset. 75K pages contain a ``,...

Error in connecting to CertStream...

I suppose the question is why this is happening, what percentage of the certificates is missed, and how to avoid the error... I would be very much interested in the...

4 field consistency (non-breaking)

Done.

Inconsistencies

I am tempted, but how do I debug existing or new REs?

To explain my question, I have added a parser class for Belgium as follows:

```
class WhoisBe(WhoisEntry):
    """Whois parser for .be domains"""
    regex: dict[str, str] = {
        "domain_name": r"Domain: *(.+)",
        ...
```
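One possible way to debug such REs outside the parser is to run each pattern against a saved WHOIS response with Python's standard `re` module and look at what it captures. The sketch below is only an illustration under stated assumptions: the helper `debug_patterns`, the sample text, and the `status` pattern are hypothetical and not part of the library; only the `domain_name` pattern comes from the comment above. It also assumes case-insensitive matching, which may or may not be what the parser does.

```python
import re

# Hypothetical sample of WHOIS output; a real .be response may look different.
SAMPLE_TEXT = """\
Domain:   example.be
Status:   NOT AVAILABLE
"""

# Only "domain_name" is taken from the comment above;
# "status" is an illustrative assumption.
PATTERNS = {
    "domain_name": r"Domain: *(.+)",
    "status": r"Status: *(.+)",
}


def debug_patterns(patterns, text):
    """Return, for each key, the list of captures its pattern finds in text."""
    results = {}
    for key, pattern in patterns.items():
        # An empty list means the pattern did not match anywhere.
        results[key] = re.findall(pattern, text, re.IGNORECASE)
    return results


if __name__ == "__main__":
    for key, matches in debug_patterns(PATTERNS, SAMPLE_TEXT).items():
        print(f"{key}: {matches}")
```

A pattern that yields an empty list, or a capture with unexpected leading characters, points at the RE to adjust before wiring it into the parser class.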