Felipe Hertzer comments

Results: 21 comments of Felipe Hertzer

Extend test coverage for json_metadata functions

Hey, sure I will check it ASAP.

Extend test coverage for json_metadata functions

Hi @adbar, sure I will check it.

Parse JSON-LD information and write heuristics to decide where to draw info from

I started developing the JSON-LD parser. The results are good: it will solve half of the author problems that we have.
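To make the idea concrete, here is a minimal sketch of pulling author names out of JSON-LD blocks. It is not the parser mentioned in the comment. The class `JsonLdCollector`, the function `extract_authors`, and the sample markup are all hypothetical, and it uses only the standard library rather than the project's own parsing code.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the decoded content of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD instead of failing the whole page.


def extract_authors(html_text):
    """Return author names found in JSON-LD blocks (hypothetical helper)."""
    collector = JsonLdCollector()
    collector.feed(html_text)
    authors = []
    for block in collector.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            author = item.get("author")
            candidates = author if isinstance(author, list) else [author]
            for candidate in candidates:
                if isinstance(candidate, dict) and candidate.get("name"):
                    authors.append(candidate["name"])
                elif isinstance(candidate, str) and candidate:
                    authors.append(candidate)
    return authors


# Hypothetical sample page, for illustration only.
SAMPLE = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "NewsArticle", "author": {"@type": "Person", "name": "Jane Doe"}}'
    "</script></head><body></body></html>"
)
authors = extract_authors(SAMPLE)
```

The sketch accepts `author` as a string, an object, or a list of either, since publishers vary in how they fill that field. Deciding when to prefer JSON-LD over other metadata sources is a separate heuristic and is not covered here.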

List of smaller extraction bugs (text & metadata)

Hey @adbar, I'm having a problem with a few publications like HuffPost, where it is not extracting the metadata correctly. But if I change the line below to `tree = fromstring(htmlobject.encode('utf8'),...

Extract content from formats other than HTML: PDF, EPUB?

I think it's a good idea to create a function for extracting PDF content. I have several sites that return PDFs to me. Do you have an idea how to...

Extract content from formats other than HTML: PDF, EPUB?

Sounds good to me. It would be good to run some tests to predict what we need to have on the roadmap.

Feature Request - Total hits for each group

I mean the number of hits for each brand. I want to know how many hits each group has:

group_key brand 1 - 10hits
group_key brand 2 - 30hits
group_key...
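As a rough illustration of the requested output, the per-group totals could be computed client-side along these lines. This is only a sketch: the hit structure, the field name `brand`, and the helper `hits_per_group` are assumptions, not part of any search engine's API.

```python
from collections import Counter

# Hypothetical search hits; "brand" stands in for the group key from the request.
hits = [
    {"id": 1, "brand": "brand 1"},
    {"id": 2, "brand": "brand 2"},
    {"id": 3, "brand": "brand 2"},
    {"id": 4, "brand": "brand 1"},
    {"id": 5, "brand": "brand 2"},
]


def hits_per_group(hits, group_key):
    """Count how many hits fall under each value of group_key (hypothetical helper)."""
    return dict(Counter(hit[group_key] for hit in hits if group_key in hit))


totals = hits_per_group(hits, "brand")
```

Counting on the client only covers the hits actually returned, so it undercounts whenever results are paginated or truncated, which is presumably why the totals are requested from the server.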

Change license to Apache 2.0

Approved

Extract more text

Hey @adbar, I have a similar problem with the site [Stuff](https://urlis.net/qqgfdyra): it is only getting half of the content because they are using the class 'stuff-article', which is very...

Extract more text

@adbar I tested ```ends-with``` and lxml does not seem to support it. Do you want me to include ```contains(@class, "-article")``` instead?
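For context, `ends-with()` is an XPath 2.0 function, while lxml implements XPath 1.0. The sketch below compares the proposed `contains()` expression with a common XPath 1.0 emulation of a suffix match. The sample markup and the `sidebar-articles` class are made up for illustration; only the class name `stuff-article` comes from the thread.

```python
from lxml import html

# Hypothetical markup; only the class 'stuff-article' is taken from the discussion.
SAMPLE = """
<html><body>
  <div class="stuff-article"><p>Main story text.</p></div>
  <div class="sidebar-articles"><p>Related links.</p></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)
SUFFIX = "-article"

# Substring match: simple, but also matches 'sidebar-articles'.
contains_expr = f'//div[contains(@class, "{SUFFIX}")]'

# XPath 1.0 emulation of ends-with(): compare the tail of @class with the suffix.
# substring() is 1-indexed, so the tail starts at string-length - len(suffix) + 1.
ends_with_expr = (
    f"//div[substring(@class, string-length(@class) - {len(SUFFIX) - 1})"
    f' = "{SUFFIX}"]'
)

contains_hits = [el.get("class") for el in tree.xpath(contains_expr)]
ends_with_hits = [el.get("class") for el in tree.xpath(ends_with_expr)]
```

On this sample, `contains()` selects both elements while the suffix emulation selects only `stuff-article`. The emulation compares against the whole attribute value, so it would miss an element whose `class` lists further classes after the matching one. Which trade-off is acceptable depends on the pages being targeted.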