Jérémy Benoist comments

Results: 394 comments of Jérémy Benoist

Not able to get the full content

I was wondering if you are using Graby to retrieve content or php-readability directly? If you are using Graby, the best solution might be to create a dedicated site config...

Unexpected title cleaning

I agree it could be legitimate content, but I guess that in most cases the text before `: ` is the website name. It's been there since the beginning:...
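
To make the trade-off concrete, here is a minimal sketch of that kind of title cleaning. It is an illustration only, not the actual php-readability code: the function name `strip_site_prefix` and the sample titles are hypothetical, and Python is used just to keep the example short.

```python
def strip_site_prefix(title: str, separator: str = ": ") -> str:
    """Drop everything up to the first separator, assuming it is a site name."""
    _prefix, found, rest = title.partition(separator)
    # Keep the original title when there is no separator or nothing follows it.
    if not found or not rest.strip():
        return title
    return rest.strip()


# Hypothetical titles: the first prefix is a site name, the second is real content.
print(strip_site_prefix("Example News: Storm hits the coast"))
print(strip_site_prefix("Breaking: Storm hits the coast"))
```

Both calls return the same string, which is the problem reported here: the heuristic cannot tell a website name from a meaningful prefix.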

Multi run dates

You should propose this PR to @sepehr-laal's fork so it can be merged into the same PR, instead of creating a new one that will require a rebase before...

AWS Lambda Stream yields empty buffer for images of larger size.

@lblo you got the solution. Thanks! It didn't work out of the box for us: we needed to increase the memory of the Lambda and also limit the memory consumption...

Need help to find a fingerprint for 60+ ippen.media newssites

Just tried it on f43.me (which uses Graby, which uses these site configs) and the fingerprint I suggested in your PR is working great: ![image](https://user-images.githubusercontent.com/62333/159361362-4150a961-56e3-49d7-a197-417b13d25b21.png)

Need help to find a fingerprint for 60+ ippen.media newssites

Wait for @fivefilters' answer

Need help to find a fingerprint for 60+ ippen.media newssites

That's an interesting suggestion. I think I still prefer the fingerprint because it avoids having many files. But the idea of having one real `test_url` per site is great. One...

theguardian: keep svg and figcaption tags, prune content

@fivefilters Have you tried it?

Create perspective-daily.de.txt

I can't see those _toggleable_ extra-info blocks after getting the article: ![image](https://user-images.githubusercontent.com/62333/79308640-3f2b9a00-7ef9-11ea-9347-2eadc7f0e721.png)

Create perspective-daily.de.txt

I get it. `span` tags are removed by readability by default (see https://github.com/j0k3r/php-readability/blob/master/src/Readability.php#L146), so they can't be _caught_ by a rule afterwards. Maybe that hardcoded rule should be removed/updated? I don't know
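
A rough sketch of why the ordering matters, assuming the pre-processing step simply strips `span` tags before any rule runs. The regexes, the `extra-info` class name, and the sample HTML are all hypothetical, and this is Python rather than the real PHP code:

```python
import re

# Assumed pre-filter: remove opening and closing span tags, keep their inner text.
SPAN_TAGS = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)
# Hypothetical rule that targets a span by its class.
EXTRA_INFO = re.compile(r'<span class="extra-info">(.*?)</span>', re.DOTALL)


def strip_spans(html: str) -> str:
    """Return the HTML with every span tag removed."""
    return SPAN_TAGS.sub("", html)


html = '<p>Intro <span class="extra-info">hidden note</span></p>'

print(EXTRA_INFO.findall(html))               # rule applied to the raw HTML
print(EXTRA_INFO.findall(strip_spans(html)))  # rule applied after the pre-filter
```

The rule matches on the raw HTML but finds nothing once the spans are gone, which is why a site config rule has no chance to catch them.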