Yann Defretin

130 comments by Yann Defretin

@jcupitt I already have 8.10.6 installed. When reinstalling it, I got this message from Homebrew though: > Warning: vips dependency gcc was built with a different C++ standard library (libstdc++...

Any news? I'm interested in tracking each task (part of a batch) I'm receiving. Right now it's kind of confusing because while the buffer gets populated and before it gets...

Hello, URL of testing: https://orientxxi.info/fa Trafilatura version: 1.6.2
```python3
import trafilatura

downloaded = trafilatura.fetch_url("https://orientxxi.info/fa")
trafilatura.extract(downloaded, output_format="json")
```
I am wondering why the title is not the one provided in...

@adbar Thanks for your answer on my previous case. I have another one! Doing something like: ```python3 trafi_extraction = trafilatura.extract( response.decode(errors='ignore'), output_format='json', include_images=False, date_extraction_params={ 'extensive_search': True, 'original_date': True, 'min_date': EARLIEST_VALID_DATE,...

@hqtang33 Were you able to find a solution? I tried to include the changes you proposed here and also your fork of the stealth plugin but unfortunately, even the "simple" removal...

Any news on this? It's a must-have feature.

@amazing-jay Funny you post here today, I was looking at this issue again earlier this morning, 5 years after my initial post. I found this fork that adds the "auto-correct...

@elacuesta We were able to narrow down the problem to two settings. First, using the [new headless mode](https://antoinevastel.com/bot%20detection/2023/02/19/new-headless-chrome.html) of Chrome, like this: ```python3 PLAYWRIGHT_LAUNCH_OPTIONS = { 'args': [ '--headless=new', ],...

I just saw the update on your Playwright issue: do you think there is a chance you could integrate in your plug-in one of the workarounds posted to handle this?...

Thanks for your help. For now, we try to detect the PDF viewer code when using Chromium and we redirect the download to a non-Playwright spider. We basically compare the...