Error 500 when fetching some websites
In a new deployment of wallabag-docker-service on a Raspberry Pi, I'm getting an error 500 when trying to fetch some sites. Here is the log for one failing URL:
[2025-09-06T10:18:46.780809+00:00] graby.INFO: Cached site config with key: viajeroscallejeros.com.merged {"key":"viajeroscallejeros.com.merged"} []
[2025-09-06T10:18:46.780817+00:00] graby.INFO: Fetching url: https://www.viajeroscallejeros.com/que-ver-en-el-hierro/ {"url":"https://www.viajeroscallejeros.com/que-ver-en-el-hierro/"} []
[2025-09-06T10:18:46.780842+00:00] graby.INFO: Trying using method "get" on url "https://www.viajeroscallejeros.com/que-ver-en-el-hierro/" {"method":"get","url":"https://www.viajeroscallejeros.com/que-ver-en-el-hierro/"} []
[2025-09-06T10:18:46.780863+00:00] graby.INFO: Use default user-agent "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2" for url "https://www.viajeroscallejeros.com/que-ver-en-el-hierro/" {"user-agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2","url":"https://www.viajeroscallejeros.com/que-ver-en-el-hierro/"} []
[2025-09-06T10:18:46.780873+00:00] graby.INFO: Use default referer "http://www.google.co.uk/url?sa=t&source=web&cd=1" for url "https://www.viajeroscallejeros.com/que-ver-en-el-hierro/" {"referer":"http://www.google.co.uk/url?sa=t&source=web&cd=1","url":"https://www.viajeroscallejeros.com/que-ver-en-el-hierro/"} []
[2025-09-06T10:18:46.793277+00:00] httplug.INFO: Sending request: GET https://www.viajeroscallejeros.com/que-ver-en-el-hierro/ 1.1 {"uid":"68bc0a86c1a893.72020535"} []
[2025-09-06T10:18:46.950777+00:00] httplug.ERROR: Error: cURL error 52: Empty reply from server when sending request: GET https://www.viajeroscallejeros.com/que-ver-en-el-hierro/ 1.1 {"exception":"[object] (Http\\Client\\Exception\\NetworkException(code: 0): cURL error 52: Empty reply from server at /var/www/wallabag/vendor/php-http/guzzle5-adapter/src/Client.php:116)\n[previous exception] [object] (GuzzleHttp\\Exception\\ConnectException(code: 0): cURL error 52: Empty reply from server at /var/www/wallabag/vendor/guzzlehttp/guzzle/src/Exception/RequestException.php:49)\n[previous exception] [object] (GuzzleHttp\\Ring\\Exception\\ConnectException(code: 0): cURL error 52: Empty reply from server at /var/www/wallabag/vendor/guzzlehttp/ringphp/src/Client/CurlFactory.php:126)","milliseconds":158,"uid":"68bc0a86c1a893.72020535"} []
[2025-09-06T10:18:46.952336+00:00] graby.WARNING: Request throw exception (with no response): cURL error 52: Empty reply from server {"error_message":"cURL error 52: Empty reply from server"} []
[2025-09-06T10:18:46.952408+00:00] graby.INFO: Data fetched: array{"effective_url":"https://www.viajeroscallejeros.com/que-ver-en-el-hierro/","body":"","headers":[],"status":500} {"data":{"effective_url":"https://www.viajeroscallejeros.com/que-ver-en-el-hierro/","body":"","headers":[],"status":500}} []
[2025-09-06T10:18:46.952529+00:00] graby.DEBUG: Fetched HTML {"html":""} []
[2025-09-06T10:18:46.952591+00:00] graby.DEBUG: HTML after regex empty nodes stripping {"html":""} []
[2025-09-06T10:18:46.952645+00:00] graby.INFO: Looking for site config files to see if single page link exists [] []
[2025-09-06T10:18:46.952707+00:00] graby.INFO: Returning cached and merged site config for viajeroscallejeros.com {"host":"viajeroscallejeros.com"} []
[2025-09-06T10:18:46.952761+00:00] graby.INFO: No "single_page_link" config found [] []
[2025-09-06T10:18:46.952808+00:00] graby.INFO: Attempting to extract content [] []
[2025-09-06T10:18:46.952869+00:00] graby.INFO: Returning cached and merged site config for viajeroscallejeros.com {"host":"viajeroscallejeros.com"} []
[2025-09-06T10:18:46.952932+00:00] graby.DEBUG: Actual site config {"siteConfig":{"Graby\\SiteConfig\\SiteConfig":{"title":["//meta[@property=\"og:title\"]/@content"],"body":[],"author":[],"date":["//meta[@property=\"article:published_time\"]/@content"],"strip":["//*[contains(@class, 'google-dfp-ad-wrapper')]","//iframe/@srcdoc"],"src_lazy_load_attr":null,"strip_id_or_class":["sharedaddy","i-amphtml-replaced-content"],"strip_image_src":["doubleclick.net"],"native_ad_clue":[],"http_header":[],"tidy":null,"autodetect_on_failure":null,"prune":null,"test_url":[],"if_page_contains":[],"single_page_link":[],"next_page_link":[],"parser":null,"find_string":["<amp-img","</amp-img>"],"replace_string":["<img","<!-- nothing -->"],"cache_key":null,"requires_login":false,"not_logged_in_xpath":null,"login_uri":null,"login_username_field":null,"login_password_field":null,"login_extra_fields":[],"skip_json_ld":false,"wrap_in":[]}}} []
[2025-09-06T10:18:46.953061+00:00] graby.INFO: Strings replaced: 0 (find_string and/or replace_string) {"count":0} []
[2025-09-06T10:18:46.953115+00:00] graby.DEBUG: HTML after site config strings replacements {"html":""} []
[2025-09-06T10:18:46.953169+00:00] graby.INFO: Attempting to parse HTML with libxml {"parser":"libxml"} []
[2025-09-06T10:18:46.953840+00:00] graby.INFO: Body size after Readability: 85 {"length":85} []
[2025-09-06T10:18:46.953940+00:00] graby.DEBUG: Body after Readability {"dom_saveXML":"<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title/></head><body>\n</body></html>"} []
[2025-09-06T10:18:46.954089+00:00] graby.INFO: Trying //meta[@property="og:title"]/@content for title {"pattern":"//meta[@property=\"og:title\"]/@content"} []
[2025-09-06T10:18:46.954197+00:00] graby.INFO: Trying //meta[@property="article:published_time"]/@content for date {"pattern":"//meta[@property=\"article:published_time\"]/@content"} []
[2025-09-06T10:18:46.954295+00:00] graby.INFO: Trying //html[@lang]/@lang for language {"pattern":"//html[@lang]/@lang"} []
[2025-09-06T10:18:46.954367+00:00] graby.INFO: Trying //meta[@name="DC.language"]/@content for language {"pattern":"//meta[@name=\"DC.language\"]/@content"} []
[2025-09-06T10:18:46.954439+00:00] graby.INFO: Trying //*[contains(@class, 'google-dfp-ad-wrapper')] to strip element {"pattern":"//*[contains(@class, 'google-dfp-ad-wrapper')]"} []
[2025-09-06T10:18:46.954533+00:00] graby.INFO: Trying //iframe/@srcdoc to strip element {"pattern":"//iframe/@srcdoc"} []
[2025-09-06T10:18:46.954599+00:00] graby.INFO: Trying sharedaddy to strip element {"string":"sharedaddy"} []
[2025-09-06T10:18:46.954717+00:00] graby.INFO: Trying i-amphtml-replaced-content to strip element {"string":"i-amphtml-replaced-content"} []
[2025-09-06T10:18:46.954944+00:00] graby.DEBUG: DOM after site config stripping {"dom_saveXML":"<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title/></head><body>\n</body></html>"} []
[2025-09-06T10:18:46.955167+00:00] graby.INFO: Using Readability [] []
[2025-09-06T10:18:46.957103+00:00] graby.INFO: Date is bad (wrong year): {"date":""} []
[2025-09-06T10:18:46.957283+00:00] graby.INFO: Trying again without tidy [] []
[2025-09-06T10:18:46.957358+00:00] graby.DEBUG: Actual site config {"siteConfig":{"Graby\\SiteConfig\\SiteConfig":{"title":["//meta[@property=\"og:title\"]/@content"],"body":[],"author":[],"date":["//meta[@property=\"article:published_time\"]/@content"],"strip":["//*[contains(@class, 'google-dfp-ad-wrapper')]","//iframe/@srcdoc"],"src_lazy_load_attr":null,"strip_id_or_class":["sharedaddy","i-amphtml-replaced-content"],"strip_image_src":["doubleclick.net"],"native_ad_clue":[],"http_header":[],"tidy":null,"autodetect_on_failure":null,"prune":null,"test_url":[],"if_page_contains":[],"single_page_link":[],"next_page_link":[],"parser":null,"find_string":["<amp-img","</amp-img>"],"replace_string":["<img","<!-- nothing -->"],"cache_key":null,"requires_login":false,"not_logged_in_xpath":null,"login_uri":null,"login_username_field":null,"login_password_field":null,"login_extra_fields":[],"skip_json_ld":false,"wrap_in":[]}}} []
[2025-09-06T10:18:46.957488+00:00] graby.INFO: Strings replaced: 0 (find_string and/or replace_string) {"count":0} []
[2025-09-06T10:18:46.957544+00:00] graby.DEBUG: HTML after site config strings replacements {"html":""} []
[2025-09-06T10:18:46.957599+00:00] graby.INFO: Attempting to parse HTML with libxml {"parser":"libxml"} []
[2025-09-06T10:18:46.957763+00:00] graby.INFO: Body size after Readability: 7 {"length":7} []
[2025-09-06T10:18:46.957833+00:00] graby.DEBUG: Body after Readability {"dom_saveXML":"<html/>"} []
[2025-09-06T10:18:46.957938+00:00] graby.INFO: Trying //meta[@property="og:title"]/@content for title {"pattern":"//meta[@property=\"og:title\"]/@content"} []
[2025-09-06T10:18:46.958025+00:00] graby.INFO: Trying //meta[@property="article:published_time"]/@content for date {"pattern":"//meta[@property=\"article:published_time\"]/@content"} []
[2025-09-06T10:18:46.958097+00:00] graby.INFO: Trying //html[@lang]/@lang for language {"pattern":"//html[@lang]/@lang"} []
[2025-09-06T10:18:46.958165+00:00] graby.INFO: Trying //meta[@name="DC.language"]/@content for language {"pattern":"//meta[@name=\"DC.language\"]/@content"} []
[2025-09-06T10:18:46.958233+00:00] graby.INFO: Trying //*[contains(@class, 'google-dfp-ad-wrapper')] to strip element {"pattern":"//*[contains(@class, 'google-dfp-ad-wrapper')]"} []
[2025-09-06T10:18:46.958312+00:00] graby.INFO: Trying //iframe/@srcdoc to strip element {"pattern":"//iframe/@srcdoc"} []
[2025-09-06T10:18:46.958377+00:00] graby.INFO: Trying sharedaddy to strip element {"string":"sharedaddy"} []
[2025-09-06T10:18:46.958472+00:00] graby.INFO: Trying i-amphtml-replaced-content to strip element {"string":"i-amphtml-replaced-content"} []
[2025-09-06T10:18:46.958632+00:00] graby.DEBUG: DOM after site config stripping {"dom_saveXML":"<html/>"} []
[2025-09-06T10:18:46.958831+00:00] graby.INFO: Using Readability [] []
[2025-09-06T10:18:46.960341+00:00] graby.INFO: Date is bad (wrong year): {"date":""} []
[2025-09-06T10:18:46.960467+00:00] graby.INFO: Success ? {"is_success":false} []
[2025-09-06T10:18:46.960542+00:00] graby.INFO: Extract failed [] []
[2025-09-06T10:18:46.960815+00:00] app.DEBUG: Extracting images from content to provide a default preview picture [] []
[2025-09-06T10:18:46.961402+00:00] app.DEBUG: 0 pictures found [] []
[2025-09-06T10:18:46.963695+00:00] security.DEBUG: Stored the security token in the session. {"key":"_security_secured_area"} []
If I run curl https://www.viajeroscallejeros.com/que-ver-en-el-hierro/ directly from inside the container, it works perfectly.
I can fetch some addresses without problems, but for others the fetch fails and I get that error 500.
After some investigation, it looks like the default user agent (the outdated Chrome/15 string shown in the log above) is what prevents many websites from being retrieved. If I change it to: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36 it works fine.
After changing the default user agent in HttpClient.php, most sites now work fine. Could this parameter be exposed as an environment variable or some other configuration option? If I set the user agent in global.txt instead, all the custom agents defined for specific sites will be ignored.
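To make the request concrete, here is a minimal sketch of the precedence I have in mind, written in Python only for illustration (it is not wallabag or Graby code). The function name pick_user_agent, the site_overrides mapping, and the DEFAULT_USER_AGENT variable name are all hypothetical; the built-in default string is the one from the log above.

```python
import os

# Built-in default user agent, copied from the graby log above.
BUILTIN_DEFAULT_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 "
    "(KHTML, like Gecko) Chrome/15.0.874.92 Safari/535.2"
)


def pick_user_agent(host, site_overrides, env=None):
    """Return the user agent to send for `host`.

    Hypothetical precedence:
      1. a site-specific override, if one exists for the host
      2. a configurable default read from the environment
      3. the built-in default
    """
    env = os.environ if env is None else env

    # Normalise the host so "www.example.com" matches "example.com".
    host = host.lower()
    if host.startswith("www."):
        host = host[4:]

    if host in site_overrides:
        return site_overrides[host]

    # DEFAULT_USER_AGENT is an assumed variable name, not a real wallabag option.
    return env.get("DEFAULT_USER_AGENT") or BUILTIN_DEFAULT_UA
```

With this ordering, a configurable default replaces only the built-in string, and per-site agents keep winning, which is what setting the agent globally in global.txt would not give me.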
Hi @togarha,
This feature will be available in the upcoming v2.7 release; see wallabag_user_agent in https://doc.wallabag.org/admin/parameters/#other-wallabag-options