Goutte
Goutte not using httpclient headers
Hi!
I'm trying to change the User-Agent as described in The HttpClient Component Documentation, but the crawler always uses 'Symfony BrowserKit'. Am I doing something wrong, or is this a bug?
$client = new Client(HttpClient::create(['timeout' => 5000, 'headers' => ['User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36']]));
Thanks for your help.
You can set the header in this manner:
$client->setHeader('User-Agent', $userAgent);
I'm trying to set the CURL options, hoping someone can help me. All the links for doing this are old.
Thanks for your answer. When I set the headers the way you suggested, I get this error:
Call to undefined method Goutte\Client::setHeader()
Because BrowserKit overrides the User-Agent header when the request is executed, you have to use setServerParameter() to put your own User-Agent back.
use Goutte\Client;
use Symfony\Component\HttpClient\HttpClient;
$client = new Client(HttpClient::create(array(
'headers' => array(
'user-agent' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0', // will be overridden with 'Symfony BrowserKit' when the request is executed
'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language' => 'en-US,en;q=0.5',
'Referer' => 'http://yourtarget.url/',
'Upgrade-Insecure-Requests' => '1',
'Save-Data' => 'on',
'Pragma' => 'no-cache',
'Cache-Control' => 'no-cache',
),
)));
$client->setServerParameter('HTTP_USER_AGENT', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0');
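To illustrate why the 'HTTP_USER_AGENT' key works: BrowserKit stores request headers as CGI-style server parameters, and HTTP_-prefixed keys are turned back into header names when the request is sent. The function below is a simplified, hypothetical re-implementation of that naming convention. The name serverParametersToHeaders is invented for this sketch and is not part of Goutte or Symfony.

```php
<?php

/**
 * Hypothetical helper (not a Goutte or Symfony API): shows how CGI-style
 * server parameters such as HTTP_USER_AGENT correspond to header names.
 */
function serverParametersToHeaders(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        // HTTP_USER_AGENT becomes http-user-agent
        $name = strtolower(str_replace('_', '-', $key));
        // Only keys carrying the HTTP_ prefix are treated as request headers here
        if (strncmp($name, 'http-', 5) === 0) {
            $headers[substr($name, 5)] = $value;
        }
    }

    return $headers;
}

$headers = serverParametersToHeaders([
    'HTTP_USER_AGENT' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0',
    'HTTP_ACCEPT' => '*/*',
    'SERVER_NAME' => 'localhost', // no HTTP_ prefix, so it is skipped
]);

var_export($headers);
```

Under this convention, a header called Accept-Language would be set through the key 'HTTP_ACCEPT_LANGUAGE'.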
Thank you for that important/critical piece of information, much appreciated!
@kiang, that's the correct answer, just tested, thanks for your great help!
Is this new? I have multiple scripts in which
use Goutte\Client;
$client = new Client();
$client->setHeader('user-agent','whatever');
has worked flawlessly.
Now suddenly, after starting a new project and using the most recent version of Goutte, it fails with
PHP Fatal error: Uncaught Error: Call to undefined method Goutte\Client::setHeader()
Yup. There it is: https://github.com/FriendsOfPHP/Goutte/commit/05f6994ec1d0d8368157de7fe45063e751857086
Yes, this answer is correct (https://github.com/FriendsOfPHP/Goutte/issues/401#issuecomment-591760247), I tested it on an updated version of Goutte.
I didn't read the comments thoroughly enough the first time through, but This Comment is spot on. I only found it after digging through the code by trial and error.
This:
$this->client->setServerParameter('HTTP_USER_AGENT', $userAgent);
It sets the user agent properly and solved my issue with a site.
This works in the current implementation:
$client = new Client([
'HTTP_USER_AGENT' => 'Mozilla/5.0 (X11; Linux i686; rv:78.0) Gecko/20100101 Firefox/78.0',
'HTTP_ACCEPT' => '*/*',
]);
Have a look at the constructor of Symfony\Component\BrowserKit\AbstractBrowser, which calls the method setServerParameters(). The Goutte class Client is indirectly derived from Symfony\Component\BrowserKit\AbstractBrowser.
Can you explain how to do this? When I copy the code, it just shows errors and says the argument must be of type HttpClientInterface.
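The "must be of type HttpClientInterface" message suggests a Goutte version in which the first constructor argument is the HTTP client rather than an array of server parameters. Here is a minimal sketch, assuming Goutte 4.x (where Client builds on Symfony's HttpBrowser) installed through Composer. The autoload path is an assumption about the project layout. It passes the same parameters through setServerParameters() instead of the constructor.

```php
<?php

// Assumes a Composer install with fabpot/goutte 4.x in ./vendor
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\HttpClient\HttpClient;

$userAgent = 'Mozilla/5.0 (X11; Linux i686; rv:78.0) Gecko/20100101 Firefox/78.0';

// First argument is an HttpClientInterface, not the server-parameter array
$client = new Client(HttpClient::create());

// Replaces the server parameters, including the default User-Agent
$client->setServerParameters([
    'HTTP_USER_AGENT' => $userAgent,
    'HTTP_ACCEPT' => '*/*',
]);

// Read the value back to confirm what will be sent
echo $client->getServerParameter('HTTP_USER_AGENT'), PHP_EOL;
```

To change only one value and leave the other server parameters alone, setServerParameter('HTTP_USER_AGENT', $userAgent) as shown earlier in the thread does the same job.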