text() returns an empty string
Hi,
when I use the crawler to filter the DOM and test the content of the title tag, I receive an empty string.
Tested with:
- Debian 9.4 64bits
- PHP 7.2.3
- PHP standalone server
- Symfony 4.0 basic skeleton
- Chrome Version 65.0.3325.181 (Official Build) (64-bit)
<?php
// tests/PanthereTest.php
declare(strict_types=1);

namespace App\Tests;

use Panthere\PanthereTestCase;

/**
 * Class PanthereTest
 * @package App\Tests
 */
class PanthereTest extends PanthereTestCase
{
    public function testSomething(): void
    {
        $client = static::createClient();
        $crawler = $client->request('GET', static::$baseUri.'/');
        $this->assertEquals('Welcome!', $crawler->filterXPath('//title')->html());
        $this->assertEquals('Welcome!', $crawler->filterXPath('//title')->text());

        $client = static::createPanthereClient();
        $crawler = $client->request('GET', static::$baseUri.'/');
        $this->assertEquals('<title>Welcome!</title>', $crawler->filterXPath('//title')->html());
        $this->assertEquals('Welcome!', $crawler->filterXPath('//title')->text());
    }
}
// composer.json
{
"type": "project",
"license": "proprietary",
"require": {
"php": "^7.1.3",
"ext-iconv": "*",
"symfony/console": "^4.0",
"symfony/flex": "^1.0",
"symfony/framework-bundle": "^4.0",
"symfony/lts": "^4@dev",
"symfony/yaml": "^4.0"
},
"require-dev": {
"dunglas/panthere": "^1.0@dev",
"symfony/dotenv": "^4.0",
"symfony/phpunit-bridge": "^4.0"
},
"config": {
"preferred-install": {
"*": "dist"
},
"sort-packages": true
},
"autoload": {
"psr-4": {
"App\\": "src/"
}
},
"autoload-dev": {
"psr-4": {
"App\\Tests\\": "tests/"
}
},
"replace": {
"symfony/polyfill-iconv": "*",
"symfony/polyfill-php71": "*",
"symfony/polyfill-php70": "*",
"symfony/polyfill-php56": "*"
},
"scripts": {
"auto-scripts": {
"cache:clear": "symfony-cmd",
"assets:install --symlink --relative %PUBLIC_DIR%": "symfony-cmd"
},
"post-install-cmd": [
"@auto-scripts"
],
"post-update-cmd": [
"@auto-scripts"
]
},
"conflict": {
"symfony/symfony": "*"
},
"extra": {
"symfony": {
"id": "01C9W9DMK0BWP564TKPSF2P5F0",
"allow-contrib": false
}
}
}
Result:
PHPUnit 6.5.7 by Sebastian Bergmann and contributors.
Testing Project Test Suite
F 1 / 1 (100%)
Time: 4.76 seconds, Memory: 8.00MB
There was 1 failure:
1) App\Tests\PanthereTest::testSomething
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'Welcome!'
+''
tests/PanthereTest.php:24
FAILURES!
Tests: 1, Assertions: 4, Failures: 1.
Same thing with Windows 10 / PHP 7.2.4 / Wampserver.
But there is no problem with a tag in the body, like p for example. It seems to occur only with head tags.
Probably a weird behavior with Chrome...
Ok, this is because WebDriver returns only the displayed text, and by definition anything in <head> is hidden. Here is a workaround: http://grokbase.com/t/gg/webdriver/155wx8zwjv/how-to-get-the-content-tags-that-reside-in-head-head-of-a-webpage
Maybe we can change this behavior and return innerHtml if the tag is in <head>?
Seems a good option.
IMO, using a different behavior for the <head> section is too specific. I think a better solution would be to use innerHtml as a general fallback for text(). Furthermore, a dev who does not want this default behavior would get an option to deactivate it.
Or we can add a flag to getText, like getText(bool $includeHidden) that will be false by default.
Or maybe 2 methods?
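For illustration, the flag idea could look roughly like the sketch below. elementText() is a hypothetical helper, not an existing Panthere or php-webdriver method. It only assumes that the element object has getText() and getAttribute(), as php-webdriver's WebDriverElement does, and that the driver resolves 'textContent' as a DOM property, which is worth checking against your driver.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of Panthere or php-webdriver.
 * $element is assumed to expose getText() and getAttribute().
 */
function elementText(object $element, bool $includeHidden = false): string
{
    if (!$includeHidden) {
        // WebDriver semantics: only text that is rendered on screen.
        return $element->getText();
    }

    // Assumption: the driver resolves 'textContent' as a DOM property.
    $raw = $element->getAttribute('textContent');

    // Fall back to the visible text when the driver returns null.
    return trim($raw ?? $element->getText());
}
```

Keeping the default at false leaves the current WebDriver behavior untouched; only callers that pass true get the hidden text.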
IMO the current behaviour is the expected behaviour and shouldn't be changed. As long as the developer is able to get the innerHtml, it all seems good to me.
@dinamic: I agree with you now. Maybe just add a notice in the docs?
A notice in the docs would be great. Do you want to work on this?
This may be expected behavior as far as webdriver is concerned, but it is inconsistent with goutte... I guess the question is which takes priority?
If you would like to keep it consistent, maybe we could try something like $element->getAttribute('textContent') instead of $element->getText()?
And then if you wanted to expose the webdriver behavior add a secondary method like $crawler->visibleText()?
@thomasage's test case above also exposes a similar issue with the html() method: Panther returns the outerHTML attribute, but Goutte returns what would be the equivalent of innerHTML.
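To make the html() difference concrete, here is a small sketch that uses PHP's built-in DOM extension rather than either client. innerHtml() is a hypothetical helper and the markup is made up to mirror the title tag from the test case.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: serialises only the children of $node (innerHTML). */
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }

    return $html;
}

$document = new DOMDocument();
$document->loadHTML('<html><head><title>Welcome!</title></head><body></body></html>');
$title = $document->getElementsByTagName('title')->item(0);

// Serialising the node itself includes the surrounding tag (outerHTML).
$outer = $document->saveHTML($title);
// Serialising only its children gives the content between the tags.
$inner = innerHtml($title);
```

The two html() assertions in the test expect exactly these two shapes, one per client.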
I have read all the docs and tried everything I can think of but I cannot get the value of an element that is CSS display:none.
In fact, the $node->html() method always returns an empty string, even on non hidden content...
And $node->outerHtml() returns an error: The "getNode" method cannot be used in WebDriver mode. Use "getElement" instead
Confused...
$node->filter('.match-info')->outerHtml(); // ERROR
$node->filter('.match-info')->html(); // EMPTY
$node->filter('.match-info')->text(); // NOT EMPTY BUT HIDDEN TEXT NOT THERE
$scores = $node->filter('.match-info')->each(function (Crawler $node, $i) {
    $node->html(); // EMPTY
    $node->text(); // NOT EMPTY BUT HIDDEN TEXT NOT THERE
});
You can crawl hidden elements with Symfony DomCrawler.
Just load the HTML from Panther into a Symfony DomCrawler instance.
Simple example:
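// Assumes the symfony/panther and symfony/dom-crawler packages.
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\Panther\Client;

$browser = Client::createChromeClient();
$browser->request('GET', 'https://www.webpage.com');
// Hand the rendered markup to a plain DomCrawler, which also sees hidden nodes.
$htmlData = $browser->getCrawler()->html();
$domCrawler = new Crawler($htmlData);
$carData = $domCrawler->filter('table')->eq(1)->filter('tr')->each(
    function (Crawler $node) {
        // outerHtml() works here because this is a plain DomCrawler node.
        $html = $node->outerHtml();
        // Rows without <td> cells (e.g. header rows) yield null.
        if ($node->filter('td')->count() > 0) {
            $rowTitle = $node->filter('td')->eq(0)->text();
            $rowValue = $node->filter('td')->eq(1)->text();
            return [
                'rowTitle' => $rowTitle,
                'rowValue' => $rowValue,
            ];
        }
    });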
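If you want to try the same idea without any extra package, a rough equivalent with PHP's built-in DOM extension is sketched below. This swaps DomCrawler for DOMDocument and DOMXPath, and the hard-coded markup is a made-up stand-in for the page source you would get from the browser.

```php
<?php
declare(strict_types=1);

// Made-up stand-in for the markup returned by the browser.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head><title>Welcome!</title></head>
<body><p class="match-info" style="display:none">2 - 1</p></body>
</html>
HTML;

$document = new DOMDocument();
// Silence warnings about markup that libxml does not recognise.
@$document->loadHTML($html);

$xpath = new DOMXPath($document);

// textContent ignores CSS, so <head> tags and display:none nodes are readable.
$title = $xpath->query('//title')->item(0)->textContent;
$hidden = $xpath->query('//p[@class="match-info"]')->item(0)->textContent;

echo $title, PHP_EOL;  // Welcome!
echo $hidden, PHP_EOL; // 2 - 1
```

Because the parsing happens outside the browser, visibility rules never apply; the trade-off is that you only see the markup as it was when the source was captured.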