text() return an empty string

Open thomasage opened this issue 7 years ago • 13 comments

Hi,

when I use the crawler to filter the DOM and test the content of the title tag, I receive an empty string.

Tested with:

  • Debian 9.4 64bits
  • PHP 7.2.3
  • PHP standalone server
  • Symfony 4.0 basic skeleton
  • Chrome Version 65.0.3325.181 (Official Build) (64-bit)
<?php
// tests/PanthereTest.php
declare(strict_types=1);

namespace App\Tests;

use Panthere\PanthereTestCase;

/**
 * Class PanthereTest
 * @package App\Tests
 */
class PanthereTest extends PanthereTestCase
{
    public function testSomething(): void
    {
        $client = static::createClient();
        $crawler = $client->request('GET', static::$baseUri.'/');
        $this->assertEquals('Welcome!', $crawler->filterXPath('//title')->html());
        $this->assertEquals('Welcome!', $crawler->filterXPath('//title')->text());

        $client = static::createPanthereClient();
        $crawler = $client->request('GET', static::$baseUri.'/');
        $this->assertEquals('<title>Welcome!</title>', $crawler->filterXPath('//title')->html());
        $this->assertEquals('Welcome!', $crawler->filterXPath('//title')->text());
    }
}
// composer.json
{
    "type": "project",
    "license": "proprietary",
    "require": {
        "php": "^7.1.3",
        "ext-iconv": "*",
        "symfony/console": "^4.0",
        "symfony/flex": "^1.0",
        "symfony/framework-bundle": "^4.0",
        "symfony/lts": "^4@dev",
        "symfony/yaml": "^4.0"
    },
    "require-dev": {
        "dunglas/panthere": "^1.0@dev",
        "symfony/dotenv": "^4.0",
        "symfony/phpunit-bridge": "^4.0"
    },
    "config": {
        "preferred-install": {
            "*": "dist"
        },
        "sort-packages": true
    },
    "autoload": {
        "psr-4": {
            "App\\": "src/"
        }
    },
    "autoload-dev": {
        "psr-4": {
            "App\\Tests\\": "tests/"
        }
    },
    "replace": {
        "symfony/polyfill-iconv": "*",
        "symfony/polyfill-php71": "*",
        "symfony/polyfill-php70": "*",
        "symfony/polyfill-php56": "*"
    },
    "scripts": {
        "auto-scripts": {
            "cache:clear": "symfony-cmd",
            "assets:install --symlink --relative %PUBLIC_DIR%": "symfony-cmd"
        },
        "post-install-cmd": [
            "@auto-scripts"
        ],
        "post-update-cmd": [
            "@auto-scripts"
        ]
    },
    "conflict": {
        "symfony/symfony": "*"
    },
    "extra": {
        "symfony": {
            "id": "01C9W9DMK0BWP564TKPSF2P5F0",
            "allow-contrib": false
        }
    }
}

Result:

PHPUnit 6.5.7 by Sebastian Bergmann and contributors.

Testing Project Test Suite
F                                                                   1 / 1 (100%)

Time: 4.76 seconds, Memory: 8.00MB

There was 1 failure:

1) App\Tests\PanthereTest::testSomething
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'Welcome!'
+''

tests/PanthereTest.php:24

FAILURES!
Tests: 1, Assertions: 4, Failures: 1.

thomasage avatar Mar 30 '18 21:03 thomasage

Same thing with Windows 10 / PHP 7.2.4 / Wampserver.

But there is no problem with a tag in the body, such as p. It seems to occur only with tags inside head.

thomasage avatar Apr 03 '18 06:04 thomasage

Probably a weird behavior with Chrome...

dunglas avatar Apr 04 '18 14:04 dunglas

Ok, this is because WebDriver returns only the displayed text, and by definition anything in <head> is hidden. Here is a workaround: http://grokbase.com/t/gg/webdriver/155wx8zwjv/how-to-get-the-content-tags-that-reside-in-head-head-of-a-webpage

Maybe we can change this behavior and return innerHtml if the tag is in <head>?

dunglas avatar Apr 04 '18 14:04 dunglas

Seems a good option.

thomasage avatar Apr 04 '18 14:04 thomasage

IMO, using a different behavior for the <head> section is too specific. I think a better solution would be to use innerHtml as a general fallback for text(). Furthermore, if a developer does not want this default behavior, they would get an option to deactivate it.

LegendOfGIT avatar Apr 06 '18 06:04 LegendOfGIT

Or we can add a flag to getText, like getText(bool $includeHidden) that will be false by default.

dunglas avatar Apr 06 '18 07:04 dunglas

Or maybe 2 methods?

thomasage avatar Apr 16 '18 19:04 thomasage

IMO the current behaviour is the expected behaviour and shouldn't be changed. As long as the developer is able to get the innerHtml, it all seems good to me.

dinamic avatar Jul 19 '18 14:07 dinamic

@dinamic: I agree with you now. Maybe just add a notice in the docs?

thomasage avatar Jul 20 '18 07:07 thomasage

A notice in the docs would be great. Do you want to work on this?

dunglas avatar Sep 13 '18 08:09 dunglas

This may be expected behavior as far as WebDriver is concerned, but it is inconsistent with Goutte... I guess the question is which takes priority?

If you would like to keep it consistent, maybe we could try something like $element->getAttribute('textContent') instead of $element->getText()?

And then, if you wanted to expose the WebDriver behavior, add a secondary method like $crawler->visibleText()?

@thomasage's test case above also exposes a similar issue with the html() method: Panther returns the outerHTML attribute, while Goutte returns the equivalent of innerHTML.
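
For reference, the textContent idea can also be sketched by running JavaScript in the browser, which avoids relying on how the driver maps getAttribute() to DOM properties (that mapping may differ between WebDriver protocol modes). This is only a rough sketch: textContentOf() is a hypothetical helper, not part of Panther, and it assumes the current Symfony\Component\Panther namespace and the client's executeScript() method.

```php
<?php
declare(strict_types=1);

use Symfony\Component\Panther\Client;

/**
 * Hypothetical helper (not part of Panther): returns the textContent of the
 * first element matching a CSS selector, whether or not it is displayed.
 * Returns null when nothing matches.
 */
function textContentOf(Client $client, string $cssSelector): ?string
{
    // The selector is passed as a script argument instead of being
    // concatenated into the JavaScript source.
    $script = 'const el = document.querySelector(arguments[0]);'
        . ' return el ? el.textContent : null;';

    return $client->executeScript($script, [$cssSelector]);
}

// Usage, assuming a page was already loaded with $client->request('GET', ...):
// $title = textContentOf($client, 'title');
```

Because the value is read from the DOM rather than from the rendered page, elements in <head> and elements hidden with display:none should both be reachable this way.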

ssnepenthe avatar Feb 11 '19 18:02 ssnepenthe

I have read all the docs and tried everything I can think of, but I cannot get the value of an element that is hidden with CSS display:none.

In fact, the $node->html() method always returns an empty string, even on non-hidden content...

And the $node->outerHtml() method returns an error: The "getNode" method cannot be used in WebDriver mode. Use "getElement" instead

Confused...

$node->filter('.match-info')->outerHtml();  // ERROR
$node->filter('.match-info')->html();  // EMPTY
$node->filter('.match-info')->text();  // NOT EMPTY BUT HIDDEN TEXT NOT THERE
$scores = $node->filter('.match-info')->each(function (Crawler $node, $i) {
    $node->html();  // EMPTY
    $node->text();  // NOT EMPTY BUT HIDDEN TEXT NOT THERE
});

BB-000 avatar Jan 28 '22 11:01 BB-000

You can crawl hidden elements with Symfony DomCrawler.

Just pass the HTML from Panther to a plain Symfony DomCrawler instance.

Simple example:

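use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\Panther\Client;

$browser = Client::createChromeClient();
$browser->request('GET', 'https://www.webpage.com');

// Take the HTML from Panther and parse it with the plain DomCrawler,
// which works on the markup and does not filter out hidden elements.
$htmlData = $browser->getCrawler()->html();
$domCrawler = new Crawler($htmlData);
$carData = $domCrawler->filter('table')->eq(1)->filter('tr')->each(
    function (Crawler $node) {
        $html = $node->outerHtml();
        // Rows without <td> cells (e.g. header rows) yield null.
        if ($node->filter('td')->count() > 0) {
            $rowTitle = $node->filter('td')->eq(0)->text();
            $rowValue = $node->filter('td')->eq(1)->text();

            return [
                'rowTitle' => $rowTitle,
                'rowValue' => $rowValue,
            ];
        }
    });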
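
To see why parsing the markup sidesteps the visibility problem, here is a minimal self-contained sketch. It uses PHP's built-in DOMDocument rather than DomCrawler so it runs without any packages, and the HTML string is a made-up stand-in for the source you would get from the browser (for instance via the client's getPageSource()). No rendering takes place, so both the <title> and a display:none element are readable.

```php
<?php
declare(strict_types=1);

// Made-up page source; with Panther this string would come from the browser.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head><title>Welcome!</title></head>
<body>
  <p>Visible paragraph</p>
  <p class="match-info" style="display:none">Hidden score</p>
</body>
</html>
HTML;

// Keep libxml parse warnings out of the output.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Static parsing ignores CSS and rendering, so nothing counts as "hidden".
$title  = $xpath->query('//title')->item(0)->textContent;
$hidden = $xpath->query('//p[@class="match-info"]')->item(0)->textContent;

echo $title, PHP_EOL;   // prints "Welcome!"
echo $hidden, PHP_EOL;  // prints "Hidden score"
```

The trade-off is that the parsed copy is a snapshot: any changes the page makes after the HTML was captured are not reflected.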

Radio-Skonto avatar Mar 22 '24 15:03 Radio-Skonto