MinkBrowserKitDriver

getText() returns text other drivers do not

Open alexpott opened this issue 3 years ago • 4 comments

\Behat\Mink\Driver\BrowserKitDriver::getText() returns text from the head section as well as any JSON contained in a script tag in the HTML body. \Behat\Mink\Driver\Selenium2Driver::getText(), for example, returns neither text from the head section nor the contents of script tags in the body. This seems at odds with the Mink documentation, which states:

getText() will strip tags and unprinted characters out of the response, including newlines. So it’ll basically return the text that the user sees on the page.

I'm not sure if this is a Symfony\DomCrawler issue or not.

For a discussion of the effects of this, see https://www.drupal.org/project/drupal/issues/3175718

alexpott avatar Oct 08 '20 21:10 alexpott

DomCrawler is simply using PHP's DOMNode textContent property (https://www.php.net/manual/en/class.domnode.php#domnode.props.textcontent), which implements the W3C spec: https://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030226/DOM3-Core.html#core-ID-1312295772
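To illustrate the behaviour, here is a minimal sketch that uses PHP's DOM extension directly rather than DomCrawler or Mink. The sample markup and variable names are made up for illustration and are not taken from the issue. It assumes the DOM extension is enabled.

```php
<?php
// Hypothetical sample page, for illustration only.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head><title>Page title</title></head>
<body>
<p>Visible text</p>
<script type="application/json">{"hidden":"settings"}</script>
</body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

// textContent concatenates the text of every descendant node,
// whether or not a browser would render it to the user.
$text = $doc->documentElement->textContent;

// Each check reports whether the given fragment ended up in the text.
var_dump(strpos($text, 'Page title') !== false);   // text from <head>
var_dump(strpos($text, 'Visible text') !== false); // rendered paragraph
var_dump(strpos($text, '"hidden"') !== false);     // JSON inside <script>
```

All three fragments end up in the same string, because textContent has no notion of what is visible.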

jonathanjfshaw avatar Oct 09 '20 10:10 jonathanjfshaw

@jonathanjfshaw yep, and it returns what document.body.textContent returns in the browser console. The point is that this is not what \Behat\Mink\Driver\Selenium2Driver::getText() returns, and it includes content that is not visible.

alexpott avatar Oct 09 '20 17:10 alexpott

I see no issue here.

The Selenium driver talks to a real browser and can ask it to return only the text visible to a user. BrowserKit, being a headless driver, only looks at the HTML tags and parses them as best it can. Stripping all HTML tags therefore leaves their content in place, which produces the effect you're seeing.

@alexpott, I recommend calling the getText method on the BODY NodeElement (a PHP class in Mink) of the document, not on the whole document. That way you shouldn't get any extra content (at least I hope so).

The code below (untested) is how I would get the contents of a document.

$body_text = $session->getPage()->find('xpath', '//body')->getText();

aik099 avatar Oct 22 '20 18:10 aik099

@aik099 the body can contain script tags. Placing script tags just before the closing body tag is often recommended for performance reasons.
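A minimal sketch of the point, again using PHP's DOM extension directly rather than Mink. The markup is hypothetical, and removing script and style nodes before reading the text is only one possible workaround, not something BrowserKitDriver is known to do.

```php
<?php
// Hypothetical page with a script placed just before the closing body tag.
$html = '<html><head><title>Page title</title></head>'
      . '<body><p>Visible text</p>'
      . '<script>var settings = {"a":1};</script></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$body = $xpath->query('//body')->item(0);

// Restricting to <body> drops the <head> text but keeps the script source.
$before = $body->textContent;

// Assumed workaround: remove script and style elements first.
// Copy the result into an array so removals cannot affect the iteration.
$hidden = iterator_to_array($xpath->query('.//script | .//style', $body));
foreach ($hidden as $node) {
    $node->parentNode->removeChild($node);
}

$after = trim($body->textContent);

var_dump(strpos($before, 'var settings') !== false); // script text was present
var_dump($after);                                    // text after removal
```

This still does not account for elements hidden with CSS, which only a real browser can evaluate.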

alexpott avatar May 22 '21 21:05 alexpott