Mink

getText() and whitespace

Open grom358 opened this issue 11 years ago • 12 comments

I have noticed that in the drivers (I checked the Selenium2 and Goutte drivers), the getText() method does not preserve whitespace. This means that output in pre elements, for example, can't be tested properly:

// Fails: getText() turns the newline inside the <pre> into a space ("Hello World!").
$text = $session->getPage()->find('css', 'pre')->getText();
$this->assertEquals("Hello\nWorld!", $text);
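
To make the failure concrete, here is a minimal sketch of the kind of normalization discussed further down in this thread: newlines become spaces, runs of spaces collapse, and the result is trimmed. normalizeDriverText() is a hypothetical name, and the body approximates the behavior described here rather than copying any driver's code.

```php
<?php

/**
 * Hypothetical helper approximating the driver-side whitespace handling
 * described in this issue. Not taken from any Mink driver.
 */
function normalizeDriverText(string $text): string
{
    // Turn each line break ("\r\n", "\r" or "\n") into a single space.
    $text = str_replace(["\r\n", "\r", "\n"], ' ', $text);
    // Collapse two or more consecutive spaces into one.
    $text = preg_replace('/ {2,}/', ' ', $text);

    return trim($text);
}

// The <pre> content from the report loses its line break:
echo normalizeDriverText("Hello\nWorld!"), PHP_EOL; // Hello World!
```

With that normalization in place, no assertion on getText() can tell "Hello\nWorld!" apart from "Hello World!".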

grom358 avatar Jul 10 '14 07:07 grom358

Please list the driver and Mink versions you're using. Try switching to the dev-master version and check whether the problem persists there as well.

Also, please post links to the getText method implementation in the mentioned drivers (or, better yet, in all drivers).

aik099 avatar Jul 10 '14 07:07 aik099

Ha, MinkSelenium2Driver does indeed have code that strips newlines: https://github.com/minkphp/MinkSelenium2Driver/blob/master/src/Selenium2Driver.php#L513

I think it was done back then to allow a more relaxed text comparison in the Behat steps (or the WebAssert class) used by MinkExtension.

@stof it might just as well be misplaced code. If it happens in all drivers, then I think it's better to keep line endings as-is in the drivers and strip them from the text being compared in the WebAssert class instead.
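
A rough sketch of that split, under the assumption that drivers return the text untouched and only the assertion layer relaxes whitespace. relaxWhitespace() and relaxedTextContains() are hypothetical helpers for illustration, not the actual WebAssert API.

```php
<?php

/**
 * Hypothetical: collapse every whitespace run (spaces, tabs, newlines)
 * into one space and trim. Used only when comparing, never in drivers.
 */
function relaxWhitespace(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

/**
 * Hypothetical relaxed "contains" check: both sides are normalized, so
 * "Hello\nWorld!" in the page matches an expected "Hello World!".
 */
function relaxedTextContains(string $haystack, string $needle): bool
{
    return strpos(relaxWhitespace($haystack), relaxWhitespace($needle)) !== false;
}

// Exact comparisons stay possible because the raw driver text is untouched:
$raw = "Hello\nWorld!";
var_dump($raw === "Hello\nWorld!");                  // bool(true)
var_dump(relaxedTextContains($raw, "Hello World!")); // bool(true)
```

The design point is that the lossy step happens once, at comparison time, so a test that cares about <pre> formatting can still compare the raw string.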

aik099 avatar Jul 10 '14 08:07 aik099

For other drivers:

  • MinkSelenium2Driver (stripping): https://github.com/Behat/MinkSelenium2Driver/blob/master/src/Behat/Mink/Driver/Selenium2Driver.php#L518
  • MinkSeleniumDriver (stripping): https://github.com/Behat/MinkSeleniumDriver/blob/master/src/Behat/Mink/Driver/SeleniumDriver.php#L259
  • SahiClient (used by MinkSahiDriver, not stripping): https://github.com/Behat/SahiClient/blob/master/src/Behat/SahiClient/Accessor/AbstractAccessor.php#L328
  • MinkZombieDriver (stripping): https://github.com/Behat/MinkZombieDriver/blob/master/src/Behat/Mink/Driver/ZombieDriver.php#L375
  • MinkBrowserKitDriver/MinkGoutteDriver (stripping): https://github.com/Behat/MinkBrowserKitDriver/blob/master/src/Behat/Mink/Driver/BrowserKitDriver.php#L374-L375

We strip newlines in all drivers except Sahi, so maybe Sahi strips them itself and the other drivers strip them manually for consistency. If Sahi doesn't strip them, then I don't know why we replace newlines with spaces.

aik099 avatar Jul 10 '14 08:07 aik099

@aik099 SahiClient is not stripping, but I think Sahi itself normalizes whitespace before returning the text (which means it probably behaves well and doesn't normalize whitespace inside a <pre> tag, btw).

And we replace them because in HTML, runs of whitespace are rendered as a single space (except in <pre> tags), so the text the user sees contains no newlines or repeated spaces.

stof avatar Jul 10 '14 08:07 stof

Then we shouldn't normalize whitespace in PRE tags. However, that might be a complicated task, because I can:

  1. get a PRE node and read its text (easy: just don't touch the whitespace)
  2. get a node that has a PRE node inside it (harder, since we need to normalize whitespace in all nodes except the PRE)

aik099 avatar Jul 10 '14 08:07 aik099

@aik099 and it's even worse for the third case: reading the text of a node which is inside a <pre> tag

stof avatar Jul 10 '14 08:07 stof

It may be a bit hacky, but adding a $keepWhitespace parameter to the getText method might be a quick and dirty solution. Writing complicated logic to preserve whitespace might not be worth it, considering how small a percentage of cases involve PRE compared to other tags.
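
A minimal sketch of what such a flag could look like, assuming the driver can reach a DOMNode (as a BrowserKit-style driver working on a parsed document can). getNodeText() is a hypothetical stand-in; the real getText() takes an XPath in the drivers, and this is not Mink's API.

```php
<?php

/**
 * Hypothetical sketch of the proposed flag. With $keepWhitespace = false
 * (the default, so existing callers are unaffected) whitespace is collapsed
 * roughly as the drivers do today; with true, the text is returned as-is.
 */
function getNodeText(DOMNode $node, bool $keepWhitespace = false): string
{
    $text = $node->textContent;

    if ($keepWhitespace) {
        return $text;
    }

    // Legacy-style behavior: every whitespace run becomes one space, then trim.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$doc = new DOMDocument();
$doc->loadHTML("<div><pre>Hello\nWorld!</pre></div>");
$pre = $doc->getElementsByTagName('pre')->item(0);

echo getNodeText($pre), PHP_EOL;       // Hello World!
echo getNodeText($pre, true), PHP_EOL; // "Hello" and "World!" on separate lines
```

As noted below in the thread, this only helps when the caller already knows it is reading a PRE element.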

aik099 avatar Jul 10 '14 09:07 aik099

Actually, the 1st and 3rd cases are easy to detect with an XPath search for a pre tag on this node or any of its parent nodes.
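
A small sketch of that check with PHP's DOMXPath, assuming access to the DOM node: the `ancestor-or-self::pre` axis matches both a <pre> node itself (case 1) and any node nested inside one (case 3). isInsidePre() is a hypothetical helper, not part of Mink.

```php
<?php

/**
 * Hypothetical helper: true if $node is a <pre> element or has a <pre>
 * among its ancestors, i.e. its whitespace should be preserved.
 */
function isInsidePre(DOMNode $node): bool
{
    $xpath = new DOMXPath($node->ownerDocument);

    // ancestor-or-self:: covers the node itself and every parent up to the root.
    return $xpath->query('ancestor-or-self::pre', $node)->length > 0;
}

$doc = new DOMDocument();
$doc->loadHTML('<div><pre><b>code</b></pre><p>text</p></div>');

var_dump(isInsidePre($doc->getElementsByTagName('pre')->item(0))); // bool(true)
var_dump(isInsidePre($doc->getElementsByTagName('b')->item(0)));   // bool(true)
var_dump(isInsidePre($doc->getElementsByTagName('p')->item(0)));   // bool(false)
```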

The 2nd case is trickier, because it requires manually parsing the returned text to determine where the PRE is. But the getText result doesn't even contain HTML tags; only the getHTML result has them.
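
If a driver can work on the DOM instead of on the getText string (an assumption that fits a BrowserKit-style driver, not a Selenium one, which only gets text back from the browser), case 2 might be handled by walking the text nodes in document order and collapsing whitespace only outside <pre>. textPreservingPre() is a hypothetical sketch; like textContent, it does not add spaces between block elements.

```php
<?php

/**
 * Hypothetical sketch for case 2: concatenate the text of $root, collapsing
 * whitespace in ordinary text nodes but keeping it verbatim inside <pre>.
 */
function textPreservingPre(DOMNode $root): string
{
    $xpath = new DOMXPath($root->ownerDocument);
    $out = '';

    // ".//text()" yields every descendant text node in document order.
    foreach ($xpath->query('.//text()', $root) as $textNode) {
        $text = $textNode->nodeValue;

        if ($xpath->query('ancestor::pre', $textNode)->length > 0) {
            // Inside <pre>: keep the text exactly as parsed.
            $out .= $text;
            continue;
        }

        // Outside <pre>: collapse each whitespace run into one space.
        $collapsed = preg_replace('/\s+/', ' ', $text);

        // Avoid a double space where two collapsed runs meet.
        if ($collapsed !== '' && $collapsed[0] === ' ' && substr($out, -1) === ' ') {
            $collapsed = substr($collapsed, 1);
        }

        $out .= $collapsed;
    }

    return $out;
}

$doc = new DOMDocument();
$doc->loadHTML("<div>Intro:  <pre>a\n  b</pre>\n done</div>");
echo textPreservingPre($doc->getElementsByTagName('div')->item(0)), PHP_EOL;
```

The leading and trailing whitespace of the whole result is left alone on purpose: trimming it could eat a <pre>'s trailing newline.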

aik099 avatar Jul 10 '14 13:07 aik099

As per http://www.w3.org/TR/html5/grouping-content.html#the-pre-element

Note: In the HTML syntax, a leading newline character immediately following the pre element start tag is stripped.
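
This matters mainly for code that reads a <pre> element's raw source (for example from the getHTML result) instead of text produced by an HTML parser. A minimal sketch, assuming such raw input: drop exactly one leading newline. stripLeadingPreNewline() is a hypothetical helper.

```php
<?php

/**
 * Hypothetical helper: remove the single newline that the HTML syntax
 * ignores right after a <pre> start tag. Only one is removed; any further
 * leading newlines are real content.
 */
function stripLeadingPreNewline(string $content): string
{
    // HTML input preprocessing treats CRLF and lone CR as LF, so accept all three.
    return preg_replace('/^(?:\r\n|\r|\n)/', '', $content, 1);
}

echo stripLeadingPreNewline("\nHello\nWorld!"), PHP_EOL; // "Hello" and "World!" on two lines
```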

grom358 avatar Jul 21 '14 05:07 grom358

Personally, I would be happy with a $keepWhitespace parameter that defaults to false so as not to break existing code. At least then I would be able to get the text in its preformatted form.

grom358 avatar Jul 21 '14 05:07 grom358

Sadly, that would only cover simple cases, where the PRE element isn't buried somewhere deep in the returned text.

aik099 avatar Jul 21 '14 07:07 aik099

So we're not handling PRE correctly, but otherwise all is fine.

aik099 avatar Jun 27 '15 10:06 aik099