Mink
getText() and whitespace
I have noticed that in the drivers (I checked the selenium2 and goutte drivers) the getText() method does not preserve whitespace. This makes it impossible to properly test output in pre elements, for example:
$text = $session->getPage()->find('css', 'pre')->getText();
$this->assertEquals("Hello\nWorld!", $text);
Please list the driver and Mink version you're using. Try switching to the dev-master version and check whether the problem persists there as well.
Also, please post a link to the getText method implementation in the mentioned drivers (better yet, in all drivers).
Ha, in MinkSelenium2Driver we do indeed have code that strips new lines: https://github.com/minkphp/MinkSelenium2Driver/blob/master/src/Selenium2Driver.php#L513
I think it was done back then to allow a more relaxed comparison of text from the Behat steps (or the WebAssert class) that are used in MinkExtension.
@stof it might just as well be misplaced code. If it happens in all drivers, then I think it's better to keep line endings as-is in the drivers and instead remove them from the text being compared in the WebAssert class.
For other drivers:
- MinkSelenium2Driver (stripping): https://github.com/Behat/MinkSelenium2Driver/blob/master/src/Behat/Mink/Driver/Selenium2Driver.php#L518
- MinkSeleniumDriver (stripping): https://github.com/Behat/MinkSeleniumDriver/blob/master/src/Behat/Mink/Driver/SeleniumDriver.php#L259
- SahiClient (used by MinkSahiDriver, not stripping): https://github.com/Behat/SahiClient/blob/master/src/Behat/SahiClient/Accessor/AbstractAccessor.php#L328
- MinkZombieDriver (stripping): https://github.com/Behat/MinkZombieDriver/blob/master/src/Behat/Mink/Driver/ZombieDriver.php#L375
- MinkBrowserKitDriver/MinkGoutteDriver (stripping): https://github.com/Behat/MinkBrowserKitDriver/blob/master/src/Behat/Mink/Driver/BrowserKitDriver.php#L374-L375
In all drivers except Sahi we're stripping new lines, so maybe Sahi strips them itself and the other drivers strip them manually to stay consistent. If Sahi doesn't strip them, then I don't know why we replace new lines with spaces.
@aik099 SahiClient is not stripping, but I think Sahi itself normalizes whitespace before returning the text (meaning it probably behaves well by not normalizing whitespace in a <pre> tag, btw).
And we replace them because in HTML, runs of whitespace are rendered as a single space in the output (except in <pre> tags), so the text the user sees does not have newlines or multiple spaces in it.
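To make the normalization being discussed concrete, here is a minimal sketch of collapsing whitespace the way a browser renders non-pre text. The function name normalizeWhitespace is hypothetical and is not part of Mink or any driver; the real drivers may use a different replacement.

```php
<?php
// Hypothetical helper, not part of Mink: collapses whitespace roughly the
// way a browser renders normal (non-pre) text.
function normalizeWhitespace(string $text): string
{
    // Replace every run of whitespace (spaces, tabs, newlines) with one space.
    $text = preg_replace('/\s+/u', ' ', $text);

    // Drop leading and trailing whitespace left over after collapsing.
    return trim($text);
}

var_dump(normalizeWhitespace("Hello\n   World!") === "Hello World!");
```

Applied to the content of a pre element, this is exactly what loses the line break the original test expects.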
Then we shouldn't normalize whitespace in PRE tags. However, that might be a complicated task, because I can:
- get a PRE node and read its text (easy, just don't touch the whitespace)
- get a node that has a PRE node inside (harder, since we need to normalize whitespace in all nodes but PRE)
@aik099 and even worse for the third case: it can read the text of a node which is inside a <pre> tag
It may be a bit hacky, but a $keepWhitespace parameter on the getText method might be a quick and dirty solution. Writing complicated logic to preserve whitespace might not be worth it, considering the percentage of cases where we work with PRE in contrast to other tags.
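A rough sketch of what such a flag could look like, assuming the driver has already read the raw text from the browser. getTextSketch and its $rawText argument are hypothetical stand-ins, not the actual Mink or driver API:

```php
<?php
// Hypothetical sketch: $rawText stands in for whatever the driver read
// from the browser before any normalization is applied.
function getTextSketch(string $rawText, bool $keepWhitespace = false): string
{
    if ($keepWhitespace) {
        // Caller asked for the text untouched, e.g. for a <pre> element.
        return $rawText;
    }

    // Default behaviour stays as before: collapse whitespace to single spaces.
    return trim(preg_replace('/\s+/u', ' ', $rawText));
}

var_dump(getTextSketch("Hello\nWorld!"));        // normalized
var_dump(getTextSketch("Hello\nWorld!", true));  // whitespace preserved
```

Because the flag defaults to false, existing callers would keep getting normalized text.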
Actually, the 1st and 3rd cases are easily detectable with an XPath search for a pre tag in this node or any of its parent nodes.
The 2nd case is trickier, because it requires manually parsing the returned text to determine the PRE location. But the getText method result doesn't even have HTML tags in it. It's the getHTML method that has them.
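For illustration, the ancestor check can be expressed with the ancestor-or-self axis. This sketch uses PHP's built-in DOM extension rather than a Mink driver, and isInsidePre is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true when $node is a <pre> element or sits inside one.
function isInsidePre(DOMNode $node): bool
{
    $xpath = new DOMXPath($node->ownerDocument);

    // ancestor-or-self::pre selects the node itself or any ancestor named "pre".
    return $xpath->query('ancestor-or-self::pre', $node)->length > 0;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><pre>Hello <b>World</b></pre></div></body></html>');
$xpath = new DOMXPath($doc);

var_dump(isInsidePre($xpath->query('//pre')->item(0))); // case 1: bool(true)
var_dump(isInsidePre($xpath->query('//div')->item(0))); // case 2: bool(false)
var_dump(isInsidePre($xpath->query('//b')->item(0)));   // case 3: bool(true)
```

The div comes back false even though it contains a pre, which is why the 2nd case needs something else.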
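One way around parsing the flat text would be to walk the DOM (for example, markup obtained from getHTML) and normalize only the text nodes outside pre. This is just a sketch using PHP's DOM extension, not what any driver currently does, and textPreservingPre is a hypothetical name:

```php
<?php
// Hypothetical sketch: collect text from $node, collapsing whitespace
// everywhere except inside <pre> elements.
function textPreservingPre(DOMNode $node, bool $insidePre = false): string
{
    if ($node instanceof DOMText) {
        // Text inside <pre> is returned untouched; elsewhere runs collapse.
        return $insidePre
            ? $node->nodeValue
            : preg_replace('/\s+/u', ' ', $node->nodeValue);
    }

    // Once we enter a <pre>, all descendants count as preformatted.
    $insidePre = $insidePre
        || ($node instanceof DOMElement && $node->nodeName === 'pre');

    $text = '';
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $text .= textPreservingPre($child, $insidePre);
        }
    }

    return $text;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div>a   b <pre>' . "x\n  y" . '</pre></div></body></html>');
$div = $doc->getElementsByTagName('div')->item(0);
var_dump(textPreservingPre($div));
```

This only handles the pre element itself; CSS such as white-space: pre is out of scope for the sketch.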
As per http://www.w3.org/TR/html5/grouping-content.html#the-pre-element
Note: In the HTML syntax, a leading newline character immediately following the pre element start tag is stripped.
Personally I would be happy with a $keepWhitespace parameter that defaults to false so as not to break existing code. At least I would then be able to get the text in its preformatted form.
Sadly, it would only cover simple cases, where the PRE element isn't buried somewhere deep in the returned text.
So we're not handling PRE correctly, but otherwise all is fine.