selenium [🐛 Bug]: Return wrong value when using JavaScriptExecutor to execute decoding js function "atob"

What happened?

Running JavaScriptExecutor in Java code to execute the command "return atob('Rw0K')" in v3.2.0 decodes the string 'Rw0K' into 3 characters: 'G\r\n'. However, after upgrading to v4.13.0, the same command returns only 2 characters: 'G\n'. So basically, the '0K' part of the encoded string represents a line break. The old Selenium version returns it as '\r\n', and the latest version returns it as '\n'.

We found this incompatibility when trying to upgrade Selenium from v3.2.0 to v4.13.0. It causes some files, such as PNG files, to no longer match their format, so they end up corrupted. I hope the new Selenium version can fix this, so that when JavaScriptExecutor executes a command using the 'atob' function, an encoded line break is decoded back to "G\r\n". That is the same result as executing 'return atob('Rw0K')' directly in a web browser, such as the latest version of Chrome.

How can we reproduce the issue?

Using Selenium v3.2.0, create a JavaScriptExecutor and execute the JavaScript command "return atob('Rw0K')". You will get the decoded string "G\r\n", which has 3 characters.

However, if you use Selenium v4.13.0 to create a JavaScriptExecutor and execute the same command, "return atob('Rw0K')", you will receive the decoded string "G\n", which has only 2 characters.

Java Code Example:

//download the different versions specified above

import org.openqa.selenium.JavascriptExecutor;

//driver is an already-created WebDriver instance
JavascriptExecutor js = (JavascriptExecutor) driver;
String decodedString = (String) js.executeScript("return atob('Rw0K')");

//for v4.13.0, decodedString.length() is 2; for v3.2.0, it is 3.
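
For reference, the expected result can be checked without a browser. This is only a sketch: it assumes `java.util.Base64` as a stand-in for the browser's `atob`, and the class and method names are made up for illustration. Like `atob`, it maps each decoded byte to one character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AtobReference {
    // Stand-in for the browser's atob: Base64-decode, then map each byte
    // to one char (ISO-8859-1), which is how atob builds its result string.
    static String atob(String encoded) {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String decoded = atob("Rw0K");
        System.out.println(decoded.length());            // 3
        System.out.println((int) decoded.charAt(1));     // 13, i.e. '\r'
        System.out.println((int) decoded.charAt(2));     // 10, i.e. '\n'
    }
}
```

The three bytes are 0x47, 0x0D, 0x0A, so a result of length 2 means the 0x0D was dropped somewhere between the browser and the test code.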

Relevant log output

no related log output

Operating System

Mac, Windows, Linux

Selenium version

Java 4.13.0

What are the browser(s) and version(s) where you see this issue?

Chrome 119.0.6045.159 (Official Build) (arm64)

What are the browser driver(s) and version(s) where you see this issue?

geckodriver-v0.33.0-macos-aarch64, chromedriver

Are you using Selenium Grid?

no

Nov 27 '23 16:11 Erix377

@Erix377, thank you for creating this issue. We will troubleshoot it as soon as we can.

Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

Nov 27 '23 16:11 github-actions[bot]

Hmm, as far as I can tell, this should have been this way since the beginning (2009)...

https://github.com/SeleniumHQ/selenium/commit/3a134104c711ca61fc942c8a0ae3eba1387fe55f#diff-f47940f927aeb054bf51538974a70205c603e484f5efe098e756bbe6535b8a6dR230-R236

//We normalise to \n because Java will translate this to \r\n
//if this is suitable on our platform, and if we have \r\n, java will
//turn this into \r\r\n, which would be Bad!

I mean, this code was added before Java 7 was a thing, so I don't know if it is still true, or something we should be concerned about...
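
To make the effect of that normalisation concrete, here is a minimal sketch. `normalise` is a hypothetical helper, not Selenium's actual code; it only assumes that the binding replaces "\r\n" with "\n" in strings returned from the driver, as the comment describes.

```java
class LineEndingNormalisation {
    // Hypothetical helper (an assumption, not Selenium's code):
    // collapse every CR LF pair into a single LF.
    static String normalise(String value) {
        return value.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        String fromBrowser = "G\r\n";               // what atob('Rw0K') yields
        String seenByTest = normalise(fromBrowser); // what the caller receives

        System.out.println(fromBrowser.length());   // 3
        System.out.println(seenByTest.length());    // 2
    }
}
```

This matches the 3 versus 2 character difference reported above.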

Any Java experts chime in on this? @shs96c / @joerg1985 / @diemol

Nov 27 '23 17:11 titusfortner

@titusfortner I don't think the current Json de-/encoding modifies the line endings in any way. But changing this might result in other issues and should be checked against the WebDriver spec. There might also be issues when the 0xFFFD char is read; this depends on the way the driver encodes chars.

@Erix377 I think transferring binary data as text might result in effects like this, when the encoder detects faulty encoded chars and drops or replaces them.

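        // bytes that are not valid UTF-8 (an encoded surrogate code point)
        byte[] payload = java.util.HexFormat.of().parseHex("edbfbf");

        // this would be the return value of executeScript
        String asString = new String(payload, java.nio.charset.StandardCharsets.UTF_8);

        // re-encoding does not give back the original bytes, because the
        // decoder replaced the malformed input with U+FFFD
        byte[] asBytes = asString.getBytes(java.nio.charset.StandardCharsets.UTF_8);

        if (!java.util.Arrays.equals(payload, asBytes)) {
            throw new IllegalStateException("not equal");
        }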

Nov 28 '23 09:11 joerg1985

@joerg1985 I understand that translating between binary data and strings might cause this kind of mismatch. However, if that were the cause, the old Selenium version would show the issue too, and it only happens in the latest version. I tested on Mac, Windows, and Linux, and all of them show the issue with the latest version but not with the old version. That's my confusion. Also, when I use another decode function to decode the encoded string directly, rather than calling the JS executor to decode it with the atob JS function, it decodes the expected data perfectly. Thus, I feel something must have changed in the latest Selenium.
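
As an illustration of decoding on the Java side instead of calling atob in the browser, here is a hedged sketch. It assumes the page hands back Base64 text, for example a data URL such as the one `canvas.toDataURL('image/png')` produces; `DataUrlDecoder` and `decodeDataUrl` are hypothetical names. Base64 text contains no "\r\n", so a line-ending normalisation on the returned string has nothing to change.

```java
import java.util.Base64;

class DataUrlDecoder {
    // Hypothetical helper: decode the Base64 payload of a data URL
    // of the form "data:<mime>;base64,<payload>".
    static byte[] decodeDataUrl(String dataUrl) {
        int comma = dataUrl.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("not a data URL: " + dataUrl);
        }
        return Base64.getDecoder().decode(dataUrl.substring(comma + 1));
    }

    public static void main(String[] args) {
        // Payload is just the 8-byte PNG signature, used here as sample data.
        // In a real test the string would come from executeScript.
        String dataUrl = "data:image/png;base64,iVBORw0KGgo=";
        byte[] bytes = decodeDataUrl(dataUrl);

        System.out.println(bytes.length);   // 8
        System.out.println(bytes[4]);       // 13, the '\r' is still there
    }
}
```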

Nov 29 '23 15:11 Erix377

This conversion has always been there and/or was intended to always be there. I'm not sure how Selenium 3.2 would have skipped it, but if it did, it was a bug, and I don't think we have capacity to go back and figure out why.

Nov 29 '23 16:11 titusfortner

But the problem is that the latest Selenium translates it to "\n" rather than "\r\n", and that corrupts the file generated from the decoded data. We found this issue when trying to generate a PNG file by writing out the decoded data obtained from a canvas element. The final generated PNG file is corrupted because the PNG format is no longer right due to the missing "\r". Thus, it is the new version which has the bug rather than the old version.
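
The PNG case can be sketched as follows. Every PNG file starts with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A, which contains a "\r\n" pair. The `roundTripWithNormalisation` helper is hypothetical and only assumes that "\r\n" becomes "\n" somewhere on the way back to the test.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PngSignatureCheck {
    // The fixed 8-byte signature at the start of every PNG file.
    static final byte[] PNG_SIGNATURE =
        {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // Hypothetical round trip: bytes -> binary string (one char per byte)
    // -> CR LF collapsed to LF (the assumed normalisation) -> bytes.
    static byte[] roundTripWithNormalisation(byte[] data) {
        String binaryString = new String(data, StandardCharsets.ISO_8859_1);
        String normalised = binaryString.replace("\r\n", "\n");
        return normalised.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] result = roundTripWithNormalisation(PNG_SIGNATURE);
        System.out.println(result.length);                         // 7
        System.out.println(Arrays.equals(PNG_SIGNATURE, result));  // false
    }
}
```

With the 0x0D byte gone, the file no longer starts with a valid PNG signature.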

Nov 29 '23 16:11 Erix377

I will say that other languages aren't making this change, so if it isn't currently a problem for Java, maybe we shouldn't be doing it, or at least not doing it universally.

@shs96c I asked you about this a while back, can you chime in on the original reason for conversion?

Dec 31 '23 20:12 titusfortner

That's code from 2009, and that diff looks like it's something to do with the version of chromedriver we were working on. I'd have to go hunting for a deeper reason, so this is my working theory for now: normalising line endings sounds like something you'd do when sending data back and forth between different platforms (classic Mac OS used \r as a line ending, Linux \n, and Windows \r\n). Since we only ever send text data back and forth, it'd make sense to normalise in the driver so that users don't trip over platform issues in their tests.

Jan 24 '24 09:01 shs96c

@Erix377 do you want to PR an update removing the change? It should be handled in the driver, and the other bindings don't do it, so it *should* be safe. 😂

Jan 24 '24 19:01 titusfortner

This issue is looking for contributors.

Please comment below or reach out to us through our IRC/Slack/Matrix channels if you are interested.

Mar 08 '24 22:03 github-actions[bot]