[🐛 Bug]: Run out of memory downloading large files from Selenium Grid via downloadFile

Open theswordsmahin opened this issue 1 year ago • 9 comments

What happened?

When using the built-in methods for downloading files from a RemoteWebDriver, we run into out-of-memory exceptions if the file is large enough.

Maybe I should open a separate feature request instead, but it would be nice if, instead of making files accessible by reporting their contents in JSON, there were a separate endpoint to download the file directly.

How can we reproduce the issue?

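import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

// The class name is arbitrary; it only makes the snippet compilable.
public class DownloadRepro {
	public static void main( String[] args ) throws MalformedURLException {
		String hubUrl = "http://selenium-hub:4444/wd/hub";
		DesiredCapabilities capabilities = new DesiredCapabilities();
		capabilities.setBrowserName( "chrome" );
		// Ask the Grid node to manage downloads for this session
		capabilities.setCapability( "se:downloadsEnabled", true );
		RemoteWebDriver driver = new RemoteWebDriver( new URL( hubUrl ), capabilities, false );
		try {
			driver.manage().window().maximize();
			driver.get( "https://testfileorg.jio.business/Colttaine.zip" );
			// Wait for the file to finish downloading on the remote node
			String fileName = "Colttaine.zip";
			while ( !driver.getDownloadableFiles().contains( fileName ) ) {
				TimeUnit.SECONDS.sleep( 5 );
			}
			// Copy the file from the node to the current directory; this call runs out of memory
			driver.downloadFile( fileName, Paths.get( "" ) );
		} catch ( Exception e ) {
			e.printStackTrace();
		} finally {
			driver.quit();
		}
	}
}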
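
The wait loop in the reproduction polls with no upper bound, so a download that never completes would hang the test. A minimal sketch of a bounded poll follows. The class BoundedWait and the method waitUntil are hypothetical helpers written for illustration, not part of Selenium.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not a Selenium API.
final class BoundedWait {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

With the driver from the reproduction, the condition would be something like `() -> driver.getDownloadableFiles().contains( fileName )`, with a timeout chosen to suit the expected download time.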

Relevant log output

org.openqa.selenium.WebDriverException: Java heap space
Build info: version: '4.16.1', revision: '9b4c83354e'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '5.15.133.1-microsoft-standard-WSL2', java.version: '17.0.10'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [05328f4bc9dac630fbc35c9ce2207299, downloadFile {name=Colttaine.zip}]
Capabilities {acceptInsecureCerts: false, browserName: chrome, browserVersion: 120.0.6099.109, chrome: {chromedriverVersion: 120.0.6099.109 (3419140ab66..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:45475}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: ws://172.18.0.5:4444/sessio..., se:cdpVersion: 120.0.6099.109, se:downloadsEnabled: true, se:forwardCdp: ws://172.18.0.6:4444/sessio..., se:vnc: ws://172.18.0.5:4444/sessio..., se:vncEnabled: true, se:vncLocalAddress: ws://172.18.0.6:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
Session ID: 05328f4bc9dac630fbc35c9ce2207299
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
        at org.openqa.selenium.remote.codec.w3c.W3CHttpResponseCodec.createException(W3CHttpResponseCodec.java:200)
        at org.openqa.selenium.remote.codec.w3c.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:133)
        at org.openqa.selenium.remote.codec.w3c.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:52)
        at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:191)
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:523)
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:596)
        at org.openqa.selenium.remote.RemoteWebDriver.downloadFile(RemoteWebDriver.java:659)

Operating System

Any

Selenium version

4.16.1

What are the browser(s) and version(s) where you see this issue?

"selenium/standalone-chrome:4.16.1", '{"browserName": "chrome", "platformName": "linux", "se:downloadsEnabled": true, "se:recordVideo": true }'

What are the browser driver(s) and version(s) where you see this issue?

"selenium/standalone-chrome:4.16.1", '{"browserName": "chrome", "platformName": "linux", "se:downloadsEnabled": true, "se:recordVideo": true }'

Are you using Selenium Grid?

4.16.1

theswordsmahin, Feb 12 '24 23:02

@theswordsmahin, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then the I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

github-actions[bot], Feb 12 '24 23:02

What is "large enough"? Also, we implemented this feature to unblock folks who wanted to test flows where files get generated through a set of combined actions on a website, and ideally the size of those files is relatively small. Given that this is intended for end-user application testing, we would not expect that an end user needs to download a huge file.

Maybe I should open a separate feature request instead, but it would be nice if, instead of making files accessible by reporting their contents in JSON, there were a separate endpoint to download the file directly.

This is not possible because the browser needs to download the file first, and then Grid can access it.

diemol, Feb 13 '24 13:02

The example I posted used a 5 GB file and reproduced the issue (I just picked a test file with a reasonable download speed for the example), but I expect the threshold varies with the heap space allocated to the executing JVM process.

Given that this is intended for end-user application testing, we would not expect that an end user needs to download a huge file.

My use case is for testing an end user application: downloading an OVA image and running a checksum verification on the file.

This is not possible because the browser needs to download the file first, and then Grid can access it.

Just to clarify, the browser is able to download the file fine. The problem occurs when I call driver.downloadFile( fileName, Paths.get( "" ) ); on the RemoteWebDriver object, I'm guessing because the Base64 file contents are read into memory from the JSON response.
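
To give a rough sense of scale for that guess: padded Base64 encodes every 3 raw bytes as 4 characters, and Java arrays are indexed by int, so a single array cannot exceed Integer.MAX_VALUE elements. The sketch below only does that arithmetic. The class and method names are hypothetical, and it assumes, as the comment above does, that the encoded payload is buffered in one piece.

```java
// Hypothetical helper for illustration; it does not call any Selenium code.
final class Base64Overhead {

    /** Length of the padded Base64 text produced for rawBytes bytes of input. */
    static long encodedLength(long rawBytes) {
        return 4 * ((rawBytes + 2) / 3); // 4 output characters per started group of 3 input bytes
    }

    /** Whether the encoded text could fit in one Java array, ignoring available heap. */
    static boolean fitsInSingleArray(long rawBytes) {
        return encodedLength(rawBytes) <= Integer.MAX_VALUE;
    }
}
```

For a 5 GB file, `fitsInSingleArray` returns false, so under that assumption the encoded text would be too large for one array however much heap is configured.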

I can of course work around this limitation by using Selenium only to verify that the download button works and that the file is eventually downloaded, and using a REST call if I actually need the full file. If this can't be fixed, would it be possible to expose additional information on the /se/files endpoint, like file size or MD5 values? I could see those being valuable for both large and small files, for quicker verification of attributes in testing.
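
If the file is fetched through a separate channel as described, memory use can stay flat by streaming the bytes instead of buffering them. A minimal sketch, assuming the file is available as an InputStream from whatever client performs the REST call. The class and method names are hypothetical, and SHA-256 is used only as an example digest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Selenium.
final class StreamingChecksum {

    /**
     * Copies everything from in to out in fixed-size chunks and returns the
     * lowercase hex SHA-256 digest of the copied bytes. Only one 8 KiB buffer
     * is held in memory, regardless of the stream length.
     */
    static String copyAndHash(InputStream in, OutputStream out) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
                out.write(buffer, 0, read);
            }
            out.flush();
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

In a test, `out` would typically come from `Files.newOutputStream(target)`, and the returned digest would be compared with the published checksum of the image.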

Thanks for looking into this!

theswordsmahin, Feb 13 '24 16:02

I think what we have is good enough for most people, so work on this would be a low priority for the team, but PRs are always welcome.

titusfortner, Feb 13 '24 17:02

This issue is looking for contributors.

Please comment below or reach out to us through our IRC/Slack/Matrix channels if you are interested.

github-actions[bot], Feb 13 '24 17:02

@theswordsmahin I download files from a browser using a remote driver. In my case I am running the Docker Selenium Grid, so the files are downloaded to the relevant browser node. I then pull the file back to the local machine that's executing the code by using the Docker.DotNet NuGet package (I am using C#). If you are not using C#, you can use the Docker Remote API directly or through whatever client exists for your language. Perhaps this could work for you and help you along?

MJB222398, Feb 14 '24 11:02

@theswordsmahin Could you share the server log output? If it fails while writing JSON this might be an easy / small change.

I think most of the processing takes ~ zipped size * 2.33, but the JSON part should take ~ zipped size * 3.66 and could be reduced to ~ zipped size * 2.33 too.

joerg1985, Feb 14 '24 16:02

@MJB222398 thanks for the suggestion, but my tests are running in an accompanying container, so I'm not sure I'd be able to do that, at least not quite as directly.

@joerg1985 The Grid seems to handle this fine. The problem seems to be client-side in my test, when the RemoteWebDriver is reading the response from the downloadFile() call:

        at org.openqa.selenium.remote.codec.w3c.W3CHttpResponseCodec.decode(W3CHttpResponseCodec.java:52)
        at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:191)
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:523)
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:596)
        at org.openqa.selenium.remote.RemoteWebDriver.downloadFile(RemoteWebDriver.java:659)

Do you still think the server logs would be useful?

theswordsmahin, Feb 14 '24 17:02

@theswordsmahin It looks like the ErrorFilter catches it on the Grid and sends it to the client without logging it.

You should see the remote stack trace if you debug into the response that contains the exception.

joerg1985, Feb 14 '24 18:02