[🐛 Bug]: java.lang.OutOfMemoryError

Open Doofus100500 opened this issue 11 months ago • 39 comments

What happened?

Getting an OutOfMemoryError (OOM) in the EventBus container.

Command used to start Selenium Grid with Docker (or Kubernetes)

helm

Relevant log output

{"class": "EventBusCommand","log-level": "INFO","log-message": "Started Selenium EventBus 4.26.0 (revision 69f9e5e): https:\u002f\u002f10.232.86.222:5557","log-name": "org.openqa.selenium.grid.commands.EventBusCommand","log-time-local": "2024-12-14T07:31:37.796Z","log-time-utc": "2024-12-14T07:31:37.796Z","method": "execute"}
Exception in thread "iothread-2" java.lang.OutOfMemoryError: Cannot reserve 8192 bytes of direct buffer memory (allocated: 501211210, limit: 501219328)
    at java.base/java.nio.Bits.reserveMemory(Bits.java:178)
    at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:121)
    at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:332)
    at zmq.io.coder.DecoderBase.<init>(DecoderBase.java:46)
    at zmq.io.coder.Decoder.<init>(Decoder.java:71)
    at zmq.io.coder.v2.V2Decoder.<init>(V2Decoder.java:18)
    at zmq.io.StreamEngine.handshake(StreamEngine.java:805)
    at zmq.io.StreamEngine.inEvent(StreamEngine.java:386)
    at zmq.io.IOObject.inEvent(IOObject.java:85)
    at zmq.poll.Poller.run(Poller.java:275)
    at java.base/java.lang.Thread.run(Thread.java:840)

Operating System

k8s

Docker Selenium version (image tag)

4.26.0-20241101

Selenium Grid chart version (chart version)

0.37.1

Doofus100500 avatar Dec 24 '24 13:12 Doofus100500

@Doofus100500, thank you for creating this issue. We will troubleshoot it as soon as we can.


Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

github-actions[bot] avatar Dec 24 '24 13:12 github-actions[bot]

It looks like the actual memory usage does not reach the range between the configured resource requests and limits. In the latest change, I added a default SE_JAVA_OPTS with -Xmx and -Xms for the Selenium server JVM to all components (in the server ConfigMap, which all components reference). https://github.com/SeleniumHQ/docker-selenium/blob/2d80c8805d5141d3b382f32271d3bf032b0c1120/charts/selenium-grid/values.yaml#L366

Can you check whether it helps?

VietND96 avatar Dec 26 '24 07:12 VietND96

@joerg1985, do you have any comment on this?

VietND96 avatar Dec 26 '24 07:12 VietND96

-Xmx1024m -Xms256m

For all components, this is extremely low. In my opinion, it is necessary to make it possible to configure these parameters for each component individually. Under load, consumption increases significantly.

Doofus100500 avatar Dec 26 '24 09:12 Doofus100500

I think you can override the global one via extraEnvironmentVariables in each component.

VietND96 avatar Dec 26 '24 09:12 VietND96

But this is not reflected in the chart for the eventBus and other distributed components

Doofus100500 avatar Dec 26 '24 09:12 Doofus100500

Oh, really? Can you share an example of the YAML values you are setting?

VietND96 avatar Dec 26 '24 09:12 VietND96

For example, to address the issue with the event-bus mentioned in this issue, I added the following through k9s:

- name: SE_JAVA_OPTS  
  value: -Xmx2g

Doofus100500 avatar Dec 26 '24 09:12 Doofus100500

I just checked: in the chart config, all distributed components refer to the same setting for extra environment variables, components.extraEnvironmentVariables.

VietND96 avatar Dec 26 '24 09:12 VietND96

That’s exactly what I’m saying. I want to set appropriate parameters for each component individually, rather than, for example, setting -Xmx16g for all of them.

Doofus100500 avatar Dec 26 '24 09:12 Doofus100500

Yes, I understand the problem now. I will add that config for each component instead of the common one.

VietND96 avatar Dec 26 '24 10:12 VietND96

Have you observed anything else that you think should be fixed in chart 0.38.3 as well?

VietND96 avatar Dec 26 '24 10:12 VietND96

Unfortunately, I haven’t even looked into it yet. If I find anything, I’ll definitely come back in the future.

Doofus100500 avatar Dec 26 '24 10:12 Doofus100500

@VietND96 I had a short look at the code of EventBusCommand and, from reading it (without debugging), I would expect a leak in the /status call. It adds a listener but never removes it. I will put this on my todo list.

joerg1985 avatar Dec 27 '24 22:12 joerg1985

The leaking listeners have been fixed in https://github.com/SeleniumHQ/selenium/commit/269a7f6c11955b542d15396cef56699f7f31b811, but I am not sure this is the root cause here: only a few bytes leak per call to /status, so the grid would have to be up for several days before this shows.
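
For readers unfamiliar with this kind of leak, the pattern can be sketched as follows. This is an illustration only, not Selenium's code: `SketchBus`, `StatusHandlers`, and their methods are hypothetical names. The leaky variant registers a listener on every status check and never removes it, so the listener list grows by one entry per call.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical stand-in for an event bus; not Selenium's actual class.
class SketchBus {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    // Registers a listener and returns a handle that removes it again.
    Runnable addListener(Consumer<String> listener) {
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    // Delivers an event to every registered listener.
    void fire(String event) {
        listeners.forEach(l -> l.accept(event));
    }

    int listenerCount() {
        return listeners.size();
    }
}

class StatusHandlers {
    // Leaky variant: each call leaves one listener behind on the bus.
    static boolean leakyStatus(SketchBus bus) {
        boolean[] seen = {false};
        bus.addListener(e -> seen[0] = true);
        bus.fire("ping");
        return seen[0];
    }

    // Fixed variant: the listener is removed once the check is done.
    static boolean fixedStatus(SketchBus bus) {
        boolean[] seen = {false};
        Runnable remove = bus.addListener(e -> seen[0] = true);
        try {
            bus.fire("ping");
            return seen[0];
        } finally {
            remove.run();
        }
    }
}
```

Calling `leakyStatus` repeatedly makes `listenerCount()` grow without bound, while `fixedStatus` keeps it at zero. Each retained listener is small, which matches the observation that such a leak takes a long time to become visible.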

joerg1985 avatar Dec 28 '24 12:12 joerg1985

Actually, in our case, we expect the grid (except for the pods with browsers) to always be operational. Could you please check the other components for leaks as well?

[four attached images]

Doofus100500 avatar Dec 28 '24 13:12 Doofus100500

@Doofus100500 I think the best approach would be to create a heap histogram with jmap and share it here.

joerg1985 avatar Dec 28 '24 18:12 joerg1985

Unfortunately, I will only be able to take care of this after the 9th.

Doofus100500 avatar Dec 30 '24 18:12 Doofus100500

Via #2546, I added a way to enable HeapDumpOnOutOfMemoryError, or to get a heap dump on demand when the container is terminated/stopped. The output is written to the directory /opt/selenium/logs. You need to mount a volume at that directory in the container to persist the output files.

VietND96 avatar Jan 02 '25 07:01 VietND96

@Doofus100500 please wait for the next release before testing; this might be the fix for your issue: https://github.com/SeleniumHQ/selenium/pull/15011

joerg1985 avatar Jan 02 '25 20:01 joerg1985

Hi @VietND96, have you considered using -XX:MaxRAMPercentage and -XX:MinRAMPercentage instead of -Xmx and -Xms? It seems like a good solution for general configuration in: https://github.com/SeleniumHQ/docker-selenium/blob/2d80c8805d5141d3b382f32271d3bf032b0c1120/charts/selenium-grid/values.yaml#L366

Doofus100500 avatar Jan 09 '25 13:01 Doofus100500

I’m just unsure what percentage to set for MaxRAMPercentage, could you help me with that?

Doofus100500 avatar Jan 13 '25 10:01 Doofus100500

Hi, I am not sure about this one either. I will try to understand it and let you know if I find something.

VietND96 avatar Jan 13 '25 10:01 VietND96

I read something related: https://stackoverflow.com/questions/75025893/is-jvm-heap-memory-option-xxmaxrampercentage-only-valid-for-dockerized-applic

When you run the application in a dedicated container, together with a known set of programs or no other programs at all, you most probably want to specify the maximum amount of memory in relation to the container’s memory, so when you want to change the available memory, you only have to reconfigure the container instead of needing to adapt all programs’ start configurations

With docker-selenium, each component (Hub/Router/Distributor/SessionQueue/SessionMap/EventBus) runs in a dedicated container with a single program, so let it use the maximum amount with -XX:MaxRAMPercentage=100. For the Node component, the browser also consumes memory besides the program, so let it use half with -XX:MaxRAMPercentage=50.
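
As a rough worked example of what these percentages imply, the sketch below computes the heap cap and the memory left over for everything else from a container memory limit. It is only an approximation under stated assumptions: the JVM's real sizing also applies alignment and other ergonomics, and `HeapBudget` and its methods are hypothetical names introduced here.

```java
// Hypothetical helper; approximates the heap cap implied by a RAM percentage.
class HeapBudget {
    // Heap cap as a percentage of the container memory limit (no alignment applied).
    static long maxHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * (maxRamPercentage / 100.0));
    }

    // Memory left for everything that is not Java heap:
    // direct buffers, metaspace, thread stacks, or a browser on a Node.
    static long nonHeapBytes(long containerLimitBytes, double maxRamPercentage) {
        return containerLimitBytes - maxHeapBytes(containerLimitBytes, maxRamPercentage);
    }
}
```

For a 2 GiB container, 50 percent gives a 1 GiB heap cap and leaves 1 GiB for other uses, whereas 100 percent leaves nothing outside the heap.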

VietND96 avatar Jan 18 '25 14:01 VietND96

@VietND96 the JVM should detect the container environment and adjust these values automatically, see https://bugs.openjdk.org/browse/JDK-8146115 for details.

joerg1985 avatar Jan 18 '25 15:01 joerg1985

@joerg1985, yes, but in a few of the graph screenshots above, the OOM happened when actual memory consumption had not reached the range between the allowed requests and limits. What is your view?

VietND96 avatar Jan 20 '25 00:01 VietND96

There are separate limits for the different memory areas of the JVM, so setting MaxRAMPercentage might not help here. When setting it to 100%, the heap takes all the memory, but what about the other memory areas? They also need some memory.
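
The error in the original report concerns direct buffer memory (limit 501219328 bytes, i.e. 478 MiB), which the JVM accounts for separately from heap usage. A minimal sketch for inspecting both from inside a running JVM, assuming a HotSpot JVM where the platform buffer pool is named "direct"; `MemoryAreas` is a hypothetical class name:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for comparing the heap cap with direct buffer usage.
class MemoryAreas {
    // Bytes currently used by the "direct" buffer pool, or -1 if no such pool is reported.
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    // Maximum heap the JVM will attempt to use (reflects -Xmx or the RAM percentage settings).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // One-line summary suitable for logging.
    static String report() {
        return String.format("maxHeap=%d bytes, directUsed=%d bytes", maxHeapBytes(), directBytesUsed());
    }
}
```

Logging `MemoryAreas.report()` periodically would show whether direct buffer usage keeps growing while heap usage stays flat, which is the situation the stack trace suggests.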

I don't think we need to fine-tune the memory management; we need to find the root cause of the leak. But this might already have been fixed, so let's wait for @Doofus100500's feedback when using version 4.28.0.

joerg1985 avatar Jan 23 '25 16:01 joerg1985

I’m currently experiencing issues with 4.28 and have opened an issue: https://github.com/SeleniumHQ/docker-selenium/issues/2655

Doofus100500 avatar Feb 20 '25 11:02 Doofus100500

Updated to 0.40.0 (4.29.0-20250222)

[five attached images]

Doofus100500 avatar Feb 25 '25 13:02 Doofus100500

@joerg1985 Hi, here’s the heap histogram from the distributor, and I’m also attaching a screenshot from Grafana.

[Grafana screenshot]

heap_histogram.txt

Doofus100500 avatar Mar 11 '25 07:03 Doofus100500