steve icon indicating copy to clipboard operation
steve copied to clipboard

memory leak in docker container with steve 3.4.3

Open zinsoldat opened this issue 4 years ago • 4 comments

Checklist

  • [X] I checked other issues already, but found no answer/solution
  • [X] I checked the documentation and wiki, but found no answer/solution
  • [ ] I am running the latest version and the issue still occurs
  • [ ] I am sure that this issue is about SteVe (and not about the charging station software or something unrelated to SteVe)

Specifications

SteVe Version : 3.4.3
Operating system  : ubuntu 18.04
JDK : ...
Database : MySQL 5.7

Expected Behavior

stable memory consumption

Actual Behavior

We are running steve in a docker container on a VM in Azure. It seems like there is a memory leak within steve. We connected three charging points and see an increasing memory consumption, with an OOM crash at the end. The memory consumption increases by roughly 300 MB/hour.

I can provide a 400 MB memory dump for further analysis.

Eclipse Memory Analyzer shows the following: One instance of "org.eclipse.jetty.websocket.server.WebSocketServerFactory" loaded by "jdk.internal.loader.ClassLoaders$AppClassLoader @ 0xec207540" occupies 223,865,928 (86.52%) bytes. The memory is accumulated in one instance of "java.lang.Object[]", loaded by "", which occupies 223,864,416 (86.52%) bytes.

steve_leak_1

The org.eclipse.jetty.websocket.server.WebSocketServerFactory has an instance of a java.util.concurrent.CopyOnWriteArrayList with 17686 ContainerLifeCycle Beans which contain WebSocketSession (as far as I checked).

steve_leak_2

Steps to Reproduce the Problem

  1. Start container with steve 3.4.3 version
  2. Connect multiple charging points
  3. Wait and see the memory usage increase

Additional context

zinsoldat avatar Feb 10 '21 07:02 zinsoldat

hey, thanks for the detailed report.

this is interesting, because steve does not control this part of the behaviour. org.eclipse.jetty.websocket.server.WebSocketServerFactory is part of the our server library (jetty) and steve does not directly interact with it.

each WebSocketSession represents an open websocket connection. do the charging points disconnect and reconnect a lot?

goekay avatar Feb 10 '21 12:02 goekay

Yes, in the logs there are a lot of entries for connection/connection closed.

[INFO ] 2021-02-10 13:22:10,673 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-23) - [chargeBoxId=DEV1042960, sessionId=086fb6bc-49fb-8c12-639a-77cb4792899a] Connection is established
[INFO ] 2021-02-10 13:22:10,680 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-14) - [chargeBoxId=DEV1042962, sessionId=22b61c3a-3821-600d-2a2e-6d170afa8095] Connection is established
[WARN ] 2021-02-10 13:22:12,668 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-23) - [chargeBoxId=DEV1042960, sessionId=086fb6bc-49fb-8c12-639a-77cb4792899a] Connection is closed, status: CloseStatus[code=1006, reason=Disconnected]
[WARN ] 2021-02-10 13:22:12,678 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-14) - [chargeBoxId=DEV1042962, sessionId=22b61c3a-3821-600d-2a2e-6d170afa8095] Connection is closed, status: CloseStatus[code=1006, reason=Disconnected]
[INFO ] 2021-02-10 13:22:13,187 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-23) - [chargeBoxId=DEV1042962, sessionId=b41dc601-0c41-898c-9451-14a7008c63ab] Connection is established
[INFO ] 2021-02-10 13:22:13,225 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-14) - [chargeBoxId=DEV1042960, sessionId=b11892a6-b01e-cc61-463d-4255744eb789] Connection is established
[WARN ] 2021-02-10 13:22:15,181 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-23) - [chargeBoxId=DEV1042962, sessionId=b41dc601-0c41-898c-9451-14a7008c63ab] Connection is closed, status: CloseStatus[code=1006, reason=Disconnected]
[WARN ] 2021-02-10 13:22:15,221 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-14) - [chargeBoxId=DEV1042960, sessionId=b11892a6-b01e-cc61-463d-4255744eb789] Connection is closed, status: CloseStatus[code=1006, reason=Disconnected]
[INFO ] 2021-02-10 13:22:15,633 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-23) - [chargeBoxId=DEV1042962, sessionId=dce5f827-85c5-df22-2dbb-05891b404d63] Connection is established
[INFO ] 2021-02-10 13:22:15,682 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-14) - [chargeBoxId=DEV1042960, sessionId=46181428-9a40-6727-a14c-fcb567edf675] Connection is established
[WARN ] 2021-02-10 13:22:17,627 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-23) - [chargeBoxId=DEV1042962, sessionId=dce5f827-85c5-df22-2dbb-05891b404d63] Connection is closed, status: CloseStatus[code=1006, reason=Disconnected]
[WARN ] 2021-02-10 13:22:17,680 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-14) - [chargeBoxId=DEV1042960, sessionId=46181428-9a40-6727-a14c-fcb567edf675] Connection is closed, status: CloseStatus[code=1006, reason=Disconnected]
[INFO ] 2021-02-10 13:22:19,466 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-23) - [chargeBoxId=DEV1042962, sessionId=14a4887f-1a22-c187-c60e-51c51d876f04] Connection is established
[INFO ] 2021-02-10 13:22:19,669 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger (qtp1527152775-14) - [chargeBoxId=DEV1042960, sessionId=86737e38-058e-0340-3601-9b62254ea71b] Connection is established

Need to check with the charging point. But even if that's an issue, it shouldn't lead to a memory leak I would say.

zinsoldat avatar Feb 10 '21 13:02 zinsoldat

how frequent are the GC runs? maybe the objects are dead and dereferenced, but GC is so infrequent that it cannot catch up to garbage being produced.

goekay avatar Feb 10 '21 13:02 goekay

another data point after some googling: this same issue was reported and discussed around 5 years ago [1] [2]. it was fixed with an update though. maybe a regression was introduced. i will take a closer look when i have time.

[1] https://www.eclipse.org/lists/jetty-users/msg06649.html [2] https://stackoverflow.com/questions/35401928

goekay avatar Feb 10 '21 17:02 goekay

hey @zinsoldat is this still a valid issue in the most recent version?

there was a big discovery and fix in this version. it might the solution to the problem you are describing.

  • Use the default values for input/output buffers (https://github.com/steve-community/steve/issues/846) This is big: There has been ongoing complaints about the memory behaviour, and more specifically OOM. @jpires did a great job uncovering and fixing it in https://github.com/steve-community/steve/pull/1058

anyways, i am closing it with the aforementioned anticipation. feel free to open it if otherwise.

goekay avatar Feb 18 '24 11:02 goekay