steve icon indicating copy to clipboard operation
steve copied to clipboard

Exceptions and websocket closures on StopTransaction

Open minhtrietdiep opened this issue 2 years ago • 2 comments

Checklist

  • [X] I checked other issues already, but found no answer/solution
  • [X] I checked the documentation and wiki, but found no answer/solution
  • [X] I am running the latest version and the issue still occurs
  • [X] I am sure that this issue is about SteVe (and not about the charging station software or something unrelated to SteVe)

Specifications

SteVe Version     : 3.5.0
Operating system  : ...
JDK               : ...
Database          : MariaDB

Expected Behavior

SteVe responds to StopTransaction normally.

Actual Behavior

SteVe takes some time after receiving a StopTransaction, then proceeds to get a database access error :

org.jooq.exception.DataAccessException: SQL [insert into `stevedb`.`transaction_stop` (`transaction_pk`, `event_timestamp`, `event_actor`, `stop_timestamp`, `stop_value`, `stop_reason`) values (?, ?, ?, ?, ?, ?)]; Cannot add or update a child row: a foreign key constraint fails (`stevedb`.`transaction_stop`, CONSTRAINT `FK_transaction_stop_transaction_pk` FOREIGN KEY (`transaction_pk`) REFERENCES `transaction_start` (`transaction_pk`) ON DELETE CASCADE ON UPDATE NO AC)

Then, somehow seems to accept the tag, sends the message, but doesn't manage to do that and the websocket connection closes.

Steps to Reproduce the Problem

N/A, hard to reproduce from a clean slate.

Additional context

12 chargers had their configuration settings changed, and were set to do a soft reset to apply these changes. Some vehicles were in a charging session. The StopTransaction messages seem to be received by SteVe, but nothing happens. The implementation of the charge points causes this StopTransaction to be re-sent after an apparent no-acknowledge.

This just keeps repeating until SteVe crashes and restarts, with the log being filled with exceptions such as these:

[ERROR] 2023-05-09 17:00:07,600 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger - [chargeBoxId=CHARGER-XXX, sessionId=12f2762b-b39e-5988-7457-1ec64a5d85c0] Transport error
org.eclipse.jetty.io.EofException: null
	at org.eclipse.jetty.io.SocketChannelEndPoint.flush(SocketChannelEndPoint.java:116) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:275) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:254) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:386) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.FrameFlusher.process(FrameFlusher.java:330) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:232) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:214) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.enqueueFrame(WebSocketConnection.java:640) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession$OutgoingAdaptor.sendFrame(WebSocketCoreSession.java:724) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.ExtensionStack.sendFrame(ExtensionStack.java:258) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession$Flusher.forwardFrame(WebSocketCoreSession.java:815) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.FragmentingFlusher.onFrame(FragmentingFlusher.java:50) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.TransformingFlusher$Flusher.process(TransformingFlusher.java:163) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:232) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:214) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.TransformingFlusher.sendFrame(TransformingFlusher.java:77) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.sendFrame(WebSocketCoreSession.java:526) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.common.JettyWebSocketRemoteEndpoint.sendPong(JettyWebSocketRemoteEndpoint.java:188) ~[websocket-jetty-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onPingFrame(JettyWebSocketFrameHandler.java:373) ~[websocket-jetty-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.common.JettyWebSocketFrameHandler.onFrame(JettyWebSocketFrameHandler.java:234) ~[websocket-jetty-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession$IncomingAdaptor.lambda$onFrame$1(WebSocketCoreSession.java:671) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1463) ~[jetty-server-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1482) ~[jetty-server-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.server.internal.AbstractHandshaker$1.handle(AbstractHandshaker.java:212) ~[websocket-core-server-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession$IncomingAdaptor.onFrame(WebSocketCoreSession.java:671) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.ExtensionStack.onFrame(ExtensionStack.java:120) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketCoreSession.onFrame(WebSocketCoreSession.java:481) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.onFrame(WebSocketConnection.java:262) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.fillAndParse(WebSocketConnection.java:455) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.onFillable(WebSocketConnection.java:340) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:319) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:412) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:381) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:268) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038) ~[jetty-util-10.0.11.jar:10.0.11]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.io.IOException: Broken pipe
	at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:?]
	at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:?]
	at sun.nio.ch.IOUtil.write(IOUtil.java:182) ~[?:?]
	at sun.nio.ch.IOUtil.write(IOUtil.java:130) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:493) ~[?:?]
	at java.nio.channels.SocketChannel.write(SocketChannel.java:507) ~[?:?]
	at org.eclipse.jetty.io.SocketChannelEndPoint.flush(SocketChannelEndPoint.java:110) ~[jetty-io-10.0.11.jar:10.0.11]
	... 41 more
[WARN ] 2023-05-09 17:00:07,600 de.rwth.idsg.steve.ocpp.ws.WebSocketLogger - [chargeBoxId=CHARGER-XXX, sessionId=12f2762b-b39e-5988-7457-1ec64a5d85c0] Connection is closed, status: CloseStatus[code=1006, reason=null]
[WARN ] 2023-05-09 17:00:07,601 org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy - Task run failed
java.lang.IllegalStateException: already released RetainableByteBuffer@e2e9a75{DirectByteBuffer@75e1cfee[p=0,l=0,c=8388608,r=0]={<<<>>>\x89\x90\x11\x1a\xAb\x9eocp...\x00\x00\x00\x00\x00\x00\x00},r=0}
	at org.eclipse.jetty.io.RetainableByteBuffer.lambda$release$2(RetainableByteBuffer.java:109) ~[jetty-io-10.0.11.jar:10.0.11]
	at java.util.concurrent.atomic.AtomicInteger.updateAndGet(AtomicInteger.java:279) ~[?:?]
	at org.eclipse.jetty.io.RetainableByteBuffer.release(RetainableByteBuffer.java:106) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.releaseNetworkBuffer(WebSocketConnection.java:330) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.fillAndParse(WebSocketConnection.java:504) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.websocket.core.internal.WebSocketConnection.onFillable(WebSocketConnection.java:340) ~[websocket-core-common-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:319) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53) ~[jetty-io-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:412) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:381) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:268) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894) ~[jetty-util-10.0.11.jar:10.0.11]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038) ~[jetty-util-10.0.11.jar:10.0.11]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]

The 12 offending chargers have been removed in SteVe, but the issue still occurs for other chargers that send a StopTransaction, resulting in the same behavior of SteVe getting sluggish and finally seemingly crashing.

Even tasks like GetConfiguration will not successfully execute, if one gets through to their respective pages.

From our IT department, SteVe has already been assigned more CPU and RAM, and more connections to MariaDB, but it wasn't hitting those limits in the first place, and shouldn't be access-limited by the DB software. They also do not know what could be the issue. SteVe of course has been restarted several times.

What could this have been caused by and how do we recover from this, without nuking the database?

minhtrietdiep avatar May 09 '23 15:05 minhtrietdiep

Additional information:

So we've figured out more: SteVe was slow with Authorization/StartTransaction accepts (in the order of 10 seconds). The moment our multiple chargers come back online, SteVe seems to get overwhelmed, and isn't able to acknowledge the StartTransaction - which causes the chargers to keep retrying.

It does appear SteVe can still send some messages (to get diagnostics), it just can't respond in time to messages from the chargers...

minhtrietdiep avatar May 11 '23 11:05 minhtrietdiep

Hi @minhtrietdiep, would you please ask your IT department if there is a slow sql-query time correlated to StartTransaction? thx

fnkbsi avatar Jun 21 '23 16:06 fnkbsi