SquadJS
SquadJS copied to clipboard
SquadRcon.onData Crashing Recently
Description of Issue
Crashed twice in the past two days on the following callstack; One immediately after a server crash, so I assumed it was related to a missing packet from the crash.
However, this morning we crashed shortly after hitting live during Fallujah Skimish v2
Errors or Screenshots of Issue
file:///opt/SquadJS/core/rcon.js:80 this.responseCallbackQueue.shift()( ^
TypeError: this.responseCallbackQueue.shift(...) is not a function at SquadRcon.onData (file:///opt/SquadJS/core/rcon.js:80:49) at Socket.emit (events.js:315:20) at addChunk (internal/streams/readable.js:309:12) at readableAddChunk (internal/streams/readable.js:284:9) at Socket.Readable.push (internal/streams/readable.js:223:10) at TCP.onStreamRead (internal/stream_base_commons.js:188:23)
Squad Information
- SquadJS Version: v3
- Squad Version: 2.9
If potentially relevant, please provide regarding the state of the Squad server at the time of error, e.g. the current layer.
System Information
- Operating System: Linux
- NodeJS Version: Node 14
- Yarn Version:
I'm having this same issue, however off of Windows. Did you ever solve this yourself?
Is this still an issue?
Is this still an issue?
Yes, though its rare.
Resolved on #258
I've root caused this, and the issue mentioned in
https://github.com/Team-Silver-Sphere/SquadJS/issues/249.
The essentials are this:
We are using the 'endpackets' to fire off the callbacks pending for each execute request,
https://github.com/Team-Silver-Sphere/SquadJS/blob/164bad3e4cc8e5d2097079bcb741778053b1b338/core/rcon.js#L80
These endpackets are generated here:
https://github.com/Team-Silver-Sphere/SquadJS/blob/164bad3e4cc8e5d2097079bcb741778053b1b338/core/rcon.js#L323
As described in:
https://developer.valvesoftware.com/wiki/Source_RCON_Protocol#Multiple-packet_Responses
The issue lies in part in here:
https://github.com/Team-Silver-Sphere/SquadJS/blob/164bad3e4cc8e5d2097079bcb741778053b1b338/core/rcon.js#L119
We expect to receive a series of two packets,
One length 10 with no extra data, one with extra data (0x0000 0001 0000 0000
).
We attempt to discard this empty data here: https://github.com/Team-Silver-Sphere/SquadJS/blob/164bad3e4cc8e5d2097079bcb741778053b1b338/core/rcon.js#L135
However, getting to that check requires that we pass multiple conditions we can sometimes fail
We DO not receive packets from nodejs, we receive chunks of a stream. This at times means that the packets can span two or more onData events.
https://nodejs.org/api/net.html#class-netsocket
Given a sequence such as the following:
0a 00 00 00 02 00 00 00 00 00 00 00 00 00 0a 00 00 00 02 00 00 00 00 00 00 00 00 00 | 00 01
We will split out two "endpackets" and leave the junk data 00 01
in the buffer.
In fact, this can happen in several cases where we do not actually receive the rest of the 2nd packet in the current chunk we are processing.
In the above case, we split off packet one:
0a 00 00 00 02 00 00 00 00 00 00 00 00 00
0a 00 00 00 02 00 00 00 00 00 00 00 00 00 | 00 01
And fulfill its callback,
then
0a 00 00 00 02 00 00 00 00 00 00 00 00 00
| 00 01
And attempt to fulfill a second callback
Followed by prepending the left over data onto the next packet 00 01
This leads to the issue in https://github.com/Team-Silver-Sphere/SquadJS/issues/249, where we have a junk packet. And from this point on, all our packets are likely to be junk because we are now splitting the data incorrectly.
This was obvious when all of our junk packets had a 'size' of 256, or 00 01 00 00
As for the issue posted here,
If the callBackQueue size is 1, and we attempt to fulfill a second callback, we get the issues as raised here
TypeError: this.responseCallbackQueue.shift(...) is not a function.
We shift an empty array, get back undefined
(not a function), and cannot invoke it.
If the callback queue is is 2, we instead send an empty response to the waiting callback
"[SquadServer][1] Failed to update layer information. TypeError: Cannot read property '1' of null
at SquadRcon.getNextMap
From this point forward, again, we are now out of sync.
We will fail to match data on all incoming requests when we process the bodies back into results.
We can also deadlock if the Squad restarts while we have pending callbacks, as again, we do not clear these and are blindly assuming the incomingData
buffer and incomingResponse
buffers are correct.
I've checked multiple logs since finding a way to reproduce this issue and have never seen anything related to
https://github.com/Team-Silver-Sphere/SquadJS/blob/164bad3e4cc8e5d2097079bcb741778053b1b338/core/rcon.js#L285
As for the window for these errors, and why they are rare:
We need to fail the check here:
https://github.com/Team-Silver-Sphere/SquadJS/blob/164bad3e4cc8e5d2097079bcb741778053b1b338/core/rcon.js#L129
If we have say, 15, or 16 bytes, we will pass through to
https://github.com/Team-Silver-Sphere/SquadJS/blob/164bad3e4cc8e5d2097079bcb741778053b1b338/core/rcon.js#L149
Where we cut the packet, and leave the extra 1-2 bytes in the buffer.
We then fail the "while" check here:
https://github.com/Team-Silver-Sphere/SquadJS/blob/164bad3e4cc8e5d2097079bcb741778053b1b338/core/rcon.js#L119
I have a fix in progress for this issue, however the wider case is it still won't prevent all deadlocks, and may actually cause more when squad restarts while we have pending requests.
IE, we have sent some requests, squad crashes or gets restarted in the seconds before we have gotten back a full response.
Fixed