Dallying after tftpy sends last ACK
(Thanks for tftpy.)
I'm working on a problem where tftpy has successfully downloaded a file, but the other end seems to think that the connection is still open. I don't know that the following is the cause, but I have noticed an oddity that might be relevant...
RFC 1350 §6 states: "The end of a transfer is marked by a DATA packet that contains between 0 and 511 bytes of data (i.e., Datagram length < 516). This packet is acknowledged by an ACK packet like all other DATA packets. The host acknowledging the final DATA packet may terminate its side of the connection on sending the final ACK. On the other hand, dallying is encouraged. This means that the host sending the final ACK will wait for a while before terminating in order to retransmit the final ACK if it has been lost. The acknowledger will know that the ACK has been lost if it receives the final DATA packet again. The host sending the last DATA must retransmit it until the packet is acknowledged or the sending host times out."
If I've read the TftpState.handleDat function correctly, then after sending the ACK it immediately terminates the tftpy connection by returning a None state. Perhaps there could be one more TftpState that "dallies" for a while, in case the sender missed the ACK? (Again, I could have missed something, but it looks like the tftpy upload process would hang on an endless resendLast if it did not receive the last ACK.)
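To make the idea concrete, something like the following might work. This is only a rough sketch: the class name, the dally period, and the sendACK helper are placeholders, and the real state machine may need to organise this differently.

```python
import time

class TftpStateDallying:
    """Hypothetical terminal state entered after sending the final ACK.

    Instead of ending the session immediately (returning None), the session
    lingers for a dally period so that a retransmitted final DATA packet
    (meaning the peer missed our ACK) can simply be re-ACKed.
    """

    DALLY_SECONDS = 5.0  # placeholder; RFC 1350 only says "wait for a while"

    def __init__(self, context, final_blocknumber):
        self.context = context
        self.final_blocknumber = final_blocknumber
        self.deadline = time.time() + self.DALLY_SECONDS

    def handle(self, pkt, raddress, rport):
        # The peer resending the last DATA block means it missed our ACK:
        # acknowledge it again and restart the dally timer.
        if getattr(pkt, "blocknumber", None) == self.final_blocknumber:
            self.context.sendACK(self.final_blocknumber)  # assumed helper
            self.deadline = time.time() + self.DALLY_SECONDS
        # Once the dally period passes quietly, end the session as before.
        return self if time.time() < self.deadline else None
```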
Not sure. Maybe we could add a unit test to simulate that and see if it happens?
Thanks for the quick response. I've been trying some "quick and dirty" testing, but haven't yet demonstrated the fault.
- I've cloned master from https://github.com/msoulier/tftpy;
- I've created a Python virtual environment and done a `pip install -e` on the cloned repo;
- I've tweaked `TftpStates.handleDat` to add a guard around the `self.sendACK()` call, specifically `if not getattr(self.context, "drop_last_ack", False) or len(pkt.data) < self.context.getBlocksize(): self.sendACK()` (to simulate loss of the last ACK packet);
- I've tweaked `TestTftpyState.testServerDownloadWithDynamicPort` to add `client.drop_last_ack = True` immediately before `time.sleep(1)` to insert the fault in this test case.

I've tried running the tests (`python t/test.py TestTftpyState.testServerDownloadWithDynamicPort`) with and without the `client.drop_last_ack = True`, and they all run to completion, but the test itself doesn't include any asserts, and I'm not sure what I should be looking for in the output, or what to check in the test case to confirm whether the server session closed normally.
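One possible check (assuming the TftpServer object keeps its live transfers in a sessions dict; that attribute name is an assumption on my part) would be to poll that table after the download and assert that it empties within a bounded time, something like:

```python
import time

def assert_all_sessions_closed(server, timeout=30.0, poll=0.5):
    """Fail the test if the server still has open sessions after `timeout`
    seconds.  Assumes `server.sessions` is the server's per-transfer table."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if not server.sessions:
            return
        time.sleep(poll)
    raise AssertionError("server sessions still open: %r" % list(server.sessions))
```

Run with and without `client.drop_last_ack = True`, this would at least turn "the server session closed normally" into something the test can actually fail on.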
I'll come back to it tomorrow.
OK, I think I'm making a bit more progress on this ... I've forked the repository (https://github.com/steverpalmer/tftpy.git) to be better able to share what I am doing. I've created a TftpPacketHook branch in which I've extended the idea of the packethook to something that can report on all received and sent packets, as well as being able to filter out packets from processing/sending. I've copied testServerDownloadWithDynamicPort to a new testServerDownloadWithPacketTrace, which allows me to create a log of packets to/from the Server/Downloader. The new test also filters out the last ACK packet from the downloader. The packet trace is as follows:
2022-11-27 13:27:26,516 D -> RRQ packet: filename = 640KBFILE mode = octet to ('localhost', 42083)
2022-11-27 13:27:26,517 S <- RRQ packet: filename = 640KBFILE mode = octet from ('127.0.0.1', 53507)
2022-11-27 13:27:26,517 S -> DAT packet: block 1, data: 512 bytes to ('127.0.0.1', 53507)
2022-11-27 13:27:26,518 D <- DAT packet: block 1, data: 512 bytes from ('127.0.0.1', 42032)
2022-11-27 13:27:26,518 D -> ACK packet: block 1 to ('localhost', 42032)
2022-11-27 13:27:26,518 S <- ACK packet: block 1 from ('127.0.0.1', 53507)
2022-11-27 13:27:26,518 S -> DAT packet: block 2, data: 512 bytes to ('127.0.0.1', 53507)
2022-11-27 13:27:26,519 D <- DAT packet: block 2, data: 512 bytes from ('127.0.0.1', 42032)
2022-11-27 13:27:26,519 D -> ACK packet: block 2 to ('localhost', 42032)
2022-11-27 13:27:26,519 S <- ACK packet: block 2 from ('127.0.0.1', 53507)
2022-11-27 13:27:26,519 S -> DAT packet: block 3, data: 512 bytes to ('127.0.0.1', 53507)
2022-11-27 13:27:26,520 D <- DAT packet: block 3, data: 512 bytes from ('127.0.0.1', 42032)
2022-11-27 13:27:26,520 D -> ACK packet: block 3 to ('localhost', 42032)
2022-11-27 13:27:26,520 S <- ACK packet: block 3 from ('127.0.0.1', 53507)
[snip]
2022-11-27 13:27:27,552 S -> DAT packet: block 1280, data: 512 bytes to ('127.0.0.1', 53507)
2022-11-27 13:27:27,553 D <- DAT packet: block 1280, data: 512 bytes from ('127.0.0.1', 42032)
2022-11-27 13:27:27,553 D -> ACK packet: block 1280 to ('localhost', 42032)
2022-11-27 13:27:27,553 S <- ACK packet: block 1280 from ('127.0.0.1', 53507)
2022-11-27 13:27:27,553 S -> DAT packet: block 1281 to ('127.0.0.1', 53507)
2022-11-27 13:27:27,553 D <- DAT packet: block 1281 from ('127.0.0.1', 42032)
2022-11-27 13:27:27,554 D -> ACK packet: block 1281 to ('localhost', 42032)
2022-11-27 13:27:27,554 D dropping ACK #1281
2022-11-27 13:27:32,560 S -> DAT packet: block 1281 to ('127.0.0.1', 53507)
2022-11-27 13:27:37,580 S -> DAT packet: block 1281 to ('127.0.0.1', 53507)
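For reference, the shape of the filtering hook I have in mind is roughly as follows. This is just a sketch of the idea rather than the actual code on the TftpPacketHook branch, and exactly where tftpy would invoke it is left open:

```python
import logging

log = logging.getLogger("packet-trace")

def make_packet_hook(label, drop_predicate=lambda pkt: False):
    """Build a hook that logs every packet it is shown and signals, by
    returning None, that a packet should be dropped (to simulate loss).

    `label` identifies the endpoint and direction, e.g. "D ->" for packets
    the downloader is about to send.
    """
    def hook(pkt):
        if drop_predicate(pkt):
            log.debug("%s dropping %s", label, pkt)
            return None   # caller is expected to skip processing/sending it
        log.debug("%s %s", label, pkt)
        return pkt
    return hook
```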
The test (without the dropped ACK) previously ran on my machine in about 2 to 2.6 seconds, whereas with the dropped ACK it took 17 to 17.6 seconds.
The corresponding section of the debug log output is:
INFO [tftpy.TftpStates:187] Handling DAT packet - block 1281
DEBUG [tftpy.TftpStates:188] Expecting block 1281
DEBUG [tftpy.TftpStates:190] Good, received block 1281 in sequence
DEBUG [tftpy.TftpStates:136] In sendACK, passed blocknumber is None
INFO [tftpy.TftpStates:139] Sending ack to block 1281
DEBUG [tftpy.TftpStates:195] Writing 0 bytes to output file
INFO [tftpy.TftpStates:200] End of file detected
DEBUG [tftpy.TftpContext:147] in TftpContext.end - closing socket
DEBUG [tftpy.TftpContext:150] self.fileobj is open - closing
DEBUG [tftpy.TftpContext:468] Set metrics.end_time to 1669555647.554253
DEBUG [tftpy.TftpContext:58] TftpMetrics.compute: duration is 1.0377109050750732
DEBUG [tftpy.TftpContext:61] TftpMetrics.compute: kbps is 4933.936778499589
INFO [tftpy.TftpClient:81]
INFO [tftpy.TftpClient:82] Download complete.
INFO [tftpy.TftpClient:86] Downloaded 655360.00 bytes in 1.04 seconds
INFO [tftpy.TftpClient:90] Average rate: 4933.94 kbps
INFO [tftpy.TftpClient:91] 0.00 bytes in resent data
INFO [tftpy.TftpClient:92] Received 0 duplicate packets
DEBUG [tftpy.TftpServer:226] Looping on all sessions to check for timeouts
DEBUG [tftpy.TftpContext:133] checking for timeout on session 127.0.0.1:53507 <tftpy.TftpStates.TftpStateExpectACK object at 0x7f58c5229e80>
ERROR [tftpy.TftpServer:232] Timeout waiting for traffic
DEBUG [tftpy.TftpServer:240] resending on session 127.0.0.1:53507 <tftpy.TftpStates.TftpStateExpectACK object at 0x7f58c5229e80>
WARNING [tftpy.TftpStates:173] Resending packet DAT packet: block 1281 on sessions <tftpy.TftpStates.TftpStateExpectACK object at 0x7f58c5229e80>
DEBUG [tftpy.TftpContext:67] Recording a dup of DAT packet: block 1281
DEBUG [tftpy.TftpPacketTypes:307] Encoding an empty DAT packet
DEBUG [tftpy.TftpServer:243] Iterating deletion list.
DEBUG [tftpy.TftpServer:110] shutdown_immediately is False
DEBUG [tftpy.TftpServer:111] shutdown_gracefully is True
DEBUG [tftpy.TftpServer:135] Performing select on this inputlist: [<socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('127.0.0.1', 42083)>, <socket.socket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('0.0.0.0', 42032)>]
DEBUG [tftpy.TftpServer:226] Looping on all sessions to check for timeouts
DEBUG [tftpy.TftpContext:133] checking for timeout on session 127.0.0.1:53507 <tftpy.TftpStates.TftpStateExpectACK object at 0x7f58c5229e80>
ERROR [tftpy.TftpServer:232] Timeout waiting for traffic
DEBUG [tftpy.TftpServer:240] resending on session 127.0.0.1:53507 <tftpy.TftpStates.TftpStateExpectACK object at 0x7f58c5229e80>
WARNING [tftpy.TftpStates:173] Resending packet DAT packet: block 1281 on sessions <tftpy.TftpStates.TftpStateExpectACK object at 0x7f58c5229e80>
DEBUG [tftpy.TftpContext:67] Recording a dup of DAT packet: block 1281
DEBUG [tftpy.TftpPacketTypes:307] Encoding an empty DAT packet
DEBUG [tftpy.TftpServer:243] Iterating deletion list.
DEBUG [tftpy.TftpServer:110] shutdown_immediately is False
DEBUG [tftpy.TftpServer:111] shutdown_gracefully is True
DEBUG [tftpy.TftpServer:135] Performing select on this inputlist: [<socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('127.0.0.1', 42083)>, <socket.socket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('0.0.0.0', 42032)>]
DEBUG [tftpy.TftpServer:226] Looping on all sessions to check for timeouts
DEBUG [tftpy.TftpContext:133] checking for timeout on session 127.0.0.1:53507 <tftpy.TftpStates.TftpStateExpectACK object at 0x7f58c5229e80>
ERROR [tftpy.TftpServer:232] Timeout waiting for traffic
DEBUG [tftpy.TftpServer:235] hit max retries on 127.0.0.1:53507 <tftpy.TftpStates.TftpStateExpectACK object at 0x7f58c5229e80>, giving up
DEBUG [tftpy.TftpServer:243] Iterating deletion list.
INFO [tftpy.TftpServer:245]
INFO [tftpy.TftpServer:246] Session 127.0.0.1:53507 complete
DEBUG [tftpy.TftpServer:248] Gathering up metrics from session before deleting
DEBUG [tftpy.TftpContext:147] in TftpContext.end - closing socket
DEBUG [tftpy.TftpContext:150] self.fileobj is open - closing
DEBUG [tftpy.TftpContext:286] Set metrics.end_time to 1669555662.6012936
DEBUG [tftpy.TftpContext:287] Detected dups in transfer: 0
DEBUG [tftpy.TftpContext:58] TftpMetrics.compute: duration is 16.084179878234863
DEBUG [tftpy.TftpContext:61] TftpMetrics.compute: kbps is 318.3252138909732
INFO [tftpy.TftpServer:254] Transferred 655360 bytes in 16.08 seconds
INFO [tftpy.TftpServer:258] Average rate: 318.33 kbps
INFO [tftpy.TftpServer:259] 8.00 bytes in resent data
INFO [tftpy.TftpServer:260] 2 duplicate packets
DEBUG [tftpy.TftpServer:261] Deleting session 127.0.0.1:53507
So the Downloader does "drop" the connection immediately after sending the last ACK, and the Server does attempt to resend the DAT, but eventually reaches its retry limit and then drops the connection. The extra run time matches the log: roughly one second of transfer plus three 5-second retransmission timeouts before the Server gives up.
If I drop any ACK other than the last, the packet trace shows a rapid recovery...
2022-11-27 14:18:15,437 D -> RRQ packet: filename = 640KBFILE mode = octet to ('localhost', 46793)
2022-11-27 14:18:15,437 S <- RRQ packet: filename = 640KBFILE mode = octet from ('127.0.0.1', 39942)
2022-11-27 14:18:15,438 S -> DAT packet: block 1, data: 512 bytes to ('127.0.0.1', 39942)
2022-11-27 14:18:15,438 D <- DAT packet: block 1, data: 512 bytes from ('127.0.0.1', 54696)
2022-11-27 14:18:15,438 D -> ACK packet: block 1 to ('localhost', 54696)
2022-11-27 14:18:15,438 D dropping ACK #1
2022-11-27 14:18:20,440 S -> DAT packet: block 1, data: 512 bytes to ('127.0.0.1', 39942)
2022-11-27 14:18:20,442 D <- DAT packet: block 1, data: 512 bytes from ('127.0.0.1', 54696)
2022-11-27 14:18:20,443 D -> ACK packet: block 1 to ('localhost', 54696)
2022-11-27 14:18:20,445 S <- ACK packet: block 1 from ('127.0.0.1', 39942)
2022-11-27 14:18:20,445 S -> DAT packet: block 2, data: 512 bytes to ('127.0.0.1', 39942)
2022-11-27 14:18:20,447 D <- DAT packet: block 2, data: 512 bytes from ('127.0.0.1', 54696)
2022-11-27 14:18:20,448 D -> ACK packet: block 2 to ('localhost', 54696)
2022-11-27 14:18:20,449 S <- ACK packet: block 2 from ('127.0.0.1', 39942)

This is because the Downloader is still active and resends the missed ACK as soon as it sees the repeated DAT packet. The test completes in 7.2 seconds on my machine.
I've repeated the test, but changed it to an upload. In this case, I drop the last ACK from the Server after the Uploader has sent an empty DAT packet. The Packet log is as expected:
2022-11-28 09:48:23,284 U -> WRQ packet: filename = 640KBFILE mode = octet to ('localhost', 50986)
2022-11-28 09:48:23,285 S <- WRQ packet: filename = 640KBFILE mode = octet from ('127.0.0.1', 43348)
2022-11-28 09:48:23,286 S -> ACK packet: block 0 to ('127.0.0.1', 43348)
2022-11-28 09:48:23,286 U <- ACK packet: block 0 from ('127.0.0.1', 43105)
2022-11-28 09:48:23,286 U -> DAT packet: block 1, data: 512 bytes to ('localhost', 43105)
2022-11-28 09:48:23,287 S <- DAT packet: block 1, data: 512 bytes from ('127.0.0.1', 43348)
2022-11-28 09:48:23,287 S -> ACK packet: block 1 to ('127.0.0.1', 43348)
2022-11-28 09:48:23,287 U <- ACK packet: block 1 from ('127.0.0.1', 43105)
2022-11-28 09:48:23,287 U -> DAT packet: block 2, data: 512 bytes to ('localhost', 43105)
2022-11-28 09:48:23,288 S <- DAT packet: block 2, data: 512 bytes from ('127.0.0.1', 43348)
2022-11-28 09:48:23,288 S -> ACK packet: block 2 to ('127.0.0.1', 43348)
2022-11-28 09:48:23,288 U <- ACK packet: block 2 from ('127.0.0.1', 43105)
[snip]
2022-11-28 09:48:24,311 U -> DAT packet: block 1280, data: 512 bytes to ('localhost', 43105)
2022-11-28 09:48:24,311 S <- DAT packet: block 1280, data: 512 bytes from ('127.0.0.1', 43348)
2022-11-28 09:48:24,311 S -> ACK packet: block 1280 to ('127.0.0.1', 43348)
2022-11-28 09:48:24,311 U <- ACK packet: block 1280 from ('127.0.0.1', 43105)
2022-11-28 09:48:24,312 U -> DAT packet: block 1281 to ('localhost', 43105)
2022-11-28 09:48:24,312 S <- DAT packet: block 1281 from ('127.0.0.1', 43348)
2022-11-28 09:48:24,312 S -> ACK packet: block 1281 to ('127.0.0.1', 43348)
2022-11-28 09:48:24,312 S dropping ACK #1281
2022-11-28 09:48:29,318 U -> DAT packet: block 1281 to ('localhost', 43105)
2022-11-28 09:48:34,322 U -> DAT packet: block 1281 to ('localhost', 43105)

What I hadn't expected was that the Upload failed, raising a TftpTimeout exception.
The extract from the debug log follows:
INFO [tftpy.TftpStates:116] Reached EOF on file 640KBFILE
DEBUG [tftpy.TftpStates:126] Sending DAT packet 1281
DEBUG [tftpy.TftpPacketTypes:307] Encoding an empty DAT packet
DEBUG [tftpy.TftpContext:354] State is <tftpy.TftpStates.TftpStateExpectACK object at 0x7fabc2e5b490>
DEBUG [tftpy.TftpServer:205] Matched input to session key 127.0.0.1:43348
DEBUG [tftpy.TftpContext:192] Received 4 bytes from 127.0.0.1:43348
DEBUG [tftpy.TftpPacketFactory:34] parsing a 4 byte packet
DEBUG [tftpy.TftpPacketFactory:36] opcode is 3
DEBUG [tftpy.TftpPacketTypes:321] decoding DAT packet, block number 1281
DEBUG [tftpy.TftpPacketTypes:322] should be 4 bytes in the packet total
DEBUG [tftpy.TftpPacketTypes:325] found 0 bytes of data
INFO [tftpy.TftpStates:187] Handling DAT packet - block 1281
DEBUG [tftpy.TftpStates:188] Expecting block 1281
DEBUG [tftpy.TftpStates:190] Good, received block 1281 in sequence
DEBUG [tftpy.TftpStates:136] In sendACK, passed blocknumber is None
INFO [tftpy.TftpStates:139] Sending ack to block 1281
DEBUG [tftpy.TftpStates:195] Writing 0 bytes to output file
INFO [tftpy.TftpStates:200] End of file detected
INFO [tftpy.TftpServer:210] Successful transfer.
DEBUG [tftpy.TftpServer:226] Looping on all sessions to check for timeouts
DEBUG [tftpy.TftpContext:133] checking for timeout on session 127.0.0.1:43348 None
DEBUG [tftpy.TftpServer:243] Iterating deletion list.
INFO [tftpy.TftpServer:245]
INFO [tftpy.TftpServer:246] Session 127.0.0.1:43348 complete
DEBUG [tftpy.TftpServer:248] Gathering up metrics from session before deleting
DEBUG [tftpy.TftpContext:147] in TftpContext.end - closing socket
DEBUG [tftpy.TftpContext:150] self.fileobj is open - closing
DEBUG [tftpy.TftpContext:286] Set metrics.end_time to 1669628904.3128195
DEBUG [tftpy.TftpContext:287] Detected dups in transfer: 0
DEBUG [tftpy.TftpContext:58] TftpMetrics.compute: duration is 1.027372121810913
DEBUG [tftpy.TftpContext:61] TftpMetrics.compute: kbps is 4983.588605631185
INFO [tftpy.TftpServer:254] Transferred 655360 bytes in 1.03 seconds
INFO [tftpy.TftpServer:258] Average rate: 4983.59 kbps
INFO [tftpy.TftpServer:259] 0.00 bytes in resent data
INFO [tftpy.TftpServer:260] 0 duplicate packets
DEBUG [tftpy.TftpServer:261] Deleting session 127.0.0.1:43348
DEBUG [tftpy.TftpServer:263] Session list is now {}
DEBUG [tftpy.TftpServer:110] shutdown_immediately is False
DEBUG [tftpy.TftpServer:111] shutdown_gracefully is False
DEBUG [tftpy.TftpServer:135] Performing select on this inputlist: [<socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('127.0.0.1', 50986)>]
WARNING [tftpy.TftpContext:188] Timeout waiting for traffic, retrying...
ERROR [tftpy.TftpContext:357] Timed-out waiting for traffic
WARNING [tftpy.TftpContext:363] resending last packet
WARNING [tftpy.TftpStates:173] Resending packet DAT packet: block 1281 on sessions <tftpy.TftpStates.TftpStateExpectACK object at 0x7fabc2e5b490>
DEBUG [tftpy.TftpContext:67] Recording a dup of DAT packet: block 1281
DEBUG [tftpy.TftpPacketTypes:307] Encoding an empty DAT packet
DEBUG [tftpy.TftpServer:226] Looping on all sessions to check for timeouts
DEBUG [tftpy.TftpServer:243] Iterating deletion list.
DEBUG [tftpy.TftpServer:110] shutdown_immediately is False
DEBUG [tftpy.TftpServer:111] shutdown_gracefully is False
DEBUG [tftpy.TftpServer:135] Performing select on this inputlist: [<socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('127.0.0.1', 50986)>]
DEBUG [tftpy.TftpContext:354] State is <tftpy.TftpStates.TftpStateExpectACK object at 0x7fabc2e5b490>
WARNING [tftpy.TftpContext:188] Timeout waiting for traffic, retrying...
ERROR [tftpy.TftpContext:357] Timed-out waiting for traffic
WARNING [tftpy.TftpContext:363] resending last packet
WARNING [tftpy.TftpStates:173] Resending packet DAT packet: block 1281 on sessions <tftpy.TftpStates.TftpStateExpectACK object at 0x7fabc2e5b490>
DEBUG [tftpy.TftpContext:67] Recording a dup of DAT packet: block 1281
DEBUG [tftpy.TftpPacketTypes:307] Encoding an empty DAT packet
DEBUG [tftpy.TftpContext:354] State is <tftpy.TftpStates.TftpStateExpectACK object at 0x7fabc2e5b490>
DEBUG [tftpy.TftpServer:226] Looping on all sessions to check for timeouts
DEBUG [tftpy.TftpServer:243] Iterating deletion list.
DEBUG [tftpy.TftpServer:110] shutdown_immediately is False
DEBUG [tftpy.TftpServer:111] shutdown_gracefully is False
DEBUG [tftpy.TftpServer:135] Performing select on this inputlist: [<socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_DGRAM, proto=0, laddr=('127.0.0.1', 50986)>]
WARNING [tftpy.TftpContext:188] Timeout waiting for traffic, retrying...
ERROR [tftpy.TftpContext:357] Timed-out waiting for traffic
DEBUG [tftpy.TftpContext:360] hit max retries, giving up
DEBUG [tftpy.TftpServer:226] Looping on all sessions to check for timeouts
DEBUG [tftpy.TftpServer:243] Iterating deletion list.
DEBUG [tftpy.TftpServer:110] shutdown_immediately is False
DEBUG [tftpy.TftpServer:111] shutdown_gracefully is True
INFO [tftpy.TftpServer:124] In graceful shutdown mode and all sessions complete.
DEBUG [tftpy.TftpServer:269] server returning from while loop
DEBUG [tftpy.TftpContext:147] in TftpContext.end - closing socket
DEBUG [tftpy.TftpContext:286] Set metrics.end_time to 1669628919.330073
DEBUG [tftpy.TftpContext:287] Detected dups in transfer: 0
DEBUG [tftpy.TftpContext:58] TftpMetrics.compute: duration is 16.044625759124756
DEBUG [tftpy.TftpContext:61] TftpMetrics.compute: kbps is 319.10996721679214
DEBUG [tftpy.TftpContext:147] in TftpContext.end - closing socket
DEBUG [tftpy.TftpContext:370] Set metrics.end_time to 1669628919.3379288
DEBUG [tftpy.TftpContext:58] TftpMetrics.compute: duration is 16.053446531295776
DEBUG [tftpy.TftpContext:61] TftpMetrics.compute: kbps is 318.9346281509515
I think that the Server, having sent the last ACK, drops the connection and then ignores the Client's resends until the Client gives up and raises an exception. Of course, it is always possible that the connection between the Client and Server is lost at the end of communication, so the Client must be ready to handle a lost connection in any case.
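In the meantime, a workaround on the Client side would be to treat a timeout right at the end of an upload as suspect rather than fatal. A minimal sketch, assuming tftpy.TftpTimeout is the exception raised here (as the log above suggests) and that whether the upload really completed would be checked out of band:

```python
import tftpy

def upload_tolerating_lost_final_ack(host, port, remote_name, local_path):
    """Upload a file and report False (rather than crash) if the transfer
    times out; the file may still have arrived in full if only the final
    ACK exchange was lost, so the caller should verify out of band."""
    client = tftpy.TftpClient(host, port)
    try:
        client.upload(remote_name, local_path)
        return True
    except tftpy.TftpTimeout:
        return False
```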
(I hope you don't see the above as "spam" comments, but only as an honest attempt to understand how tftpy works.)
A "dally" state after the last ACK packet can probably only be terminated by a timeout since the other-end of the connection is not expecting to send anything more after receiving the last ACK packet.
In the case of a Client download from a Server, dallying would add a delay on the client that affects every download, for the sake of avoiding some wasted resends on the Server in an error corner-case. I tend to assume that the Server is likely to be around for as long as it is needed, so a few wasted resends do not seem to be a problem, and the Server will clean up by simply dropping the connection. Therefore, in this case a dally would not be justified.
However, in the Upload case there may be an argument for the Server dallying after sending its last ACK packet. This could remove superfluous resends by the Upload client (and maybe even a complete upload reattempt, or a false failure being passed up to the Upload's own caller), while causing only a minor delay in closing a connection on the Server. The tftpy Server can accept many connections concurrently, so such a delay would not hold up any other transfer requests.