tftpy icon indicating copy to clipboard operation
tftpy copied to clipboard

Lost Packet handling

Open tobox opened this issue 11 years ago • 21 comments

When tftpy runs as a server and a data packet is lost, all further requests for the lost packet are just logged as duplicate ACKs by tftpy. If I add

          self.sendDAT()

right after

        elif pkt.blocknumber < self.context.next_block:
            log.debug("Received duplicate ACK for block %d"
                % pkt.blocknumber)
            self.context.metrics.add_dup(pkt)

the transfer continues properly after packet loss. I don't know if it is an error of tftpy or of my tftp client (u-boot bootloader), but the problem does not occur with other tftp servers.

Regards, Thomas

tobox avatar Jan 30 '14 14:01 tobox

Hmm. So we received the ACK already and sent the DAT, but perhaps the ACK from the client was lost.

The reason we don't send a DAT in response to a duplicate ACK is the sorceror's apprentice problem. So how can I safely send a DAT there without introducing that problem?

msoulier avatar Jun 07 '15 12:06 msoulier

I've proposed a pull request for this. "Implement retransmission of packet on peer request". https://github.com/msoulier/tftpy/pull/58

The reported issue is actually due to an incomplete implementation of the protocol as per RFC 1350. In summary, duplicate recieve of the previous peer packet is a request for resend of the own previous packet as per the protocol.

The pull request handle this.

I've tested it on other clients such as u-boot tftp client and it fixes the issue.

guillon avatar Jun 18 '15 14:06 guillon

FWIW, I believe I have seen this same problem when doing U-Boot tftp transfers. Have not tested the PR but could probably give it a whirl to see if it increases robustness when on a congested network.

posborne avatar Jun 19 '15 00:06 posborne

Indeed, u-boot tftp client, implements a timeout policy waiting for peer packets, and when a timeout occurs, ask for the expected packet to be resent as described in the protocol.

This unimplemented tftpy_server behavior (packet retransmissoin) will not occur when testing the tftpy_server with tftpy own client (tftpy_client) as tftpy_client is simpler and just waits peer packet forever.

More precisely, as soon as the client timeout is lower then the tftpy_server timeout, the client will keep sending requests for resending the previous packet. As the server sees traffic on the socket (for packet it wrongly drops), then it itself never gets into a timeout and never resend by itself the packet. At the end the client never receives any more packet from the server and the transmission is completly stucked.

guillon avatar Jun 23 '15 20:06 guillon

See: #67 commit: Improve lost DAT handling

This fixes the problem without re-introducing the sorceror's apprentice problem. The solution is not to update the timeout counter when a duplicate ACK packet is received, so the server timeout waits for "progress" not "any traffic".

navlrac avatar Apr 12 '16 04:04 navlrac

Hi @msoulier, is there any chance to get the fix by @navlrac merged?

Thanks!

PiQuer avatar Jun 15 '21 10:06 PiQuer

Hi,

It seems to be that I have the U-Boot issue describe by @guillon is tftpy 0.8.1 fix this problem?

BuhtigithuB avatar Jun 18 '21 14:06 BuhtigithuB

Hi,

Here attempt to resolve PR #67 merge conflicts : PR #121

@msoulier and @navlrac assess I didn't break anythings and if I didn't please merge ASAP.

Thanks

BuhtigithuB avatar Jun 26 '21 17:06 BuhtigithuB

@msoulier please help us getting this issue fixed, it is causing quite a lot of trouble in our CI pipeline. If the fix by @BuhtigithuB is fine, please merge it.

PiQuer avatar Aug 03 '21 13:08 PiQuer

@PiQuer, my attempt to resolve the conflict wasn't successful and I didn't had time to retry. If you have some time you can try it strating from @navlrac #67 PR

BuhtigithuB avatar Aug 20 '21 18:08 BuhtigithuB

With this cherry-pick : https://github.com/BuhtigithuB/tftpy/tree/navlrac-windowsize-rfc-7440-support-only

Tests pass... So cherry pick succeeded... Although when I add the navlrac tests for it rfc 7440 feature there is 3 tests that start to fail which are not even navlrac's tests... I am a bit puzzled...

I think I didn't fetch upstream before attempt the cherry pick :( so I am not up to date... Will try to update my repo and rebase...

BuhtigithuB avatar Oct 15 '21 23:10 BuhtigithuB

Fetched from Github UI and it didn't end in merge conflict :sweat_smile:

Can you gave a look?

I will push a clone of this branch with the failing tests navlrac tests so you can also have look at the failing tests

BuhtigithuB avatar Oct 15 '21 23:10 BuhtigithuB

Here the branch with navlrac tests https://github.com/BuhtigithuB/tftpy/tree/navlrac-windowsize-rfc-7440-support-only-with-failling-tests

BuhtigithuB avatar Oct 16 '21 00:10 BuhtigithuB

Here you have all commits but 2 that seems dated : https://github.com/BuhtigithuB/tftpy/tree/navlrac-windowsize-all-cherry-picks

@msoulier : Waiting for your input to make a proper PR that could be merged

BuhtigithuB avatar Oct 16 '21 01:10 BuhtigithuB

UP!

BuhtigithuB avatar Oct 19 '21 14:10 BuhtigithuB

Let me know if I can help you any further...

BuhtigithuB avatar Oct 19 '21 16:10 BuhtigithuB

UP!

BuhtigithuB avatar Nov 03 '21 14:11 BuhtigithuB

@msoulier do you intent to include navlrac code into tftpy? Is it going to solve the instability issue identified with ubtoo?? Can we expect progress here? I am happy to help more if I can, but please give us some feedback...

BuhtigithuB avatar Nov 11 '21 17:11 BuhtigithuB

Frankly I need to set aside some time to reproduce this problem in a test case, and then evaluate the fix. I've been very busy.

I will try to find some time.

msoulier avatar Nov 12 '21 15:11 msoulier

I can help with testing and printout if I can... I will try the resolved merge conflict version with our setup if it helps fixing the instability issue that we have currently with tftpy and uboot...

Thanks for the feedback.

On Fri, Nov 12, 2021 at 10:07 AM Michael P. Soulier < @.***> wrote:

Frankly I need to set aside some time to reproduce this problem in a test case, and then evaluate the fix. I've been very busy.

I will try to find some time.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/msoulier/tftpy/issues/41#issuecomment-967186839, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMVNRYSENCA5R5EDC4RCR3ULUUSXANCNFSM4ALXJNGQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

BuhtigithuB avatar Nov 12 '21 21:11 BuhtigithuB

I need to take a little time to see if I can automate a test for this. I think we do something similarly already, so I think so.

Mike

On 2021-11-12 4:10 p.m., BuhtigithuB wrote:

I can help with testing and printout if I can... I will try the resolved merge conflict version with our setup if it helps fixing the instability issue that we have currently with tftpy and uboot...

Thanks for the feedback.

msoulier avatar Nov 16 '21 21:11 msoulier