gitoxide icon indicating copy to clipboard operation
gitoxide copied to clipboard

Avoid blocking forever when fetching from misbehaving servers

Open kim opened this issue 3 years ago • 5 comments

Following up a conversation elsewhere, the following situation can legitimately occur:

11:18:08.860027 pkt-line.c:80           packet:  upload-pack> version 2
11:18:08.860139 pkt-line.c:80           packet:  upload-pack> agent=git/2.20.1
11:18:08.860154 pkt-line.c:80           packet:  upload-pack> ls-refs
11:18:08.860483 pkt-line.c:80           packet:  upload-pack> fetch=shallow ref-in-want
11:18:08.860504 pkt-line.c:80           packet:  upload-pack> server-option
11:18:08.860514 pkt-line.c:80           packet:  upload-pack> 0000
11:18:08.867555 pkt-line.c:80           packet:  upload-pack< command=ls-refs
11:18:08.867581 pkt-line.c:80           packet:  upload-pack< agent=git/oxide-0.8.0
11:18:08.867696 pkt-line.c:80           packet:  upload-pack< 0001
11:18:08.867731 pkt-line.c:80           packet:  upload-pack< symrefs
11:18:08.867751 pkt-line.c:80           packet:  upload-pack< peel
11:18:08.867764 pkt-line.c:80           packet:  upload-pack< ref-prefix refs/heads/
11:18:08.867777 pkt-line.c:80           packet:  upload-pack< ref-prefix refs/pulls/
11:18:08.867916 pkt-line.c:80           packet:  upload-pack< 0000
11:18:08.868519 pkt-line.c:80           packet:  upload-pack> 0000
11:18:08.871748 pkt-line.c:80           packet:  upload-pack< command=fetch
11:18:08.871784 pkt-line.c:80           packet:  upload-pack< agent=git/oxide-0.8.0
11:18:08.871796 pkt-line.c:80           packet:  upload-pack< 0001
11:18:08.871839 pkt-line.c:80           packet:  upload-pack< thin-pack
11:18:08.871851 pkt-line.c:80           packet:  upload-pack< include-tag
11:18:08.871863 pkt-line.c:80           packet:  upload-pack< ofs-delta
11:18:08.871874 pkt-line.c:80           packet:  upload-pack< done
11:18:08.871881 pkt-line.c:80           packet:  upload-pack< 0000

Here, the client did not send any wants or haves, and the want-refs did not match any refs on the remote side (due to a bug in git, but this could happen legitimately). The server thus determines that no packfile can be built from this information, and sends done. A packfile section is never sent.

The fetch function in this case blocks indefinitely, waiting for negotiation messages or a packfile. Instead, it should consider the done, break the loop, and terminate the conversation by sending a flush packet (as per normal behaviour).

kim avatar Jul 26 '21 21:07 kim

Edit: ref-prefix, not want-ref.

In the example, one might think that the client could have noticed that it is about to request an empty pack (and that’s probably true). However, in case it uses want-ref under a wrong assumption of what the remote actually has, this situation could occur without a way for the client to recover.

kim avatar Jul 26 '21 21:07 kim

Oh wait, the server is not actually sending anything, that’s the client’s done. I wonder if that should be considered a bug in upload-pack.

kim avatar Jul 26 '21 21:07 kim

Thanks a lot for the investigation. Even though in this case I also believe that the server is clearly misbehaving due to its failure to respond to a command, I also think that this is something that the transport layer should optionally guard against in async mode.

On the transport layer, progress messages can be received and translated into the Progress trait even though the client doesn't receive any data bytes (e.g. during the possibly long counting stage of the servers pack generation). Thus any timeouts applied to the entire operation or on protocol level may trigger even though the server is fine, but busy.

Instead there should be a way to have a timeout each time one tries to read a new packet line, and while impossible to implement in the blocking client, the async client should be able to do that. This reader implementation would be the one to be wrapped in some other reader that supports a timeout. Such a wrapper could be applied here to affect all future read operations.

In order to pass that through, the git transport could obtain another 'builder method' to allow taking a function that takes a boxed reader and returns another boxed reader which is supposedly wrapped into something with timeouts. For completeness, a default implementation could be provided that uses a lightweight timer implementation and maybe even abstracts over some sort of timer future because ultimately it's really just a low-level implementation of a 'race' between two futures, specialised to work within a particular trait.

As a side-note, being able to do this is a testament to the usefulness of async even in the client, which thus far I was ignoring. After all, I wouldn't know how to reasonably implement this for the blocking client for example. Once this feature has landed, all but the small targets of gitoxide are likely to be switched to the async client to leverage the added layer of protection. To my knowledge, canonical git may implement timeouts, but does so usually based on process-level timers which aren't aware of individual packet lines. Maybe they reset the timer each time a packet line is received though. Since gitoxide crates are libraries, any mechanism to bring down the entire process wouldn't be feasible though, leaving the blocking implementation with option equivalent to what the async implementation can accomplish.

Byron avatar Jul 27 '21 00:07 Byron

Yes timeouts on the receive channel would be a good idea in general, although I'm not sure offhand how much this would have to be protocol-aware. A well-behaved server would send an empty sideband packet when busy for too long, so "no data within period" might actually be good enough.

For posterity, the described issue is indeed a client bug: requesting a non-existing ref via want-ref results in an error response, so the client can know if it has requested anything or not. Curiously, the reason upload-pack simply ignores a fetch command without any wants nor want-refs (and moves on to wait for the next command) seems to be related to the wait-for-done feature. The utility of which is to be able to perform negotiation without actually asking the server to build a packfile at the end. Iow, git fetch --negotiate-only (available in git 2.32) will only send haves and look at what the server ACKs, but never send done (which the server is asked to wait for). TIL.

kim avatar Jul 27 '21 06:07 kim

A well-behaved server would send an empty sideband packet when busy for too long, so "no data within period" might actually be good enough.

That's great news - from my point of view having a timeout mechanism for receiving data lines (on any channel) would be enough to properly implement timeouts. This in conjunction with a global timeout set on a higher level might do the trick.

requesting a non-existing ref via want-ref results in an error response, so the client can know if it has requested anything or not.

gitoxide automatically transforms ERR packet lines into std::io::Error when reading so I would expect this to be handled correctly.

Admittedly everything else regarding the server behaviour as well as the intricacies of wait-for-done and negotiate-only are a bit beyond me at the moment. Maybe one day it will be more sense once especially once I delved into the server side more by implementing it.

Byron avatar Jul 28 '21 03:07 Byron