
Creating/uploading invoices and parcels (what to do when parcels are not all sent)

Open technosophos opened this issue 4 years ago • 3 comments

In the very first design of Bindle, the transport layer was to be HTTP/3, and the invoice and parcels were to be streamed concurrently. Once the last parcel was uploaded, the 200 response for the invoice was to be sent.

Early in Bindle's development, we backed off HTTP/3 -- largely b/c libraries were not stable. So in the short term, what we did was simply have the create invoice endpoint return 2xx as soon as the invoice was uploaded. The response code would inform the client as to whether it needed to send the parcels (in which case the body would list the parcels) or whether the invoice was already done.

We accepted the state that the invoice could be pushed, and consequently served, even if the Bindle server did not actually have all of the parcels to fulfill the invoice. And in that original design, the parcels could be fetched from other sources (e.g. we could proxy through to an upstream server) because parcels could be fetched directly.

Later, we changed the parcel service so that a parcel could only be fetched as part of an invoice. That closes off the ability of a client to go to another server and ask for a parcel, unless that other server also has the same invoice.

So at this point, if a client does not push all of the necessary parcels, another client can fetch the invoice but will not be able to fetch the related parcels. And that second client currently has no way to attempt to backfill parcels from other sources. This is not a desirable state.
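The state described above can be reproduced with a small in-memory model. This is only a sketch: the `Server` class, its method names, and the invoice and parcel ids are hypothetical and do not reflect Bindle's actual API.

```python
# Hypothetical in-memory model of the current behavior (not Bindle's real API).
class Server:
    def __init__(self):
        self.invoices = {}  # invoice id -> list of parcel ids it references
        self.parcels = {}   # parcel id -> parcel bytes

    def create_invoice(self, invoice_id, parcel_ids):
        # The invoice is accepted immediately, before any parcel arrives.
        self.invoices[invoice_id] = list(parcel_ids)

    def upload_parcel(self, invoice_id, parcel_id, data):
        # The invoice determines which parcels a client is allowed to send.
        if parcel_id not in self.invoices[invoice_id]:
            raise PermissionError("parcel is not listed in the invoice")
        self.parcels[parcel_id] = data

    def get_invoice(self, invoice_id):
        return self.invoices[invoice_id]

    def get_parcel(self, invoice_id, parcel_id):
        # A parcel can only be fetched as part of an invoice that lists it.
        if parcel_id not in self.invoices[invoice_id]:
            raise PermissionError("parcel is not listed in the invoice")
        return self.parcels[parcel_id]  # raises KeyError if never uploaded


server = Server()
server.create_invoice("example/1.0.0", ["parcel-a", "parcel-b"])
server.upload_parcel("example/1.0.0", "parcel-a", b"first parcel")
# The uploading client stops here and never sends parcel-b.

# A second client can still fetch the invoice...
listed = server.get_invoice("example/1.0.0")

# ...but one of the parcels it lists cannot be served.
try:
    server.get_parcel("example/1.0.0", "parcel-b")
    parcel_b_available = True
except KeyError:
    parcel_b_available = False
```

In this model the invoice is served in full while `parcel_b_available` ends up `False`, which is the mismatch the options below try to resolve.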

So we have the following options:

  1. Refuse to serve the invoice until all of its parcels are present on the server.
  • Pro: This is fairly simple on the protocol side.
  • Con: This requires that the server be able to compute when it can serve an invoice, and this could be a heavy IO operation.
  2. Change the protocol to require that all of the parcels be submitted before the invoice can be accepted.
  • Con: While this might sound tenable up front, it is actually a fairly major change to the security model. The invoice is used to determine which parcels the client is allowed to send to the server. We want to prevent clients from "hiding" parcels on the server.
  3. Change the protocol to require that the client re-notify the server when it believes it has sent all of the parcels.
  • The security implications of this are not totally clear.
  • Pro: It makes the client-side logic relatively straightforward: submit invoice, send parcels, send notification.
  • Pro: The server checks for conformance only when the client requests it.
  • Con: If a client fails to make the notification, an invoice can be complete but not marked complete.
  4. Rewrite using HTTP/3 the way we initially intended to do it.
  • Pro: It would work the way we intended.
  • Pro: It would likely improve performance.
  • Con: It would be a compat-breaking change to all Bindle clients, and HTTP/3 is not yet broadly supported.
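To make the explicit-notification option more concrete, here is a rough sketch of how a server might track completeness under it. Everything here (the `Server` class, `notify_complete`, the ids) is an assumption made for illustration, not an existing or proposed Bindle endpoint.

```python
# Hypothetical sketch of the "client re-notifies the server" option.
class Server:
    def __init__(self):
        self.invoices = {}     # invoice id -> set of required parcel ids
        self.parcels = set()   # parcel ids present on the server
        self.complete = set()  # invoice ids that have been marked complete

    def create_invoice(self, invoice_id, parcel_ids):
        self.invoices[invoice_id] = set(parcel_ids)

    def upload_parcel(self, parcel_id):
        self.parcels.add(parcel_id)

    def notify_complete(self, invoice_id):
        # Conformance is checked only when the client asks for it.
        missing = self.invoices[invoice_id] - self.parcels
        if not missing:
            self.complete.add(invoice_id)
        return sorted(missing)

    def get_invoice(self, invoice_id):
        # Invoices that are not marked complete are never served.
        if invoice_id not in self.complete:
            raise LookupError("invoice is not marked complete")
        return sorted(self.invoices[invoice_id])


server = Server()

# A well-behaved client: submit invoice, send parcels, send notification.
server.create_invoice("a/1.0.0", ["p1", "p2"])
server.upload_parcel("p1")
server.upload_parcel("p2")
still_missing = server.notify_complete("a/1.0.0")

# A client that never sends the notification: every parcel for this
# invoice is present, yet the invoice stays unmarked and cannot be served.
server.create_invoice("b/1.0.0", ["p1"])
b_has_all_parcels = server.invoices["b/1.0.0"] <= server.parcels
b_marked_complete = "b/1.0.0" in server.complete
```

The second invoice shows the listed drawback: `b_has_all_parcels` is true while `b_marked_complete` is false.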

technosophos • Jul 29 '21 02:07

This might not be a viable option, but what if you modified 3 so that, if a consumer requested an invoice that was not yet marked complete, the server checked whether all parcels were ready to serve and, if so, marked the invoice complete (as in 1)? This avoids the 'invoice complete but not marked complete' issue. However, it introduces a potential denial of service by having clients repeatedly request an incomplete invoice (thus forcing repeated expensive checks). I'm not sure if there's a mitigation for that which is easy enough for third-party Bindle implementations to get right.
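A rough sketch of this check-on-request variant follows, with a counter standing in for the cost of the completeness check. The class, method names, and ids are hypothetical, and no mitigation for the repeated-check problem is attempted.

```python
# Hypothetical sketch: mark an invoice complete lazily, when it is requested.
class Server:
    def __init__(self):
        self.invoices = {}     # invoice id -> set of required parcel ids
        self.parcels = set()   # parcel ids present on the server
        self.complete = set()  # invoice ids that have been marked complete
        self.checks = 0        # number of (potentially expensive) checks run

    def create_invoice(self, invoice_id, parcel_ids):
        self.invoices[invoice_id] = set(parcel_ids)

    def upload_parcel(self, parcel_id):
        self.parcels.add(parcel_id)

    def _all_parcels_present(self, invoice_id):
        self.checks += 1  # stands in for heavy IO on a real server
        return self.invoices[invoice_id] <= self.parcels

    def get_invoice(self, invoice_id):
        if invoice_id not in self.complete:
            if not self._all_parcels_present(invoice_id):
                raise LookupError("invoice is not complete")
            self.complete.add(invoice_id)  # later requests skip the check
        return sorted(self.invoices[invoice_id])


server = Server()
server.create_invoice("example/1.0.0", ["p1", "p2"])
server.upload_parcel("p1")

# Every request for the incomplete invoice forces another check.
rejected = 0
for _ in range(3):
    try:
        server.get_invoice("example/1.0.0")
    except LookupError:
        rejected += 1
checks_while_incomplete = server.checks

# Once the last parcel arrives, the next request marks the invoice complete.
server.upload_parcel("p2")
served = server.get_invoice("example/1.0.0")
server.get_invoice("example/1.0.0")  # already marked, no further check
checks_total = server.checks
```

The check count grows with each request while the invoice is incomplete and stops growing once it is marked, which is the shape of the denial-of-service concern.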

Another thought: The spec could do the change in 3 and say "Thou shalt not serve an incomplete invoice," but leave it up to server implementations whether to keep things simple and mark invoices complete only when notified, or whether to do "auto-completion" as a convenience to recover from the dropped notification scenario. But that would result in different behaviours across different servers, meaning that a sloppy client might get away with it against Implementation A but then fail against Implementation B.

In conclusion, it has taken me two paragraphs to establish that all my ideas are terrible and I should never have written them down. Thank you and goodnight.

itowlson • Jul 29 '21 03:07

One thing to point out here: Right now, if a parcel already exists on the server (because it was uploaded as part of a different bindle), the new bindle will reuse it. That doesn't solve the "you could go fetch it from upstream servers" issue, but it does make this a little less gnarly.
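As an illustration of that reuse, the sketch below models invoice creation returning only the parcels the server does not already hold. The function name, the dictionaries, and the ids are hypothetical and are not taken from the Bindle codebase.

```python
# Hypothetical sketch: parcels already on the server are reused, so the
# client is only asked to send the ones that are actually missing.
def create_invoice(invoices, parcels, invoice_id, parcel_ids):
    """Store the invoice and return the parcel ids the client still has to send."""
    invoices[invoice_id] = list(parcel_ids)
    return [pid for pid in parcel_ids if pid not in parcels]


invoices = {}
# "p1" was uploaded earlier as part of a different bindle.
parcels = {"p1": b"shared parcel"}

needed = create_invoice(invoices, parcels, "new/1.0.0", ["p1", "p2"])
```

Here `needed` contains only `"p2"`, so a bindle whose parcels all exist already would have nothing left to upload.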

thomastaylor312 • Jul 29 '21 17:07

Another possibility is that the server recomputes the completeness of an invoice when any parcel is uploaded. However, this could go amiss if two bindles are each missing the same blob: the blob is uploaded via invoice A, so the server can mark A complete; but it won't know to also mark B complete. So this could result in the same "invoice complete but can't be served" issue as the explicit notification strategy.
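The failure mode in this comment can be shown with a small model in which the server re-evaluates only the invoice a parcel was uploaded through. As before, the `Server` class and all names and ids are assumptions made for the sake of the example.

```python
# Hypothetical sketch: recompute completeness whenever a parcel is uploaded,
# but only for the invoice the upload was made through.
class Server:
    def __init__(self):
        self.invoices = {}     # invoice id -> set of required parcel ids
        self.parcels = set()   # parcel ids present on the server
        self.complete = set()  # invoice ids that have been marked complete

    def create_invoice(self, invoice_id, parcel_ids):
        self.invoices[invoice_id] = set(parcel_ids)

    def upload_parcel(self, invoice_id, parcel_id):
        self.parcels.add(parcel_id)
        # Only the uploading invoice is re-evaluated.
        if self.invoices[invoice_id] <= self.parcels:
            self.complete.add(invoice_id)


server = Server()
# Two bindles that are each missing the same parcel.
server.create_invoice("a/1.0.0", ["shared"])
server.create_invoice("b/1.0.0", ["shared"])

# The parcel is uploaded via invoice A only.
server.upload_parcel("a/1.0.0", "shared")

a_marked = "a/1.0.0" in server.complete
b_marked = "b/1.0.0" in server.complete
b_has_all_parcels = server.invoices["b/1.0.0"] <= server.parcels
```

Invoice A is marked complete, while B has every parcel it needs (`b_has_all_parcels` is true) but is never marked, matching the scenario described above.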

itowlson • Dec 14 '21 01:12