tus-resumable-upload-protocol
Standardize support for streaming uploads
As @felixge pointed out in #26, it would be good to have a standardized way of providing URL endpoints from which a client can retrieve a file that is still being uploaded, with the connection staying open until the entire file has been sent.
Following the decision in #26 to replace Offset with Content-Length, clients will by default be getting only the bytes that have been uploaded at the time of the request. A conforming client might be able to detect the Entity-Length header and keep the connection open to stream more bytes, but it would be good to define the protocol in such a way that "normal" HTTP clients would be able to request a file being uploaded and receive the entire file too.
One way of achieving this might be to change the default behavior of HEAD and GET requests to serve Content-Length = Entity-Length and stream the file to the client, but add a request flag that a client can send if it only wants the bytes uploaded so far and does not want to wait for the rest. Something like Accept: incomplete, except with a more appropriate header field (Accept is only for content types).
+1 for using standard byte range requests. Linking to @felixge's gist.
@vayam that wouldn't allow streaming a download to tus-unaware clients though...
@Jonhoo I see your point.
I prefer 'Entity-*'. How about Entity-Receive: available?
Here is one attempt to describe the flow. Let me know what you guys think.
Upload Client
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Entity-Receive: available
Response:
HTTP/1.1 200 Ok
Content-Length: 70
For clients interested in downloading whatever is available:
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Entity-Receive: available
Response:
HTTP/1.1 200 Ok
Content-Length: 70
bytes
Download client (standard HTTP client, e.g. browser/curl)
Waits until the entire file has been downloaded:
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Response:
HTTP/1.1 200 Ok
Content-Length: 100
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Response:
HTTP/1.1 200 Ok
Content-Length: 100
bytes
Advanced Downloader
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Response:
HTTP/1.1 200 Ok
Accept-Ranges: bytes
Content-Length: 100
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=0-
Response:
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 100
Content-Range: bytes 0-99/100
bytes
>> Connection dropped after receiving 70 bytes
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=70-
Response:
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 30
Content-Range: bytes 70-99/100
bytes
Advanced Downloader to receive available bytes
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Entity-Receive: available
Response:
HTTP/1.1 200 Ok
Accept-Ranges: bytes
Content-Length: 70
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Entity-Receive: available
Range: bytes=0-
Response:
HTTP/1.1 200 Ok
Content-Range: bytes 0-69/70
Content-Length: 70
bytes
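The Content-Range arithmetic in the Advanced Downloader flow can be sketched as a small Python helper. This is a hypothetical sketch (the function name and the single-range-only parsing are assumptions, not part of any tus implementation) that serves ranges against the full entity length, so resuming with bytes=70- yields Content-Range: bytes 70-99/100 and a 30-byte body.

```python
import re

# Single-range forms used in the flows above: "bytes=70-" and "bytes=0-69".
_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def range_response_headers(range_header: str, entity_length: int) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a single byte-range GET on an entity of entity_length bytes."""
    match = _RANGE_RE.match(range_header.strip())
    if match is None:
        raise ValueError(f"unsupported Range header: {range_header!r}")
    first = int(match.group(1))
    # An open-ended range ("bytes=70-") runs to the last byte of the entity.
    last = int(match.group(2)) if match.group(2) else entity_length - 1
    last = min(last, entity_length - 1)
    if first > last:
        # Simplification: treat a start beyond the entity (or an inverted range) as unsatisfiable.
        return 416, {"Content-Range": f"bytes */{entity_length}"}
    return 206, {
        "Accept-Ranges": "bytes",
        # Byte positions are inclusive, so the body holds last - first + 1 bytes.
        "Content-Length": str(last - first + 1),
        "Content-Range": f"bytes {first}-{last}/{entity_length}",
    }


# Resuming the Advanced Downloader after the first 70 bytes:
print(range_response_headers("bytes=70-", 100))
```

Because range positions are inclusive on both ends, the body length is always last - first + 1, which is why a resumed download from byte 70 of a 100-byte file carries 30 bytes, not 100.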
The flows you indicate correspond mostly to the kind of flow I had in mind too. Some points though:
- Surely Accept-Ranges: bytes would always need to be present, as the server does not know whether the client might check for it or not?
- For the "Connection dropped after receiving 70 bytes" case, shouldn't Content-Range be 0-69/100, not 0-99/100? And for that use case I'd say Content-Length should be 70, not 100, no?
- Entity-Receive: available doesn't seem entirely intuitive to me either, as it could be read as "Entity-Receive is available" rather than "receive the part of the entity that is available". I'm not sure any special headers apart from Range (and maybe Content-Range) are needed...
How about something like this?
# Incomplete file
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
HTTP/1.1 200 Ok
Content-Length: 70
Entity-Length: 100
Accept-Ranges: bytes
# Get only bytes the server has
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=0-69
HTTP/1.1 200 OK
# Not a 206 because it's not a partial reply considering
# the user is only asking for bytes we have
Content-Length: 70
Content-Range: bytes 0-69/70
bytes
# Stream (and what will happen to regular curl-like clients)
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
HTTP/1.1 200 OK
Content-Length: 100
bytes
Surely Accept-Ranges: bytes would always need to be present as the server does not know if the client might check for it or not?
Accept-Ranges is optional for a server to implement. I should probably have mentioned that it is an extension.
For the Connection dropped after receiving 70 bytes, shouldn't Content-Range be 0-69/100, not 0-99/100? And for that use-case I'd say Content-Length should be 70, not 100, no?
Range-based requests should work the same way as a plain GET, because standard video players make range requests if the server supports them.
Entity-Receive: available doesn't seem entirely intuitive to me either as it could be read as "Entity-Receive is available" rather than "Receive the part of the entity that is available".
Agreed. No custom request headers.
How about something like this?
# Incomplete file
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
HTTP/1.1 200 Ok
Content-Length: 70
Entity-Length: 100
Accept-Ranges: bytes
Shouldn't it be
HTTP/1.1 200 Ok
Content-Length: 100
Entity-Received: 70 <- or any better name that indicates the actual bytes received
# Get only bytes the server has
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=0-69
Yes, that would work on a server that supports Range requests.
HTTP/1.1 200 OK
# Not a 206 because it's not a partial reply considering
# the user is only asking for bytes we have
Not true. Standard implementations such as Akamai and S3 return 206 for Range: bytes=0- even when they are sending all the data. You can check this with any standard HTML5 video player.
It would be nice for GET with and without Range to behave consistently, because if the same URL is passed to a video player, it will make range requests whenever the server supports them.
E.g.: <video src="http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216">
For the video to play it has to return Content-Length = Entity-Length
If the server supports range requests and you know how many bytes were actually received, you can make a byte range request:
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=0-69
Response:
HTTP/1.1 206 Partial Content
Content-Range: bytes 0-69/100
Content-Length: 70
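On the client side, this example works because Content-Range carries both the slice that was returned and the full entity size. A minimal Python parser (a hypothetical helper that only handles single-range responses) recovers both values:

```python
from __future__ import annotations

import re

# "bytes <first>-<last>/<complete-length>"; the length may be "*" when unknown.
_CONTENT_RANGE_RE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> tuple[int, int, int | None]:
    """Split a Content-Range value into (first, last, complete_length)."""
    match = _CONTENT_RANGE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"malformed Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


first, last, total = parse_content_range("bytes 0-69/100")
body_length = last - first + 1  # bytes carried in this response body
# total is the full entity size, even though only body_length bytes were served.
```

A client can compare body_length with total to tell that the upload is still incomplete without any tus-specific header.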
@felixge @Jonhoo The more I think about it, the more I believe all we need is a better name for Offset to indicate the actual bytes received.
We can recommend that servers support standard HTTP/1.1 Range requests to allow partial downloads and seeking in video players.
Shouldn't it be
HTTP/1.1 200 Ok
Content-Length: 100
Entity-Received: 70 <- or any better name that indicates the actual bytes received
No, as we discussed in #26, Content-Length should indicate the size of the content present in the body of the reply (or, for HEAD, the length of the content that would have been present in the body of the corresponding GET reply). Entity-Length is a made-up header that should indicate the size of the "real" entity once it has finished uploading.
I do see your point that now HEAD and GET return different values, so perhaps swapping them around might be appropriate. That is, say that Entity-Length is always the length of the body of the response and Content-Length is the length of the full object...
Not true. All standard implementations - Akamai, S3 return 206 for Range: bytes=0- even if they are sending all data.
Ah, ok, I wasn't aware. Fair enough - 206 it is then. I'm not sure I agree with that interpretation of the standard, but in this case it might be better to follow the de facto standard.
For the video to play it has to return Content-Length = Entity-Length
Are you sure about this?
If server supports range requests and you know how many bytes were actually received, you can do byte range
That example there seems good to me. Making the last number of Content-Range the total size of the entity is a good way of doing it. I think we might want to include Entity-Length there as well, to be consistent with HEAD and GET without byte ranges.
I do see your point that now HEAD and GET return different values, so perhaps swapping them around might be appropriate. That is, say that Entity-Length is always the length of the body of the response and Content-Length is the length of the full object...
Entity-Length already means the length of the full object.
Either we add Entity-Received (or something equivalent) or keep Offset as is.
For the video to play it has to return Content-Length = Entity-Length
Are you sure about this?
Yes
Ah, sorry, I misread Entity-Received as Entity-Length. That makes much more sense now. Yes, I think that might be a good way of doing it. Essentially this means that we're always sending Content-Length as the size of the full entity, so I suppose we could actually get rid of Entity-Length altogether?
The end result of doing it that way would be that the default will always be to download the full file even if that means waiting for the server to receive all the bits. Services that want only the available bits would then use a Range request to get only those bytes based on the value of Entity-Received. That seems a good solution to me. @felixge ?
The reason we have Entity-Length is that we have to send Content-Length: 0 during file creation:
Request:
POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Entity-Length: 100
Response:
HTTP/1.1 201 Created
Location: http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216
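To make concrete why the final size needs its own header, here is a sketch that builds (without sending) the creation POST above using Python's standard urllib.request. The helper name is hypothetical, tus.example.org is the thread's placeholder host, and Entity-Length is the header name from this discussion (renamed from Final-Length).

```python
import urllib.request


def build_creation_request(endpoint: str, entity_length: int) -> urllib.request.Request:
    """Build (but do not send) the POST that creates an upload resource."""
    return urllib.request.Request(
        endpoint,
        data=b"",  # creation carries no file data, hence Content-Length: 0
        method="POST",
        headers={
            "Content-Length": "0",
            # The final size of the file has to travel in a separate header.
            "Entity-Length": str(entity_length),
        },
    )


request = build_creation_request("http://tus.example.org/files", 100)
# urllib.request.urlopen(request) would send it; the server answers 201 Created with a Location.
```

Content-Length is already taken to describe the (empty) body, so without Entity-Length the server would have no way of learning the final size at creation time.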
Okay, fair enough, but it doesn't seem to be needed for anything download related?
Yes that is correct.
Ok, to summarize:
- rename: Final-Length -> Entity-Length
- keep: Offset for PATCH requests and HEAD/GET responses (do we keep the name?)
- Content-Length for GET/HEAD is always Entity-Length
Did I miss anything?
I still think Offset in HEAD/GET responses is misleading. Something like Entity-Received as suggested by @vayam above seems more appropriate. For PATCH, Offset makes sense, but wouldn't Range be more appropriate so we don't have to come up with our own specialized header?
Also, the bits about being able to request only certain ranges of a file (that is, bits that the server already has) should probably be mentioned in the spec?
Apart from that I think you have everything.
Ok, to summarize: rename: Final-Length -> Entity-Length
Yes
keep: Offset for PATCH requests and HEAD/GET responses (do we keep the name?)
Not sure. I'm not entirely convinced by the Entity-Received header, unless you can come up with a better name.
Content-Length for GET/HEAD is always Entity-Length
Yes
Did I miss anything?
Nope
@Jonhoo
For PATCH, Offset makes sense, but wouldn't Range be more appropriate so we don't have to come up with our own specialized header?
Can you come up with a valid byte range request wherein the server responds with the "bytes received so far", without breaking our assumption that Content-Length = Entity-Length? I couldn't come up with one.
@vayam not sure I understand your question? The server already includes Entity-Received (or something like it) in the HEAD, so the client knows how much data the server has. Couldn't it then use Range: 70-/100 in the PATCH request to indicate that it's uploading bytes 70 and onwards?
The reasoning behind Offset
Related #2 and @felixge's summary
Ah, fair enough. If Range is standardized to only be meaningful to GET and Offset is on the track for becoming a standard then I'd say go ahead with Offset for PATCH - it makes more intuitive sense than Range anyway.
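The Offset-based resume kept here for PATCH can be sketched as follows: the client takes the byte count the server reported and sends only the remainder, labelled with an Offset header. The helper is hypothetical and only models the headers discussed in this thread (other headers such as Content-Type are omitted).

```python
def build_resume_patch(data: bytes, server_offset: int) -> tuple[dict[str, str], bytes]:
    """Return (headers, body) for a PATCH that continues an upload at server_offset."""
    if not 0 <= server_offset <= len(data):
        raise ValueError("server offset lies outside the local file")
    remaining = data[server_offset:]
    headers = {
        # Position in the entity at which this body starts.
        "Offset": str(server_offset),
        "Content-Length": str(len(remaining)),
    }
    return headers, remaining


# The server reported 70 bytes; upload the remaining 30 of a 100-byte file.
headers, body = build_resume_patch(bytes(100), 70)
```

Unlike Range: 70-/100, Offset needs no range syntax: a single integer is enough because a PATCH body always runs to the end of what the client sends.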
I still believe Offset is the wrong header to use for GET/HEAD though, as we're not serving data from an offset in those requests, rather we're only serving data up to a certain point in the stream. Here a header like Entity-Received seems much more correct.
@Jonhoo I agree Entity-Received is more intuitive for GET and HEAD
@felixge What are your thoughts on this?
How about Entity-Available over Entity-Received?
Sounds good to me. We should clarify that it is the number of bytes received (or available) at the time the HEAD/GET request was made.
@Jonhoo what's the benefit of Entity-Available / Entity-Received over the Offset header for HEAD/GET responses? Maybe it's just me, but I find both names more confusing.
In my opinion, Offset doesn't make sense as a response to a HEAD/GET as the value describes the amount of content available, not an offset into that content.
I did some experiments with tusd and the brewtus Node server. It is not trivial to implement Content-Length == Entity-Length for a GET request while the upload is in progress. To stream the URL smoothly, the GET response has to be throttled to the current upload speed, which can easily lead to buggy GET implementations. A simple GET implementation like this and this would cause timeouts.
Here is my test
Upload to the tus.io demo site using tuspy:
python tuspy.py -f ~/vayam-dev/Largefile.mov
http://master.tus.io/files/d949a91388ef1d7ab4e74d5203f57ebd
{'content-length': '0', 'access-control-allow-methods': 'HEAD,GET,PUT,POST,PATCH,DELETE', 'access-control-expose-headers': 'Location, Range, Content-Disposition, Offset', 'date': 'Mon, 10 Jun 2013 03:47:19 GMT', 'access-control-allow-origin': '*', 'access-control-allow-headers': 'Origin, X-Requested-With, Content-Type, Accept, Content-Disposition, Final-Length, Offset', 'offset': '0'}
Now issue a download while the upload is in progress:
curl -v "http://master.tus.io/files/d949a91388ef1d7ab4e74d5203f57ebd" > /dev/null
* About to connect() to master.tus.io port 80 (#0)
* Trying 54.235.134.243...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* connected
* Connected to master.tus.io (54.235.134.243) port 80 (#0)
> GET /files/d949a91388ef1d7ab4e74d5203f57ebd HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
> Host: master.tus.io
> Accept: */*
>
< HTTP/1.1 200 OK
< Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept, Content-Disposition, Final-Length, Offset
< Access-Control-Allow-Methods: HEAD,GET,PUT,POST,PATCH,DELETE
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Location, Range, Content-Disposition, Offset
< Content-Length: 4058518957
< Date: Mon, 10 Jun 2013 03:47:49 GMT
< Offset: 0
< Content-Type: application/octet-stream
<
{ [data not shown]
0 3870M 0 6159k 0 0 487k 0 2:15:22 0:00:12 2:15:10 842k* transfer closed with 4051146053 bytes remaining to read
0 3870M 0 7200k 0 0 542k 0 2:01:39 0:00:13 2:01:26 839k
* Closing connection #0
curl: (18) transfer closed with 4051146053 bytes remaining to read <------------------Request times out
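The curl output is enough to see how little of the file arrived: subtracting the reported remainder from the advertised Content-Length gives the bytes actually transferred before the connection closed, which matches the roughly 7200k shown in curl's progress line.

```python
# Figures copied from the curl output above.
content_length = 4_058_518_957  # advertised Content-Length
remaining = 4_051_146_053  # "transfer closed with ... bytes remaining to read"

received = content_length - remaining
print(f"received {received} bytes ({received / content_length:.2%} of the file)")
```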
@felixge, @Jonhoo I suggest we keep it simple: make Content-Length == Offset and add an Entity-Length header to GET and HEAD, as we discussed in #26.
@vayam in a way the timeout you show above has the same end result as having Content-Length == Offset: you only download as much of the file as is available at the time. Keeping it the way we've discussed thus gives legacy clients the same behavior, and it allows a server to implement streaming downloads if it so chooses?
@Jonhoo the issue is that the Offset at the time of the response would be less than the content actually downloaded. The download size would depend on the upload and download speeds.
@felixge, how about we get rid of Offset in the response headers and keep Content-Length == Entity-Length by default?
We could add a new header, Streaming: off, for getting only the bytes received by the server so far:
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Streaming: off
Response:
HTTP/1.1 200 Ok
Content-Length: 70
By default:
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Streaming: on <--- Optional / on by default
Response:
HTTP/1.1 200 Ok
Content-Length: 100
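The server side of the proposed Streaming header reduces to choosing which length to advertise. A minimal sketch, assuming the header values are exactly on/off as written above and that a missing header means on; the function name is hypothetical:

```python
def head_content_length(request_headers: dict[str, str], entity_length: int, received: int) -> int:
    """Content-Length a server would advertise under the proposed Streaming header."""
    # HTTP header names are case-insensitive, so normalise names and values first.
    fields = {name.lower(): value.strip().lower() for name, value in request_headers.items()}
    if fields.get("streaming", "on") == "off":
        # Client asked for only what the server has received so far.
        return received
    # Default (Streaming: on, or header absent): advertise the full entity.
    return entity_length
```

With entity_length=100 and received=70 this reproduces both exchanges above: 70 for Streaming: off, 100 otherwise.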
@felixge, @Jonhoo can we discuss this on IRC? I am usually available in the morning (EST).
Wouldn't it then make more sense, as we decided above, to include Entity-Available in the response? That way conforming clients could decide what behavior they want based on whether only part of the file is available.
They could either do a streaming download by requesting the whole file and not timing out, or do a non-streaming download by only downloading the first 70 bytes (bytes 0-69) using Range. Legacy clients will time out if the upload is not faster than their download speed, but I'm not sure that is really unexpected behavior?
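Assuming HEAD reports the full size in Content-Length and the bytes received so far in Entity-Available (the name proposed in this thread, not a settled header), a conforming client's choice between the two strategies can be sketched as follows; the function and its return convention are hypothetical:

```python
from __future__ import annotations


def plan_download(head_headers: dict[str, str], stream: bool) -> dict[str, str] | None:
    """Pick extra GET request headers from a prior HEAD response.

    Returns {} for a plain GET, a Range header for the available bytes only,
    or None when there is nothing to fetch yet.
    """
    # HTTP header names are case-insensitive.
    fields = {name.lower(): value.strip() for name, value in head_headers.items()}
    total = int(fields["content-length"])  # full entity size
    available = int(fields.get("entity-available", total))
    if stream or available >= total:
        # Streaming download, or the upload is already complete.
        return {}
    if available == 0:
        return None
    # Range positions are inclusive: 70 available bytes are bytes 0-69.
    return {"Range": f"bytes=0-{available - 1}"}
```

A legacy client simply never sends Range, which is the plain-GET branch.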
I have been in two minds about this. Both have advantages and disadvantages. For now, I will throttle GET requests to keep my downloads smooth while upload is in progress.
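Throttling GET so it never gets ahead of the upload can be sketched as a Python generator. This is an illustration under stated assumptions, not tusd's or brewtus's code: the read_range and bytes_received callbacks are hypothetical stand-ins for the server's storage and upload progress, and the chunk size, poll interval and stall timeout are arbitrary defaults.

```python
import time
from typing import Callable, Iterator


def stream_while_uploading(
    read_range: Callable[[int, int], bytes],
    bytes_received: Callable[[], int],
    entity_length: int,
    chunk_size: int = 64 * 1024,
    poll_interval: float = 0.5,
    stall_timeout: float = 30.0,
) -> Iterator[bytes]:
    """Yield the entity in order without ever running ahead of the upload."""
    sent = 0
    last_progress = time.monotonic()
    while sent < entity_length:
        available = min(bytes_received(), entity_length)
        if available > sent:
            end = min(available, sent + chunk_size)
            chunk = read_range(sent, end)  # bytes [sent, end) from storage
            if len(chunk) != end - sent:
                raise OSError(f"short read at byte {sent}")
            sent = end
            last_progress = time.monotonic()
            yield chunk
        elif time.monotonic() - last_progress > stall_timeout:
            # The upload stopped; close instead of holding the connection forever.
            raise TimeoutError(f"upload stalled at {sent} of {entity_length} bytes")
        else:
            # Pace the download to the upload by waiting for more bytes to arrive.
            time.sleep(poll_interval)
```

A server would send Content-Length: entity_length and then hand this generator to its framework's streaming-response mechanism; the stall timeout keeps an abandoned upload from leaving download connections open indefinitely.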
@vayam sorry for the lack of activity on the project lately. A few things have changed on my end, which means I won't be able to continue with the project for a while : /. Is there any chance you might be interested in taking over the project? If so @kvz and @tim-kos would be happy to help with anything you might need.
@felixge you started something awesome here. I would love to take it forward. Let's talk over Skype (vnarenv) or on IRC (vayam) about the details. Let me know what times work for you. Excited about working with @kvz and @tim-kos.