tusd icon indicating copy to clipboard operation
tusd copied to clipboard

handler: support HTTP range requests in GET /:upload

Open markspolakovs opened this issue 2 years ago • 7 comments

Some discussion around API design in the quoted (since-closed) PR.

Out of interest, I did some searching and found #432 where you were against implementing Range support in tusd - have you changed your mind on this?

Yes, I did change my opinion here. We have recently been involved in the HTTP working group with the goal of making a IETF standard for resumable uploads. In the end, we hope that resumable uploads are one component in the HTTP toolbox that interacts well with the existing HTTP mechanism. Since then I am more interested in making sure that tusd works better with concepts, such as range requests.

In any case, we should probably move this conversation to an issue to sketch out API design if you're in favour of this in principle - closing this PR for now.

Great idea, feel free to open an issue if you are still interested in working on this :)

Originally posted by @Acconut in https://github.com/tus/tusd/issues/1048#issuecomment-1881180105

markspolakovs avatar Jan 14 '24 10:01 markspolakovs

As mentioned in #1048, I prefer having explicit support for fetching ranged content inside the storage implementations, so that we can implement efficiently. My first idea was to add a new interface for storage, such as:

type Range struct {
  start int64
  end int64
}

type RangedDownloadsUpload interface {
  GetRangedReaders(ctx context.Context, ranges []Range) ([]io.ReadCloser, error)
}

The slices are necessary because the Range header can include multiple ranges.

However, after looking at the source code of Go's implementation for ranged GET requests (see https://cs.opensource.google/go/go/+/refs/tags/go1.21.6:src/net/http/fs.go;l=169-355), I am more hesitant with this approach. Go implements additional logic for preventing abuse and ensuring wide compatibility with even odd clients. If we go with my above proposal, we would have to duplicate this logic inside tusd as well, which I would like to avoid.

As an alternative, we could go with a more lazy route and basically pass the request and response into the storage to let it handle the ranged request as efficiently as possible:

type ServableUpload interface {
  ServeContent(ctx context.Context, w ResponseWriter, r *Request) (error)
}

The filestore could simply use http.ServeContent internally for this. The s3store (and similar cloud storages) could implement the interface by relaying the Range header to S3 and then just piping the response from S3 directly to the end-user's client. The logic for handling ranges is then outsourced to the cloud storage. A downside would be that this way, we also inherit the limitations of the cloud storages. For example, S3 does not support multiple ranges in one request. This limitation would then also be passed to the end-user. However, overall I like this approach for now.

Let me know what you think!

Acconut avatar Jan 26 '24 08:01 Acconut

KISS. I am unsure if multiple range requests are really common in a request, and I already had to deal with http.ServeContent myself and would rather not duplicate/have core code be borrowed.

If needed, revisit when someone asks, and then/or make a 3rd interface.

Let the backends deal with the problem, and don't try to be too smart/know better than the storage systems without a good reason.

pcfreak30 avatar Jan 26 '24 19:01 pcfreak30

Agreed. I am not sure when I will have time to work on this. If anybody is interested in starting this, let me know and I can help :)

Acconut avatar Jan 29 '24 06:01 Acconut

@Acconut im looking at your suggestion and would like to know how ServableUpload would connect to the Upload interface given it has a GetReader.

pcfreak30 avatar Oct 08 '24 00:10 pcfreak30

The data stores could implement an additional interface that turns a regular Upload into a ServableUpload, similar to how we already have TerminaterDataStore;

type ServableUpload interface {
  ServeContent(ctx context.Context, w ResponseWriter, r *Request) (error)
}

type ServerDataStore interface {
	AsServableUpload(upload Upload) ServableUpload
}

If a data store does not implement ServerDataStore, the handler can always fall back to the existing GetReader function. Does that make sense?

Acconut avatar Oct 08 '24 14:10 Acconut

The data stores could implement an additional interface that turns a regular Upload into a ServableUpload, similar to how we already have TerminaterDataStore;

type ServableUpload interface {
  ServeContent(ctx context.Context, w ResponseWriter, r *Request) (error)
}

type ServerDataStore interface {
	AsServableUpload(upload Upload) ServableUpload
}

If a data store does not implement ServerDataStore, the handler can always fall back to the existing GetReader function. Does that make sense?

Do we then want

func (store *StoreComposer) UseServableUpload(ext ServableUpload) {
	store.UsesServableUpload = ext != nil
	store.ServableUpload = ext
}

as well?

then call it conditionally like the others?

pcfreak30 avatar Oct 08 '24 23:10 pcfreak30

Yes, the composer needs to be adjusted as well, but take the data store interface instead of the additional interface for the upload:

func (store *StoreComposer) UseServer(ext ServerDataStore) {
	store.UsesServer = ext != nil
	store.Server = ext
}

(although we would have to think about the use of "server". While it's custom in Go to use -er suffix for interfaces, the use of "server" can easily be misunderstood here)

Acconut avatar Oct 09 '24 07:10 Acconut

Support for ranged GET requests in filestore and s3store has just landed.

Acconut avatar Dec 20 '24 12:12 Acconut