Complete content retrieval on HTTP HEAD requests puts an unnecessary burden on tds

Open cskarby opened this issue 4 years ago • 6 comments

According to https://tools.ietf.org/html/rfc7231#section-4.3.2 payload headers are optional for HTTP HEAD responses.

Content-Length is a payload header according to https://tools.ietf.org/html/rfc7231#section-3.3

In https://github.com/Unidata/tds/blob/987a79bf15192c52cc3fc815273f36fed9a93fc4/tds/src/main/java/thredds/servlet/filter/HttpHeadFilter.java#L41-L45 a complete GET request is processed to compute the Content-Length, and the HTTP body is then discarded. This seems like a waste of resources, especially for large datasets that may span several files (e.g. via NcML aggregations). I think it is better to handle this explicitly with a pair of functions: one that sets the HTTP headers (except for payload headers) and one that produces the body. The GET functions would call the header function before writing the body, and HEAD would call only the header function. This way we can respond swiftly to HTTP HEAD requests and save resources on the server side.
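A minimal sketch of the split described above, using plain Java maps rather than the servlet API. All class and method names here (`HeadGetSketch`, `baseHeaders`, `handleHead`, `handleGet`) are hypothetical and do not exist in TDS; the point is only that the HEAD path never touches the body source.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names come from TDS.
final class HeadGetSketch {

    /** Sets the headers that can be known without producing the body. */
    static Map<String, String> baseHeaders(String contentType) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", contentType);
        return headers;
    }

    /** HEAD: headers only. The body source is never consulted. */
    static Map<String, String> handleHead(String contentType) {
        return baseHeaders(contentType);
    }

    /** GET: same headers, plus the payload header computed from the body. */
    static Map<String, String> handleGet(String contentType, Supplier<byte[]> bodySource) {
        Map<String, String> headers = baseHeaders(contentType);
        byte[] body = bodySource.get(); // the expensive step, GET only
        headers.put("Content-Length", Integer.toString(body.length));
        // A real handler would also write the body to the response here.
        return headers;
    }
}
```

With this shape, a HEAD response omits Content-Length, which RFC 7231 section 4.3.2 permits for payload headers.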

cskarby avatar Mar 24 '21 14:03 cskarby

I think at the very least we can make it configurable. As you mention, there is a cost based on the size of the backing datasets, but the cost will vary based on service type as well. For example, the HTTPServer service (/thredds/fileServer/*) will only be accessing a single file, and returning the size of that file for HEAD requests is not too bad.

lesserwhirls avatar Mar 24 '21 19:03 lesserwhirls

@ethanrd - what do you think?

lesserwhirls avatar Mar 24 '21 19:03 lesserwhirls

> I think at the very least we can make it configurable. As you mention, there is a cost based on the size of the backing datasets, but the cost will vary based on service type as well. For example, the HTTPServer service (/thredds/fileServer/*) will only be accessing a single file, and returning the size of that file for HEAD requests is not too bad.

I agree, but the file size should probably come from the filesystem rather than from opening the file and streaming it completely from disk to memory to count the bytes.

cskarby avatar Mar 25 '21 13:03 cskarby

Indeed. I would grab file size from the File object for the HTTPServer service. The option to turn off Content-Length for the other services for HEAD requests is what I would target, since you can really only know those sizes by processing the request first.
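One way such an option could look, as a rough sketch: a small policy object that decides per service whether a HEAD response carries Content-Length. The class name `HeadContentLengthPolicy` and the idea of keying on a service name string are assumptions for illustration, not existing TDS configuration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this is not an existing TDS class or setting.
final class HeadContentLengthPolicy {
    // Services for which Content-Length is still sent on HEAD.
    private final Set<String> servicesWithLength;

    HeadContentLengthPolicy(Set<String> servicesWithLength) {
        this.servicesWithLength =
            Collections.unmodifiableSet(new HashSet<>(servicesWithLength));
    }

    /** GET always includes Content-Length; HEAD only for opted-in services. */
    boolean includeContentLength(String method, String serviceName) {
        if (!"HEAD".equalsIgnoreCase(method)) {
            return true;
        }
        return servicesWithLength.contains(serviceName);
    }
}
```

A deployment that only trusts file sizes from the filesystem might opt in the file server alone and leave every other service without Content-Length on HEAD.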

lesserwhirls avatar Mar 25 '21 16:03 lesserwhirls

Hi @cskarby - Looks like all TDS services except HTTPServer use HttpHeadFilter (according to applicationContext.xml). For HTTPServer, both GET and HEAD requests are handled by the same method because of Spring MVC defaults rather than the filter. HTTPServer does the right thing, getting size and last modified from the File object and, if it is a HEAD request, finishes without reading any bytes.
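For illustration, reading size and last-modified time from file metadata without reading any file content can be sketched with `java.nio.file` as below. This is not the actual HTTPServer code; `FileHeadInfo` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration only; not the TDS HTTPServer implementation.
final class FileHeadInfo {
    final long size;
    final long lastModifiedMillis;

    private FileHeadInfo(long size, long lastModifiedMillis) {
        this.size = size;
        this.lastModifiedMillis = lastModifiedMillis;
    }

    /** Reads file metadata only; the file content is never opened or streamed. */
    static FileHeadInfo of(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        return new FileHeadInfo(attrs.size(), attrs.lastModifiedTime().toMillis());
    }
}
```

A HEAD handler could populate Content-Length and Last-Modified from these two values and return immediately.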

We could look at moving in a similar direction for the other TDS services (and make inclusion of Content-Length configurable). I don’t think there is a single across-the-board change that would make this switch. Each service would need some work (though maybe not a lot) to tease HEAD and GET apart.

Are you seeing performance issues with HEAD requests on particular TDS services?

ethanrd avatar Mar 29 '21 05:03 ethanrd

> I think at the very least we can make it configurable. As you mention, there is a cost based on the size of the backing datasets, but the cost will vary based on service type as well. For example, the HTTPServer service (/thredds/fileServer/*) will only be accessing a single file, and returning the size of that file for HEAD requests is not too bad.

Keep in mind that THREDDS now supports cloud file hosting, such as CDMS3 and CDMRemote. Calculating the file size is not always as simple as asking the file system.

gaellafond avatar Jul 03 '23 03:07 gaellafond