Incorrect Content-type header for StreamingStorage factory
This is a bug/new feature request in which Giftless is used with CKAN, ckanext-blob-storage and datapusher, see original discussion.
When using StreamingStorage (factory: giftless.transfer.basic_external:factory) the Content-type of a file to be stored is not preserved. For instance, after uploading a csv-file and then downloaded its Content-type returned by Giftless is text/html which causes an error at datapusher thus resulting that tabular data cannot be previewed in CKAN UI. ExternalStorage seems to work correctly in this sense.
Thanks for reporting, I think this would be useful to implement as some clients depend on the correct content type. I don't consider this a bug per se because handling content type properly is not a part of the Git LFS spec, but more of a Giftless-specific extension / capability that is currently only supported by some backends / transfer modes.
Some thoughts on this (kind of a brain dump rather than something organized):
- Supporting this in
ExternalStoragedepends on the storage backend, because it is the backend that sends the HTTP reply. I know we implemented support for content-type preservation / pre-setting in Azure / GCP but not all backends may support this. - Supporting this in
StreamingStorageis a bit different, because there could be different strategies to pass the Content-type header from the storage backend (if preserved / saved) or pre-set it when crafting upload / download URLs for example, like we do with JWT tokens and handlingx-filename. - Saving the content-type when LocalStorage is used will be harder - we will need to save it on disk somewhere. Or we can just rely on the file name or mime-magic type DB to guess the content-type and fall back to some default (
application/octet-stream) - There was some discussion about this here: #68
- It may be easiest (and will solve the CKAN use case) if we just pass-through a value that is passed as a query param, then fall back to guess based on file extension (Python built-in capability ... I don't remember the standard library module that does this) then fall back to a default such as
application/octet-stream.