giftless icon indicating copy to clipboard operation
giftless copied to clipboard

Improve file name sanitization logic used by streaming transfer adapter

Open shevron opened this issue 4 years ago • 0 comments

Now that we have merged https://github.com/datopian/giftless/pull/68, the streaming transfer adapter allows passing the content-disposition filename value as a parameter. However, the filename sanitization logic in https://github.com/datopian/giftless/blob/master/giftless/util.py#L73..L84 is very strict, and only allows latin alphanumerics, dashes, underscores and dots.

While this is a good security measure (we don't want any special characters injected into the HTTP headers), this is restrictively strict. In addition, many users would want file names with international, non-latin (Hebrew, Arabic, Chinese, European umlauts and accents, Cyrillic etc. etc.) characters.

There is really no reason to avoid any special character other than characters that could affect HTTP headers, and even in this case we may be safe depending on Flask / Werkzeug's handling of headers.

Specifically, I think we should avoid / escape anything that is non-printing, semicolons, double quotes and new-lines. Other than that, we should be fine.

Perhaps it would be better to escape rather than strip these characters to ensure we never send an empty filename - for example URL-encoding only a handful of "unsafe" characters could be a good solution here.

shevron avatar Feb 24 '21 11:02 shevron