Detect and handle charsets
Currently, not all human-readable (i.e. non-binary) files can be edited. This is because
- our heuristics for detecting editable MIME types in the http/editor package are quite bad, and
- we have no mechanism for converting from/to UTF-8, which could introduce encoding problems in browsers.
As proposed in #17, we should use github.com/saintfish/chardet to detect the charset of a file and, based on that, to decide whether the file is editable at all.
Furthermore, we can take the detected charset's IANA identifier and use the golang.org/x/text/encoding packages (together with the golang.org/x/text/encoding/ianaindex subpackage) to decode the file to UTF-8 before displaying it. When saving the file, we could
- just save it as UTF-8, or
- detect the file's charset again and encode back to that charset.
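The two save options could be sketched roughly as follows. This is a minimal, dependency-free illustration, not the proposed implementation: `Codec`, `latin1` and `encodeForSave` are hypothetical names, and the hand-written ISO-8859-1 codec only stands in for an encoding that would really be looked up via `ianaindex.IANA.Encoding(name)` and applied with its `NewDecoder()` / `NewEncoder()`.

```go
package main

import "fmt"

// Codec is a hypothetical stand-in for an encoding obtained from
// golang.org/x/text/encoding; it converts between a charset and UTF-8.
type Codec interface {
	Decode(raw []byte) (string, error)  // charset bytes -> UTF-8 text
	Encode(text string) ([]byte, error) // UTF-8 text -> charset bytes
}

// latin1 implements ISO-8859-1, where every byte value equals the
// Unicode code point of the character it represents.
type latin1 struct{}

func (latin1) Decode(raw []byte) (string, error) {
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes), nil
}

func (latin1) Encode(text string) ([]byte, error) {
	out := make([]byte, 0, len(text))
	for _, r := range text {
		if r > 0xFF {
			return nil, fmt.Errorf("character %q cannot be encoded as ISO-8859-1", r)
		}
		out = append(out, byte(r))
	}
	return out, nil
}

// encodeForSave covers both options from the list above: with
// keepCharset == false the edited text is stored as UTF-8, otherwise it
// is encoded back into the original charset.
func encodeForSave(text string, original Codec, keepCharset bool) ([]byte, error) {
	if !keepCharset || original == nil {
		return []byte(text), nil
	}
	return original.Encode(text)
}

func main() {
	var c Codec = latin1{}
	raw := []byte{'c', 'a', 'f', 0xE9} // "café" stored as ISO-8859-1

	text, err := c.Decode(raw) // what the editor would display
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	asUTF8, _ := encodeForSave(text, c, false)
	asLatin1, _ := encodeForSave(text, c, true)
	fmt.Println(text, len(asUTF8), len(asLatin1))
}
```

Note that encoding back can fail when the user typed a character the original charset cannot represent (for example `€` in ISO-8859-1), so a real implementation would need to decide what to do in that case.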
The second method might have the drawback of errors caused by misdetecting the charset. Here, we should make use of the fact that chardet provides a confidence value for each detection: if the confidence is not high enough, we should store the file as UTF-8. Also, it should be possible to disable all of this in the configuration.
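The confidence-based fallback could look something like the sketch below. It assumes that chardet reports the confidence as an integer percentage on its result; `Detection` merely mirrors the relevant fields so that the example has no dependencies, and `Config`, `MinConfidence` and `saveCharset` are made-up names, not existing settings.

```go
package main

import "fmt"

// Detection mirrors the fields of a chardet result that matter here
// (redeclared so this sketch compiles without the library).
type Detection struct {
	Charset    string
	Confidence int // assumed scale: 0-100
}

func (d Detection) String() string {
	return fmt.Sprintf("%s (%d%%)", d.Charset, d.Confidence)
}

// Config is a hypothetical configuration section for charset handling.
type Config struct {
	Enabled       bool // false switches charset handling off entirely
	MinConfidence int  // below this, the detection is not trusted
}

// saveCharset returns the charset a file should be written in.
// It falls back to UTF-8 when handling is disabled, when nothing was
// detected, or when the detection is not confident enough.
func saveCharset(cfg Config, d *Detection) string {
	if !cfg.Enabled || d == nil || d.Confidence < cfg.MinConfidence {
		return "UTF-8"
	}
	return d.Charset
}

func main() {
	cfg := Config{Enabled: true, MinConfidence: 80}
	for _, d := range []Detection{
		{Charset: "ISO-8859-1", Confidence: 95},
		{Charset: "windows-1252", Confidence: 40},
	} {
		fmt.Printf("%v -> save as %s\n", d, saveCharset(cfg, &d))
	}
}
```

What a sensible default threshold is would have to be determined by trying the detector on real files.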