
Detect and handle charsets

Open fxnn opened this issue 8 years ago • 0 comments

Currently, not all human-readable (i.e. non-binary) files can be edited. This is because

  • our heuristics for detecting editable MIME types in the http/editor package are unreliable, and
  • we have no mechanism for converting to and from UTF-8, which can cause encoding problems in browsers.

As proposed in #17, we should use github.com/saintfish/chardet to detect a file's charset and, based on the same result, to decide whether the file is editable at all.

Furthermore, we can take the detected charset's IANA identifier and use the golang.org/x/text/encoding packages (together with the ianaindex subpackage) to decode the file to UTF-8 before displaying it. When saving it, we could

  • just save it as UTF-8, or
  • detect the file's charset again and encode back to that charset.

The second method has the drawback that a misdetected charset leads to encoding errors when saving. Here, we should make use of the fact that chardet reports a confidence value for each detection: if the confidence is not high enough, we should store the file as UTF-8. Also, it should be possible to disable all of this in the configuration.
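
The save-time decision could be sketched roughly like this. The Detection and Config types, their field names and the threshold value are assumptions made for illustration; Detection only mirrors the charset name and the confidence (assumed here to be on a 0 to 100 scale) that a detector such as chardet would report.

```go
package main

import "fmt"

// Detection holds the parts of a charset detection result that matter
// for saving. (Hypothetical type, not chardet's own.)
type Detection struct {
	Charset    string // detected charset name, e.g. "ISO-8859-1"
	Confidence int    // assumed scale: 0 (no idea) to 100 (certain)
}

// Config holds the hypothetical settings for charset handling.
type Config struct {
	Enabled       bool // false switches off all detection and conversion
	MinConfidence int  // below this, the detection is not trusted
}

// targetCharset decides how to store an edited file. It returns the charset
// to encode to and whether any conversion should take place at all.
func targetCharset(cfg Config, d Detection) (charset string, convert bool) {
	if !cfg.Enabled {
		// Feature disabled: store the bytes exactly as received.
		return "", false
	}
	if d.Charset == "" || d.Confidence < cfg.MinConfidence {
		// Detection missing or too uncertain: fall back to UTF-8.
		return "UTF-8", true
	}
	return d.Charset, true
}

func main() {
	cfg := Config{Enabled: true, MinConfidence: 80}

	for _, d := range []Detection{
		{Charset: "ISO-8859-1", Confidence: 95},
		{Charset: "windows-1252", Confidence: 40},
	} {
		charset, convert := targetCharset(cfg, d)
		fmt.Println(d.Charset, d.Confidence, "->", charset, convert)
	}
}
```

The right threshold would have to be found by testing against real files; 80 is only a placeholder.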

fxnn avatar Jan 02 '16 18:01 fxnn