Does not seem to support gists with UTF8 content

Open • sneak opened this issue 9 years ago • 15 comments

Trying to gist a file with Unicode in it throws an error:

Error: "\xE2" on US-ASCII

There doesn't seem to be a command-line option for Unicode support. :(

sneak • May 20 '15

utf-8 works fine for me: https://gist.github.com/ConradIrwin/d95a74f8a0606b190e3e

Are you trying to use a different encoding? If so, it will almost certainly not work, as the gist API requires valid JSON, which requires strings to be in UTF-8, and we don't do any transcoding.

ConradIrwin • May 26 '15
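A minimal Ruby sketch of the point above (gist is a Ruby gem, but this is not its code). It assumes a simplified Gists API request body of the form {"files" => {name => {"content" => text}}}. The same bytes serialize to JSON when Ruby has tagged them as UTF-8, while converting them to UTF-8 fails when they are tagged US-ASCII, with the same message gist prints. That gist's JSON encoding step performs an equivalent conversion is an assumption.

```ruby
require "json"

# "Héllö\n" as raw bytes, as a UTF-8 terminal writes it (é = C3 A9, ö = C3 B6).
BYTES = "H\xC3\xA9ll\xC3\xB6\n".b

# Simplified Gists API request body (assumption: only the "files" key is shown).
def payload(name, text)
  JSON.generate({ "files" => { name => { "content" => text } } })
end

# Tagged as UTF-8, the bytes are valid and serialize to JSON without error.
utf8 = BYTES.dup.force_encoding(Encoding::UTF_8)
puts payload("test", utf8)

# Tagged as US-ASCII, every byte >= 0x80 is invalid, so transcoding to UTF-8
# (which JSON text requires) raises before anything could be sent.
ascii = BYTES.dup.force_encoding(Encoding::US_ASCII)
begin
  ascii.encode(Encoding::UTF_8)
rescue Encoding::InvalidByteSequenceError => e
  puts "Error: #{e.message}" # prints: Error: "\xC3" on US-ASCII
end
```

So the content itself is fine; what matters is which encoding Ruby believes the bytes are in.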

I have the same issue. This file gives the error, but it's UTF-8 AFAIK: http://www.speedyshare.com/Rku3H/38e9a506/davdroid-2.log

Emil-V • Jun 27 '15

This appears to be able to reproduce the issue dependably:

$ echo "Héllö" > test
$ gist test
Error: "\xC3" on US-ASCII

vermiculus • Jul 04 '15

I think the problem is probably the computer's encoding being set to ASCII.

ids1024 • Jul 22 '15

@ids1024 Can you be more specific? Do you mean the shell's encoding? If so, I've also reproduced the issue with a file that I wrote in Emacs and confirmed was UTF-8, then uploaded with gist itself.

vermiculus • Jul 22 '15

I mean locale: https://wiki.archlinux.org/index.php/Locale

If I am right, running the locale command should output values such as C or POSIX (i.e. ASCII), or nothing at all. If the values include UTF-8, I am probably wrong.

ids1024 • Jul 22 '15
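A sketch of the locale theory, under stated assumptions: Ruby normally sets Encoding.default_external from the locale when the interpreter starts, and File.read tags file contents with that encoding without converting them. Below, an ASCII locale is simulated by assigning default_external directly, and the file is the Héllö reproduction from earlier in the thread. Passing an explicit encoding: to File.read makes the result independent of the locale; this is a hypothetical change, not taken from gist's source.

```ruby
require "tempfile"

# Recreate the thread's reproduction file: "Héllö\n" encoded as UTF-8.
file = Tempfile.new("test")
file.binmode
file.write("H\xC3\xA9ll\xC3\xB6\n".b)
file.close

# Simulate an ASCII locale (LANG/LC_ALL unset or "C"). Normally Ruby derives
# this value from the locale at startup instead of having it assigned.
Encoding.default_external = Encoding::US_ASCII

# Without an explicit encoding, the bytes are tagged with default_external.
implicit = File.read(file.path)
# With an explicit encoding, the locale no longer matters.
explicit = File.read(file.path, encoding: "UTF-8")

puts "#{implicit.encoding} valid=#{implicit.valid_encoding?}" # prints: US-ASCII valid=false
puts "#{explicit.encoding} valid=#{explicit.valid_encoding?}" # prints: UTF-8 valid=true
```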

@ids1024

sh-3.2$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

Good guess, though :/

vermiculus • Jul 22 '15

Possibly related issue: I was unable to post this gist using gist: https://gist.github.com/tschottdorf/3913a87b72061b2f6f65

10:28 $ gist crashers/4ad21e7dcd127e00f23a5f46ab9736ee8af53f71.output
Error: Got Net::HTTPBadRequest from gist: {"message":"Problems parsing JSON","documentation_url":"https://developer.github.com/v3"}

tbg • Aug 04 '15

@tschottdorf Any conjectures as to why the two issues are related? (I don't doubt you; I'm just not seeing it.)

vermiculus • Aug 04 '15

Sorry, I just checked the file. Posting the file from the online gist actually works; I suppose submitting it manually through the web interface cleaned up its encoding. There were some odd characters in the original (still visible, but probably changed in the URL); unfortunately, I've since deleted that file.

tbg • Aug 04 '15

$ gist --version
gist v5.0.0
$ ls
Report.gif  Report.htm  Report.txt
$ gist Report.gif
Error: "\xC8" on US-ASCII
$ locale
LANG=
LANGUAGE=
LC_ALL=
$ LC_ALL=en_GB.UTF-8 gist Report.gif
Error: source sequence is illegal/malformed utf-8
$ LC_ALL=C gist Report.gif
Error: "\xC8" on US-ASCII

Trace:

$ strace -f gist Report.gif
...
[pid   638] getcwd("/opt/results", 4096) = 13
[pid   638] open("/opt/results/Report.gif", O_RDONLY|O_CLOEXEC) = 7
[pid   638] ioctl(7, TCGETS, 0x7fffa95d84b0) = -1 ENOTTY (Inappropriate ioctl for device)
[pid   638] fstat(7, {st_mode=S_IFREG|0644, st_size=4254, ...}) = 0
[pid   638] lseek(7, 0, SEEK_CUR)       = 0
[pid   638] read(7, "GIF89a4\3\310\0\304\0\0\377\377\377\0\0\0\370\370\370\310\310\310\377\377\377\377\377\377\377"..., 4254) = 4254
[pid   638] read(7, "", 8192)           = 0
[pid   638] close(7)                    = 0
...
[pid   638] open("/usr/lib/x86_64-linux-gnu/ruby/2.3.0/enc/trans/single_byte.so", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7
[pid   638] fstat(7, {st_mode=S_IFREG|0644, st_size=112520, ...}) = 0
[pid   638] close(7)                    = 0
[pid   638] open("/usr/lib/x86_64-linux-gnu/ruby/2.3.0/enc/trans/single_byte.so", O_RDONLY|O_CLOEXEC) = 7
[pid   638] read(7, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220-\0\0\0\0\0\0"..., 832) = 832
[pid   638] fstat(7, {st_mode=S_IFREG|0644, st_size=112520, ...}) = 0
[pid   638] mmap(NULL, 2207760, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x7fb66bd64000
[pid   638] mprotect(0x7fb66bd7b000, 2093056, PROT_NONE) = 0
[pid   638] mmap(0x7fb66bf7a000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x16000) = 0x7fb66bf7a000
[pid   638] close(7)                    = 0
[pid   638] mprotect(0x7fb66bf7a000, 20480, PROT_READ) = 0
[pid   638] write(1, "Error: \"\\xC8\" on US-ASCII", 25Error: "\xC8" on US-ASCII) = 25
[pid   638] write(1, "\n", 1
)           = 1

It works for the text and HTML files, but not for the GIF file.

I'm attaching the file which fails: Report.gif

kenorb • Jul 12 '18
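The strace output above helps explain the GIF case. Its read() call shows the header bytes GIF89a4\3\310..., and octal \310 is 0xC8, the byte named in the US-ASCII error. A GIF is binary, so it is not valid UTF-8 either, which is consistent with the second error under LC_ALL=en_GB.UTF-8 (source sequence is illegal/malformed utf-8). A minimal Ruby sketch of that reasoning, using the first bytes from the trace; both helper names are hypothetical and not part of gist:

```ruby
# First bytes of Report.gif as shown by read() in the strace, with the octal
# escapes \3, \310, \0, \304 and \377 rewritten as hex.
GIF_HEAD = "GIF89a4\x03\xC8\x00\xC4\x00\x00\xFF\xFF\xFF".b

# Hypothetical helper: the first byte outside 7-bit ASCII, formatted the way
# it appears in "... on US-ASCII" errors (nil if the bytes are pure ASCII).
def first_non_ascii(bytes)
  byte = bytes.each_byte.find { |b| b > 0x7F }
  byte && format('"\\x%02X"', byte)
end

# Hypothetical helper: are these bytes valid UTF-8, i.e. usable as JSON text?
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

puts first_non_ascii(GIF_HEAD)            # prints: "\xC8"
puts valid_utf8?(GIF_HEAD)                # prints: false (0xC8 is not followed by a continuation byte)
puts valid_utf8?("H\xC3\xA9ll\xC3\xB6".b) # prints: true
```

In other words, the locale only changes which error a binary file produces; no locale makes these bytes valid UTF-8.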

The issue still exists in the latest version.

FliegendeWurst • Jan 29 '19

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
$ gist -c -p README.md
https://gist.github.com/XXX
$ export LC_CTYPE=C
$ gist -c -p README.md
Error: "\xE6" on US-ASCII
$ export LC_ALL=en_US.UTF-8
$ gist -c -p README.md
https://gist.github.com/XXX

Running export LC_ALL=en_US.UTF-8 can solve this issue on macOS.

nella17 • Oct 27 '20
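Why LC_ALL fixes it: POSIX resolves the character-type category (LC_CTYPE, which decides the charset Ruby picks up) by taking a non-empty LC_ALL first, then LC_CTYPE, then LANG, and falls back to the C (ASCII) locale when none is set. A small Ruby sketch of that rule, applied to the environments from the transcript above; the function name is made up for illustration:

```ruby
# Hypothetical helper mirroring POSIX precedence for the LC_CTYPE category:
# a non-empty LC_ALL wins, then LC_CTYPE, then LANG, else the "C" locale.
def effective_ctype(env)
  %w[LC_ALL LC_CTYPE LANG].each do |var|
    value = env[var]
    return value unless value.nil? || value.empty?
  end
  "C"
end

# After `export LC_CTYPE=C` in the macOS transcript: ASCII, and gist fails.
broken = { "LANG" => "en_US.UTF-8", "LC_CTYPE" => "C", "LC_ALL" => "" }
# After `export LC_ALL=en_US.UTF-8`: LC_ALL overrides LC_CTYPE=C.
fixed = broken.merge("LC_ALL" => "en_US.UTF-8")
# kenorb's environment above, where LANG and LC_ALL are both empty.
unset = { "LANG" => "", "LC_ALL" => "" }

puts effective_ctype(broken) # prints: C
puts effective_ctype(fixed)  # prints: en_US.UTF-8
puts effective_ctype(unset)  # prints: C
```

It can be called as effective_ctype(ENV) to check the current shell. Note that it only resolves which locale name applies; if that locale is not installed, setlocale() fails and the process stays in the C locale.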

The export LC_ALL=en_US.UTF-8 workaround from @nella17 works well on Ubuntu 18.04 too.

668168 • Jan 26 '21

I'm hitting the same issue on Ubuntu 20.04:

> gist --version
gist v5.1.0
> locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
> gist jupyter-notebook.ipynb
Error: "\xCE" on US-ASCII

tbenst • Feb 05 '21