
Can’t safely put NUL or CR bytes inside a double-quoted string

Open andersk opened this issue 8 years ago • 4 comments

Inside a double-quoted string, Pyth translates CR (\r) to LF (\n). NUL bytes (\000) work unless they are followed by a digit 0–7: Pyth emits them as \0 instead of \000, so the following digit is read as part of the octal escape.

$ printf '"\r"' | xxd
00000000: 220d 22                                  "."
$ printf '"\r"' | pyth -d /dev/stdin
==================== 3 chars =====================
"
"
==================================================
imp_print("\n")
==================================================

$ printf '"\00012"' | xxd
00000000: 2200 3132 22                             ".12"
$ printf '"\00012"' | pyth -d /dev/stdin 
==================== 5 chars =====================
"12"
==================================================
imp_print("\012")
==================================================
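The second transcript can be reproduced in plain Python. This is only a sketch of the escaping problem, not Pyth's actual code: `escape_nul_naive` and `escape_nul_safe` are hypothetical helpers, and the only assumption is that the generated program is a Python string literal.

```python
import ast

def escape_nul_naive(s: str) -> str:
    # Hypothetical helper: shortest octal escape. Ambiguous when the next
    # character is an octal digit, because Python reads up to three digits.
    return s.replace("\0", "\\0")

def escape_nul_safe(s: str) -> str:
    # Hypothetical helper: fixed-width three-digit escape, so the digits
    # that follow can never be absorbed into the escape.
    return s.replace("\0", "\\000")

source = "\x0012"  # NUL followed by the characters "1" and "2"

# Wrap the escaped text in quotes and parse it the way Python would.
naive = ast.literal_eval('"' + escape_nul_naive(source) + '"')
safe = ast.literal_eval('"' + escape_nul_safe(source) + '"')

print(repr(naive))  # the literal "\012" is a single LF
print(repr(safe))   # the literal "\00012" is NUL, "1", "2"
```

With the naive escape, the three input characters collapse into a single LF, which matches the imp_print("\012") output above.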

andersk, Apr 05 '16 06:04

I've fixed the null byte issue, but the CR issue seems to be introduced by Python. I'll need to investigate more for that one.

isaacg1, Apr 05 '16 07:04

If you replace open(file_or_string, encoding='iso-8859-1') with open(file_or_string, encoding='iso-8859-1', newline=''), then Python will stop translating \r and \r\n to \n. Of course, you may then need to teach Pyth to keep accepting \r and \r\n in various other places where newlines are significant, to keep Mac and Windows users happy.

(It may be cleaner, but more work, to open in binary mode and use bytes everywhere?)
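For reference, a small sketch of the difference `newline=''` makes, using a temporary file in place of the real program file. `read_both_ways` and `split_lines` are hypothetical names for illustration, and the regex split is just one possible way to keep accepting \r and \r\n as line breaks once Python no longer translates them.

```python
import os
import re
import tempfile

def read_both_ways(data: bytes):
    # Write the raw bytes to a temporary file, then read it back twice.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        path = f.name
    try:
        # Default newline=None: universal newlines, \r and \r\n become \n.
        with open(path, encoding="iso-8859-1") as f:
            translated = f.read()
        # newline='': line endings are returned untranslated.
        with open(path, encoding="iso-8859-1", newline="") as f:
            untranslated = f.read()
    finally:
        os.remove(path)
    return translated, untranslated

def split_lines(text: str):
    # Hypothetical helper: treat \r\n, \r and \n all as line breaks.
    return re.split(r"\r\n|\r|\n", text)

translated, untranslated = read_both_ways(b'"\r"')
print(repr(translated))    # CR has been replaced by LF
print(repr(untranslated))  # CR is preserved
print(split_lines("a\r\nb\rc\nd"))
```

The catch is that a splitter like this must only be applied where newlines are syntactically significant, not inside string literals, or the CR would be lost again.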

andersk, Apr 05 '16 17:04

\r hasn't been used on Mac for a while now

vendethiel, Apr 05 '16 18:04

There are similar issues with \ followed by NUL or LF or CR.

\␀ → imp_print("␀") → ValueError: source code string cannot contain null bytes

\␊ or \␍ → IndexError: string index out of range

andersk, Aug 02 '16 02:08