Feature request: Support converting zero bytes and high-bit-set bytes to octal sequences for VARBINARY
PostgreSQL has a function encode(data, 'escape') where data is a bytea. https://www.postgresql.org/docs/9.4/functions-binarystring.html
Currently when using athencli w/ the Athena JDBC the default output for VARBINARY is to convert to UTF8 and if unable to convert it just outputs the hex representation of the VARBINARY. It would be extremely useful to have the ability to support this output mode directly in athenacli for any VARBINARY column when set.
Here is an example select on a VARBINARY column from Athena when using the Python pyathenajdbc library which is a wrapper for the Athena JAR. Notice how the ASCII converts but other other bytes are left as-is:
b'\x03\x00\x00+&\xe0\x00\x00\x00\x00\x00Cookie: mstshash=hello\r\n\x01\x00\x08\x00\x03\x00\x00\x00'
@Obsecurus Thanks for your feedback.
From what I understood, the database engine behind Athena is Presto, could you use Presto's binary function to encode the binary data?
https://prestodb.io/docs/0.217/functions/binary.html
you can encode to utf-8 but then this loses all non-ASCII characters. i think anyone using athenacli and working with VARBINARY would want this sort of functionality. So essentially athenacli is receiving the correct bytes it's all just in how they are finally displayed. Ideally as utf-8 and hex escaped for things that won't be encode.
So the same query in athenacli using the from_utf8() PrestoDB function for example yields:
^C^@^@/*�^@^@^@^@^@Cookie: mstshash=Administr\r\n^A^@^H^@^C^@^@^@
@zzl0 did you have any ideas how we might approach this? We could contribute I just haven't done any dev against athenacli yet.