Display and accept binary HBase data in escaped form

Open · stoty opened this issue 1 year ago · 5 comments

Description

Much of the time, data in HBase (not just values, but often row keys and even column qualifiers) is binary, with the encoding determined by the application.

Currently Hue cannot display this data in a usable way. It tries to interpret the data as a string (UTF-8?), and the output is full of placeholders for unprintable characters. As far as I can tell, this behavior cannot be changed from the UI.

Because there is no standard encoding, and no metadata describing which encoding was used, it is not possible to decode and display the contents reliably.

The HBase shell solves this problem by escaping binary data. Bytes that are printable ASCII characters are displayed as those characters, while bytes outside this range are displayed as escaped hex codes (for example `\xFF`).
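A minimal Python sketch of this display rule. It assumes the behavior of HBase's `Bytes.toStringBinary` as we understand it: printable ASCII (0x20 to 0x7E) passes through, except the backslash, which is escaped so the output can be decoded unambiguously; every other byte becomes `\xHH` with uppercase hex digits. The function name `to_string_binary` is hypothetical and is not an existing Hue API.

```python
def to_string_binary(data: bytes) -> str:
    """Escape raw bytes in the HBase-shell style (hypothetical helper).

    Printable ASCII (0x20-0x7E) is kept as-is, except the backslash (0x5C),
    which is escaped so that a later unescape step is unambiguous.
    Every other byte is rendered as \\xHH with uppercase hex digits.
    """
    out = []
    for b in data:  # iterating over a bytes object yields ints 0-255
        if 0x20 <= b <= 0x7E and b != 0x5C:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02X}")
    return "".join(out)


if __name__ == "__main__":
    print(to_string_binary(b"binary_key\x00\x01\xff"))
```

Run as a script, this prints `binary_key\x00\x01\xFF`, the same form the HBase shell uses for the row key in the scan output later in this thread.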

While this is still not a particularly user-friendly format, most HBase users are familiar with it and have workflows to handle it.

I propose doing the same in Hue: use this encoding to display all data that is not otherwise identified and handled.

Additionally, the editor could support the same encoding, accepting an escaped string and converting it to its binary representation.
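The reverse direction, for the editor, could look like the following minimal Python sketch. `to_bytes_binary` is a hypothetical name, and two behaviors are our own assumptions rather than something taken from HBase: hex digits are accepted in either case (the shell example in this thread types `\xff` and prints `\xFF`), and any literal non-escaped characters are encoded as UTF-8. A malformed escape such as `\xZZ` is kept literally; HBase's own `Bytes.toBytesBinary` may treat that case differently.

```python
import re

# One \xHH escape; accepting both upper- and lowercase hex digits is an assumption.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def to_bytes_binary(text: str) -> bytes:
    """Convert an HBase-shell-style escaped string back to raw bytes (hypothetical helper)."""
    out = bytearray()
    pos = 0
    for m in _ESCAPE.finditer(text):
        # Literal text before this escape; encoding it as UTF-8 is an assumption.
        out += text[pos:m.start()].encode("utf-8")
        out.append(int(m.group(1), 16))  # the escaped byte itself
        pos = m.end()
    out += text[pos:].encode("utf-8")  # trailing literal text
    return bytes(out)


if __name__ == "__main__":
    print(to_bytes_binary("binary_key\\x00\\x01\\xff"))
```

Since an escaper that escapes the backslash only ever emits printable ASCII, unescaping its output recovers the original bytes exactly, so values copied from the display can be pasted back into the editor unchanged.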

The escaping code is very simple. These are the Java methods for escaping and unescaping:

https://github.com/apache/hbase/blob/156e430dc56211c0aea15d792e8733b1b0e3de5c/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java#L574
https://github.com/apache/hbase/blob/156e430dc56211c0aea15d792e8733b1b0e3de5c/hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java#L607

Further enhancements are possible, such as interpreting the data as hex strings, or dynamically switching the encoding used for cells, rows, etc. to one of the standard encodings in org.apache.hadoop.hbase.util.Bytes. Those are less critical and can be handled separately.
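As a rough illustration of the per-cell encoding switch, here is a minimal Python sketch covering a small subset of interpretations. `decode_as` and the encoding names `"hex"`, `"long"`, `"int"` and `"utf-8"` are hypothetical. The one fact it relies on is that HBase's `Bytes.toBytes(long)` and `Bytes.toBytes(int)` write big-endian values, so `long` and `int` are decoded as big-endian signed 8-byte and 4-byte integers. A wrong-length value raises an error instead of being silently misread, which would let a UI fall back to the escaped view.

```python
import struct


def decode_as(data: bytes, encoding: str) -> str:
    """Render a cell value under a user-selected interpretation (hypothetical helper)."""
    if encoding == "hex":
        return data.hex().upper()  # plain hex dump, e.g. b"\x00\x01\xff" -> "0001FF"
    if encoding == "long":
        # Big-endian signed 64-bit; struct.error unless len(data) == 8.
        (value,) = struct.unpack(">q", data)
        return str(value)
    if encoding == "int":
        # Big-endian signed 32-bit; struct.error unless len(data) == 4.
        (value,) = struct.unpack(">i", data)
        return str(value)
    if encoding == "utf-8":
        return data.decode("utf-8")  # UnicodeDecodeError on invalid input
    raise ValueError(f"unknown encoding: {encoding}")


if __name__ == "__main__":
    print(decode_as(b"\x00\x00\x00\x00\x00\x00\x00\x2a", "long"))
```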

stoty, May 07 '24

Hi @stoty, and thanks for reaching out. Can you provide a screenshot or example of how this looks in HBase, for a clearer picture?

bjornalm, May 08 '24

Creating and displaying binary data in the HBase shell:

```
hbase:006:0> create 'demo', 'cf1';
Created table demo
Took 0.8557 seconds
=> Hbase::Table - demo
hbase:007:0> put 'demo', 'ascii_key', 'cf1:ascii_qialifier', 'ascii_value';
Took 1.2427 seconds
hbase:010:0> put 'demo', "binary_key\x00\x01\xff", "cf1:binary_qualifier\x00\x01\xff", "binary_value\x00\x01\xff";
Took 0.0089 seconds
hbase:019:0> scan 'demo'
ROW                       COLUMN+CELL
 ascii_key                column=cf1:ascii_qialifier, timestamp=2024-05-08T11:40:29.760, value=ascii_value
 binary_key\x00\x01\xFF   column=cf1:binary_qualifier\x00\x01\xFF, timestamp=2024-05-08T11:42:55.064, value=binary_value\x00\x01\xFF
2 row(s)
Took 0.0112 seconds
```

As you can see, the binary values are entered as escaped hex characters, and the results are displayed the same way.

The same data in Hue looks like this:

[Screenshot from 2024-05-08 13-52-20]

The easiest and most HBase-like solution would be to use the same hex-escaped format in Hue.

This could be a toggle in the toolbar for backward compatibility.

stoty, May 08 '24

For ease of identification, I have used an ASCII prefix, but the data is often pure binary, such as a long or an integer.

stoty, May 08 '24

Thanks, let's leave this issue open to see if anyone in the community can create a PR for it.

bjornalm, May 08 '24

#77 is a similar issue.

stoty, May 09 '24

This issue is stale because it has been open 30 days with no activity and is not labeled "Prevent stale". Remove "stale" label or comment or this will be closed in 10 days.

github-actions[bot], Jun 09 '24

For the record, this is NOT completed.

stoty, Jun 20 '24