bech32

Using ByteString internally?

Open · k0001 opened this issue 5 years ago · 4 comments

Have you considered using ByteString internally, rather than Text? Considering bech32 values are likely to be stored/serialized as plain ByteStrings, seeing as they don't need any special encoding other than what plain old ASCII supports, having to convert back and forth between ByteString and Text for the purposes of parsing and rendering is wasteful.

k0001, Jun 06 '20 10:06

Considering bech32 values are likely to be stored/serialized as plain ByteStrings

I disagree here. Storing bech32-encoded data as a ByteString would be silly, in the same way that storing base16-encoded data as a ByteString is silly. The main purpose of encoding data in a human-readable format like these is to display it in user interfaces (be it a command line in a console, a web interface, or a desktop client...).

That is why Text was chosen as the type that data is decoded from and encoded to.

KtorZ, Jun 08 '20 12:06

Both arguments have merit. Haskell's Text is internally UTF-16, so ASCII data takes twice the space it would in a ByteString. I have found that I needed to sprinkle Data.Text.Encoding.encodeUtf8 around when using this library (but wait, it's ASCII, not UTF-8). Other bytes-to-text encoding functions use ByteString too. On the other hand, Text clearly signals that a bech32 value is readable text rather than opaque binary data, so on this basis I prefer Text.
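Every bech32 string is plain ASCII, and ASCII is a subset of UTF-8, so `encodeUtf8` on such a Text produces exactly the ASCII bytes: the conversion is correct, just not free (it allocates a new ByteString). A small check of that, using a BIP-173 test vector; the names `sample`, `asciiBytes` and `utf8Bytes` are only illustrative:

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as BC
import Data.Char (isAscii)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- A BIP-173 test vector; bech32 strings are ASCII-only.
sample :: String
sample = "a12uel5l"

-- Char8.pack keeps the low 8 bits of each Char, which is exact for ASCII.
asciiBytes :: BC.ByteString
asciiBytes = BC.pack sample

-- UTF-8 encodes every ASCII code point as the same single byte.
utf8Bytes :: BC.ByteString
utf8Bytes = TE.encodeUtf8 (T.pack sample)

main :: IO ()
main = do
  print (all isAscii sample)                          -- precondition
  print (utf8Bytes == asciiBytes)                     -- same bytes either way
  print (TE.decodeUtf8 asciiBytes == T.pack sample)   -- and back again
```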

rvl, Jun 09 '20 05:06

Perhaps I wasn't clear. I am not suggesting removing the Text support from the API; I am suggesting using ByteString internally and exposing an API for encoding and decoding ByteStrings directly, alongside the existing Text one. The Text-based API would encode/decode the Text as ASCII/UTF-8 and delegate the actual work to the ByteString-based implementation.
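A minimal sketch of the proposed layering, under stated assumptions: the core works on raw ASCII ByteStrings, and the Text entry point only calls `Data.Text.Encoding.encodeUtf8` and delegates. To stay short it implements only BIP-173 checksum verification, not full decoding, and `verifyChecksumBS` / `verifyChecksumText` are hypothetical names, not part of the bech32 package's API.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Bits (shiftL, shiftR, testBit, xor, (.&.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Char (isLower, isUpper, toLower)
import Data.List (foldl')
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word32, Word8)

-- BIP-173 checksum generator constants.
generator :: [Word32]
generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]

-- BCH checksum over 5-bit values (BIP-173 "bech32_polymod").
polymod :: [Word8] -> Word32
polymod = foldl' step 1
  where
    step chk v =
      let top  = chk `shiftR` 25
          chk' = ((chk .&. 0x1ffffff) `shiftL` 5) `xor` fromIntegral v
      in foldl' (\c (i, g) -> if testBit top i then c `xor` g else c)
                chk' (zip [0 ..] generator)

-- Expand the human-readable part for the checksum computation.
hrpExpand :: BS.ByteString -> [Word8]
hrpExpand h = map (`shiftR` 5) bytes ++ [0] ++ map (.&. 31) bytes
  where bytes = BS.unpack h

-- Data-part alphabet; a character's 5-bit value is its index here.
charset :: BS.ByteString
charset = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

charValue :: Word8 -> Maybe Word8
charValue w = fromIntegral <$> BS.elemIndex w charset

-- Core: operates on the stored ASCII bytes directly, no transcoding.
verifyChecksumBS :: BS.ByteString -> Bool
verifyChecksumBS raw
  | not (BS.all (\w -> w >= 33 && w <= 126) raw) = False  -- printable ASCII only
  | BC.any isLower raw && BC.any isUpper raw = False      -- mixed case is invalid
  | otherwise =
      case BS.elemIndexEnd 0x31 s of                      -- last '1' ends the hrp
        Nothing -> False
        Just i ->
          let hrp = BS.take i s
              dat = BS.drop (i + 1) s
          in not (BS.null hrp)
               && BS.length dat >= 6                      -- room for the checksum
               && maybe False
                        (\vs -> polymod (hrpExpand hrp ++ vs) == 1)
                        (traverse charValue (BS.unpack dat))
  where
    s = BC.map toLower raw

-- Thin Text wrapper: UTF-8 bytes equal ASCII bytes for valid input, and any
-- non-ASCII character becomes bytes >= 128, which the core rejects.
verifyChecksumText :: T.Text -> Bool
verifyChecksumText = verifyChecksumBS . TE.encodeUtf8

main :: IO ()
main = do
  print (verifyChecksumBS "A12UEL5L")
  print (verifyChecksumBS "a12uel5m")
  print (verifyChecksumText (T.pack "abcdef1qpzry9x8gf2tvdw0s3jn54khce6mua7lmqqqxw"))
```

The Text entry point stays a one-line wrapper, so existing Text callers would see no API change, while callers that already hold bytes skip the conversion entirely.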

In my case, I'm dealing with many bech32 values that are stored as ASCII/UTF-8 bytes as part of other data structures. Unfortunately, Text can't use these bytes as they are, so they must be converted to Text's internal UTF-16 representation (via Data.Text.Encoding.decodeUtf8). This adds significant, and unnecessary, processing time.
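To make the two paths concrete, here is a sketch assuming a hypothetical record that stores several bech32 strings (BIP-173 test vectors) in one comma-separated ASCII buffer. On the byte path each field can be inspected directly; on the Text path every field must first go through `decodeUtf8'`, which with the UTF-16 Text discussed here means transcoding each one and handling an extra failure case. `stored`, `fields`, `dataCharsOk` and `textFields` are illustrative names only, and the check shown is just a character-set test, not full bech32 validation.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (UnicodeException)

-- Hypothetical stored record holding several bech32 strings.
stored :: BS.ByteString
stored = BC.pack "a12uel5l,abcdef1qpzry9x8gf2tvdw0s3jn54khce6mua7lmqqqxw,?1ezyfcl"

-- Splitting yields ByteString fields with no transcoding step.
fields :: [BS.ByteString]
fields = BC.split ',' stored

-- bech32 data-part alphabet (BIP-173).
charset :: BS.ByteString
charset = BC.pack "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

-- Byte path: check the characters after the last '1' directly on the bytes.
dataCharsOk :: BS.ByteString -> Bool
dataCharsOk s = case BS.elemIndexEnd 0x31 s of
  Nothing -> False
  Just i  -> BS.all (`BS.elem` charset) (BS.drop (i + 1) s)

-- Text path: a Text-only API first needs every field decoded, which can fail.
textFields :: Either UnicodeException [T.Text]
textFields = traverse TE.decodeUtf8' fields

main :: IO ()
main = do
  print (map dataCharsOk fields)
  print (either (const 0) length textFields)
```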

k0001, Jun 09 '20 12:06

OK, I looked at how the aeson library does this, as an example.

We may wish to switch it around and have the "default" API be the Text-based wrapper.

rvl, Jun 10 '20 01:06