
Padded decimal and hexadecimal text builders

Open andrewthad opened this issue 7 years ago • 3 comments

In Data.Text.Lazy.Builder.Int, the functions decimal and hexadecimal are available. Sometimes I need to zero-pad my decimal and hexadecimal representations to a certain length. I propose adding the following two functions to this module:

paddedDecimal :: Integral a => Int -> a -> Builder
paddedHexadecimal :: Integral a => Int -> a -> Builder

The behaviors of these would be:

  • The padding occurs on the left.
  • The padding adds zeroes until the builder reaches the desired length.
  • The padding happens after the minus sign in the event that the number is negative.

Examples of expected behavior:

> paddedDecimal 3 5
005
> paddedDecimal 6 (-103)
-000103
> paddedDecimal 2 1039
1039
> paddedDecimal (-5) 1
1
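A minimal sketch of the proposed behavior, assuming the text package's decimal and hexadecimal from Data.Text.Lazy.Builder.Int. It counts digits arithmetically instead of rendering to an intermediate Text first. For simplicity it converts every input through Integer, so it sidesteps rather than answers the question of a fast Integer specialization. digitCount and zeroPad are hypothetical helpers, not part of the library.

```haskell
import Data.Text.Lazy.Builder (Builder, fromString, singleton)
import Data.Text.Lazy.Builder.Int (decimal, hexadecimal)

-- Hypothetical helper: number of digits needed to write a non-negative
-- Integer in the given base (at least 1, so 0 counts as one digit).
digitCount :: Integer -> Integer -> Int
digitCount base = go 1
  where
    go acc m
      | m < base = acc
      | otherwise = go (acc + 1) (m `quot` base)

-- Hypothetical helper: prefix already-rendered digits with enough '0's to
-- reach `width` characters; a width that is too small (or negative) adds nothing.
zeroPad :: Integer -> Int -> Integer -> Builder -> Builder
zeroPad base width m digits =
  fromString (replicate (width - digitCount base m) '0') <> digits

-- Zeroes go after the minus sign, so the sign does not count towards the width.
-- Going through Integer avoids overflow when negating minBound.
paddedDecimal :: Integral a => Int -> a -> Builder
paddedDecimal width n
  | i < 0 = singleton '-' <> zeroPad 10 width (negate i) (decimal (negate i))
  | otherwise = zeroPad 10 width i (decimal i)
  where
    i = toInteger n

-- Like hexadecimal itself, this fails on negative input (when forced).
paddedHexadecimal :: Integral a => Int -> a -> Builder
paddedHexadecimal width n = zeroPad 16 width i (hexadecimal i)
  where
    i = toInteger n
```

With this sketch, all four examples above come out as listed; for instance, paddedDecimal 6 (-103) renders as -000103, and paddedHexadecimal 4 255 renders as 00ff (hexadecimal emits lowercase digits).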

If this would be a welcome addition, I'm already familiar with the integral text builder code and would be happy to add these functions. The only part I'm unsure how to approach is writing a performant specialization for Integer.

andrewthad, Aug 31 '16 14:08

It makes some sense, but I have questions :-)

  • For naming, similar functions (such as justifyLeft and justifyRight in Data.Text) refer to "justification".
  • From your examples, it looks like the sign character does not count towards the width.
  • What about left-justification with spaces? That's a much more common need, in my experience (and would mess with your sign placement here).
  • Is this likely to end up faster or simpler than writing a version that renders to a Text and uses some of the existing justification functions?

bos, Sep 06 '16 17:09

Concerning naming, I'm fine with justify instead of pad, since justify is a more precise term.

The sign character is a bit of an annoying issue. I don't have much personal interest in the behavior for negative numbers. What I don't like about my original proposal is that inserting zeroes after the sign character means paddedDecimal 6 does not reliably produce Builders whose Text length is 6. The length could be 6 or 7 (or more, if you pass a number that needs more than six digits). For hexadecimal representations, there is a convenient precedent: hexadecimal simply fails on negative inputs. This is becoming a ramble, so I'll turn to the next question.

I agree that left justification with spaces is a common need as well. Being able to pad with something other than zero would be nice.

A dedicated implementation would be faster, but it would definitely be less simple. To use the existing justifyRight, you would first have to convert the decimal builder to strict Text to learn its length; justifyRight would then create an additional copy of that Text, which would in turn be converted back into a builder. I haven't profiled this, so I don't know how significant the overhead would be; it might turn out to be negligible. In the past I've seen poor performance when concatenating lots of small text builders, so when possible I try to work on MArrays directly.
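For comparison, a minimal sketch of the render-then-justify route discussed here, assuming the text package's Data.Text.justifyRight, Data.Text.Lazy.toStrict, and Data.Text.Lazy.Builder.fromText. justifyRightBuilder and paddedDecimalViaJustify are hypothetical names. The builder is materialized once, copied to strict Text, copied again by justifyRight, and copied into a new builder, which is where the extra allocation comes from.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import Data.Text.Lazy.Builder (Builder, fromText, toLazyText)
import Data.Text.Lazy.Builder.Int (decimal)

-- Hypothetical helper: materialize the builder as strict Text so its length
-- is known, pad it on the left with `fill` via Data.Text.justifyRight, and
-- wrap the result back up as a Builder.
justifyRightBuilder :: Int -> Char -> Builder -> Builder
justifyRightBuilder width fill b =
  fromText (T.justifyRight width fill (TL.toStrict (toLazyText b)))

-- Hypothetical zero-padding on top of it. Unlike the original proposal, the
-- fill goes in front of a minus sign and the sign counts towards the width.
paddedDecimalViaJustify :: Integral a => Int -> a -> Builder
paddedDecimalViaJustify width = justifyRightBuilder width '0' . decimal
```

With this approach, paddedDecimalViaJustify 6 (-103) yields 00-103, which illustrates the sign-placement concern, while the fill parameter also covers padding with spaces or another character.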

Also, I probably should have included this to begin with, but my main inspirations for this were:

  • Hexadecimal in a MAC address requires leading zeroes. People usually write memory addresses in hex with leading zeroes.
  • People almost always write the minutes and seconds in a timestamp with leading zeroes.
  • W3C datetimes have an offset that looks like this: +04:00, +13:00, -07:00 (although this doesn't really fit what I suggested, because of the leading plus sign).
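To make two of these use cases concrete, here is a hypothetical sketch that formats a MAC address and a W3C-style UTC offset with fixed two-character padding. pad2Decimal, pad2Hex, macAddress and utcOffset are illustrative names, not part of the text package. The sketch assumes the offset is given as a signed number of minutes whose absolute value is under 100 hours. Because the offset writes its sign explicitly, including '+', it handles the sign itself rather than relying on zero-padding after a minus sign.

```haskell
import Data.List (intersperse)
import Data.Text.Lazy.Builder (Builder, singleton)
import Data.Text.Lazy.Builder.Int (decimal, hexadecimal)
import Data.Word (Word8)

-- Hypothetical helper: render a value in 0..99 as exactly two decimal digits.
pad2Decimal :: Int -> Builder
pad2Decimal n
  | n < 10 = singleton '0' <> decimal n
  | otherwise = decimal n

-- Hypothetical helper: render a byte as exactly two lowercase hex digits.
pad2Hex :: Word8 -> Builder
pad2Hex w
  | w < 16 = singleton '0' <> hexadecimal w
  | otherwise = hexadecimal w

-- A MAC address: each byte as two hex digits, separated by colons.
macAddress :: [Word8] -> Builder
macAddress = mconcat . intersperse (singleton ':') . map pad2Hex

-- An offset such as +04:00 or -07:00, given in minutes east of UTC.
-- The sign is always written, then hours and minutes of the absolute value.
utcOffset :: Int -> Builder
utcOffset mins = sign <> pad2Decimal h <> singleton ':' <> pad2Decimal m
  where
    sign = singleton (if mins < 0 then '-' else '+')
    (h, m) = abs mins `quotRem` 60
```

For example, utcOffset (-420) renders as -07:00 and macAddress [0x00, 0x1a, 0x2b, 0x03, 0x4c, 0xff] renders as 00:1a:2b:03:4c:ff.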

andrewthad, Sep 06 '16 20:09

I'm going to put this on hold for the moment. I've been playing with some ideas for a strict text builder which may subsume what I suggested and simplify the implementation. I might bring it up in another issue once I'm confident in the implementation.

andrewthad, Sep 07 '16 14:09