table-layout
table-layout copied to clipboard
Tighten trimming for ExpandUntil and ExpandBetween
Fixes #35
Before I dive into this, would you mind giving a simple example (with code) that shows the behavior before and after?
Behaviour before
Github messes up the spacing a little bit, but note that there are three spaces to the right of Bob and Sue, and two spaces to the right of the Sun Yat-Sen's names.
ghci> import Text.Layout.Table.Cell.WideString
ghci> putStrLn . tableString .
columnHeaderTableS [ column (expandUntil 5) left noAlign noCutMark ] unicodeRoundS (titlesH ["Text"]) $
[colsAllG top $ [map WideString ["Bob", "Sue", "孫德明", "孫帝象", "孫載之" ]]]
╭───────╮
│ Text │
╞═══════╡
│ Bob │
│ Sue │
│ 孫德 │
│ 孫帝 │
│ 孫載 │
╰───────╯
Behaviour after
Note that there are now only two spaces to the right of Bob and Sue, and one space to the right of Sun Yat-Sen.
ghci> import Text.Layout.Table.Cell.WideString
ghci> putStrLn . tableString .
columnHeaderTableS [ column (expandUntil 5) left noAlign noCutMark ] unicodeRoundS (titlesH ["Text"]) $
[colsAllG top $ [map WideString ["Bob", "Sue", "孫德明", "孫帝象", "孫載之" ]]]
╭──────╮
│ Text │
╞══════╡
│ Bob │
│ Sue │
│ 孫德 │
│ 孫帝 │
│ 孫載 │
╰──────╯
Explanation
Since Sun Yat-Sen's names each consist of three wide characters they are of width 6, which is larger than our requested maximum of 5. The therefore asks code would request that each string be trimmed down to width 5. However, none of Sun Yat-Sen's names can be trimmed to 5 characters: the closest you can get is 4.
Previously, it would trim them to 4 and then add 1 space of padding to make it up to 5 characters. But since none of the trimmed items are really of length 5, we can get better spacing by not adding the extra padding. This is what the code does now: it recognises that we really are only a column of width 4, and renders it that way.
Here's another example with cut marks
Before
ghci> import Text.Layout.Table.Cell.WideString
ghci> putStrLn . tableString .
columnHeaderTableS [ column (expandUntil 4) left noAlign ellipsisCutMark ] unicodeRoundS (titlesH ["Text"]) $
[colsAllG top $ [map WideString ["Bob", "Sue", "孫德明", "孫帝象", "孫載之" ]]]
╭──────╮
│ Text │
╞══════╡
│ Bob │
│ Sue │
│ 孫 … │
│ 孫 … │
│ 孫 … │
╰──────╯
After
ghci> import Text.Layout.Table.Cell.WideString
ghci> putStrLn . tableString .
columnHeaderTableS [ column (expandUntil 4) left noAlign ellipsisCutMark ] unicodeRoundS (titlesH ["Text"]) $
[colsAllG top $ [map WideString ["Bob", "Sue", "孫德明", "孫帝象", "孫載之" ]]]
╭─────╮
│ …xt │
╞═════╡
│ Bob │
│ Sue │
│ 孫… │
│ 孫… │
│ 孫… │
╰─────╯
In general, I am not opposed to these changes. The problem I am having is with the logic for measuring that is intertwined with the logic for building. And to me that indicates that the problem is not fully understood or not properly represented. If it cannot be represented that way, I would like to know the reasoning.
That's completely reasonable, and I'm not sure I've found the optimal approach. I'm interested in your feedback and thoughts.
The reason I've gone with this approach is that calculating the padding needed is almost all of the work of actually building. To figure out whether any padding is needed, you need to try attempt to trim the requested amount, and then check to see whether you had to trim more than requested. At this point, you build by adding the extra padding. So the keeping track of the built cell without padding, as well as how much padding is needed, makes either simple:
- To build with padding, just add the padding with
spacesB
(as in thecellViewToPadding
function) - To build without padding, discard the padding, or use
fmap
to operate on the underlying builder.
For all the examples we have so far, checking whether padding is needed involves dropping one character (or unit, for ElidableList
) at a time and seeing what width you've actually dropped. This is computationally intensive, and so I'd prefer not to duplicate if possible.
Perhaps a word on how I view this conceptually.
The Cell
typeclass is one that knows how to do 3 things:
- Measure its own length
- Measure the position of the first (or last) character matching a predicate
- Trim itself
I have explicitly left out padding, because padding is not something that Cell
knows how to do. The StringBuilder
class is where all the padding takes place. buildCell
is the bridge between the two that makes it look like Cell
can pad, but really it leaves that work to StringBuilder
.
For most instances of Cell
trimming and padding are naturally orthogonal, but WideString
and others break that paradigm. Since trimming decreases length monotonically, but is not surjective onto all numbers less than its length, some trimming requires padding to get the exact length requested. buildCellViewTight
restores the orthogonality of these two operations: it will only trim, and if any padding is required it records that in CellView
.
That said, I haven't been as principled about this philosophy as I could be: buildCellViewTight
will add padding if it is explicitly requested with a positive number in CellView
. However, if only trimming is requested it will not pad. Maybe it's worthwhile being more principled here.
The thing is, you do the scanning for multi-width characters twice anyway. I think we should treat efficiency as a different issue for now. If it is necessary later there is other stuff you can do (mostly caching in WideString
and WideText
, but I don't think it is that expensive that it will ever matter).
To me it seems the only thing that changed with the introduction of multi-width characters is that if we drop a character, several width units are potentially dropped at once, and not just one.
I have explicitly left out padding, because padding is not something that
Cell
knows how to do. TheStringBuilder
class is where all the padding takes place.buildCell
is the bridge between the two that makes it look likeCell
can pad, but really it leaves that work toStringBuilder
.
The way I see it, neither Cell
nor StringBuilder
do the padding. Cell
, with the new changes, just happens to implement applying the padding from CellView
. But it actually happens here:
- Figure out the width range of the column and create an abstract description named
ColModInfo
. - Potentially alter or use
ColModInfo
. - Interpret
ColModInfo
(trimming, padding, et cetera).
Now, I still do not see why we cannot make our existing code aware of the new dropping behavior. If it is not possible at all, the cause has to be identified and we need to figure out a new architecture.
The thing is, you do the scanning for multi-width characters twice anyway. I think we should treat efficiency as a different issue for now. If it is necessary later there is other stuff you can do (mostly caching in
WideString
andWideText
, but I don't think it is that expensive that it will ever matter).
Okay, that is a reasonable decision. I'lll go along with it.
To me it seems the only thing that changed with the introduction of multi-width characters is that if we drop a character, several width units are potentially dropped at once, and not just one.
Yes, that is my take on it too.
I have explicitly left out padding, because padding is not something that
Cell
knows how to do. TheStringBuilder
class is where all the padding takes place.buildCell
is the bridge between the two that makes it look likeCell
can pad, but really it leaves that work toStringBuilder
.The way I see it, neither
Cell
norStringBuilder
do the padding.Cell
, with the new changes, just happens to implement applying the padding fromCellView
. But it actually happens here:1. Figure out the width range of the column and create an abstract description named `ColModInfo`. 2. Potentially alter or use `ColModInfo`. 3. Interpret `ColModInfo` (trimming, padding, et cetera).
Now, I still do not see why we cannot make our existing code aware of the new dropping behavior. If it is not possible at all, the cause has to be identified and we need to figure out a new architecture.
By ‘our existing code’, you mean can we do this without introducing any new functionality into the typeclass? I'm sceptical that it's possible.
The existing code assumes that if you ask it to drop a certain width, it will be able to drop exactly that width. Namely, it satisfies the following invariant: visibleLength (CellView l r a) == visibleLength a + l + r
This is why it's able to do everything with a single call to visibleLength
at the beginning, and then determine the end length using arithmetic. Any trimming/building will result in a StringBuilder
, which does not have a way of measuring length. So once we've trimmed a Cell
, we need a way of knowing what length we really ended up with, and how much of it was just padding so that our invariant holds.
There are two ways I can see doing this: my buildCellViewTight
, which explicitly records how much space is needed, and your proposal for a visibleLengthAfterDrop
. They both record the same information in different ways.
(Note: I'm being a bit sloppy here with length, but I think my point is clear. Let me know if it isn't.)
It sounds like your call is that we go with some variation of visibleLengthAfterDrop
. I'm not a fan of the name, but here are some proposals:
visibleLengthWithoutPadding :: Cell a => Int -> Int -> a -> Int -- Questions: Does it take negative numbers for dropping (like CellView) or positive numbers? Does it ignore the other one?
visibleLengthWithoutPadding :: Cell a => CellView a -> Int -- Questions: Does it ignore requested padding?
My vote is for the second, and that it completely ignore requested padding.
My vote is for the second, and that it completely ignore requested padding.
I'm like to change my opinion to be that it should not ignore requested padding. This will enable the default implementation to be the straightforward visibleLengthWithoutPadding (CellView a l r) = visibleLength a + l + r
. The name should probably change though: perhapse visibleLengthTight
?
I do not care too much about the name. Small details are easily changed. I'm more interested in how the implementation will change and if there is something coming about that we did not plan.
I've attempted to redo this PR using the visibleLengthWithoutPadding
approach. It's feasible, but the last step of this PR becomes significantly more complicated.
The advantage of buildCellViewTight :: StringBuilder b => CellView a -> CellView b
is that it doesn't add the padding in, and just keeps it around. This is an advantage when we want to start adding cut marks. As it currently stands in master, cut marks added to wide strings are added after the padding, e.g. 孫 …
instead of 孫…
. Since adding cut marks happens at the StringBuilder level, we need a way to build the Cell without adding the extra padding… which is precisely what buildCellViewTight
does.
As I see it, we need to introduce functions to the Cell
typeclass with the following requirements:
- Can build to
StringBuilder
, but has to keep track of any padding necessary so it can be added after cut marks:buildCellViewTight :: StringBuilder b => CellView a -> CellView b
. - Can determine its minimal length after trimming (without padding), so columns can be kept tighter if necessary:
visibleLengthWithoutPadding :: CellView a -> Int
The second can be naturally implemented in terms of the first. The first can be implemented in terms of the second, but it's a bit clunky. I've done a lazy job here.
visibleLengthWithoutPadding a =
visibleLength a - totalAdjustment (buildCellViewTight a :: CellView String)
buildCellViewTight a@(CellView x l r) =
CellView (buildCellView $ CellView x (l - extraPadding) r) extraPadding 0
where
extraPadding = visibleLength a - visibleLengthWithoutPadding a
I think we should stick with the implementation with buildCellViewTight
. If you'd like, I'm happy to add a function (either a typeclass function or a standalone function) as defined above. Let me know your thoughts.
Let me try to summarise this second padding problem in a different way. Let me know if you agree.
In the current paradigm, there are three things that you can do with an instance of Cell
(among others).
- Measure the length with
visibleLength
. - Drop a specified width from the left/right with
dropLeft/Right/Both
. - Build to a
StringBuilder
withbuildCell
.
These are kept separate conceptually. Note that there is no concept of padding inherent in Cell
. Any padding is implemented using non-typeclass functions like pad
, trimOrPad
, or the column alignment functions.
The current architecture is able to maintain this separation because it assumes that you can drop any positive integral width from either side of the Cell
without issue. The length to drop is calculated by saying length to drop = visibleLength x - desired length
, and visibleLength (dropBoth l r x) = visibleLength x - l - r
.
Wide characters complicate this because there exist lengths you can't drop. For example, you cannot drop length 1 from a string consisting of a single width-2 character. In order to maintain the current architecture, we need to fake it by dropping more width than necessary, and then ‘topping up’ by adding extra padding to get to the desired width. By adding this extra padding, we can maintain the above invariants.
But this complicates the architecture, because it requires some sort of padding to be inherent in the definition of Cell
, something which was not present before. The solution in master requires this to be handled within the typeclass definitions by explicitly adding padding in buildCellView
.
However, this causes a problem. It means padding happens in a new place in the code, and this doesn't interface well with existing architecture. For example, when we call a function like trim
, it will trim to the desired length, build to a StringBuilder
, and add a cut mark. But since padding was added at the build stage, as necessitated by the current architecture, the cut mark gets added after padding, which is not the order we want.
There are a few ways I can see out of this problem:
- Don't add padding at the
buildCellView
stage, but instead keep track of how much padding would be needed to get the desired width. Existing functions can be converted to this approach fairly easily by using theMonad
instance ofCellView
. This is the approach taken bybuildCellViewTight
. - Make the
CellView
typeclass aware ofCutMark
s, and integrate higher-level functions liketrim
as typeclass methods. - For functions like
dropLeft
, instead of taking an argument specifying how much to drop, take an argument specifying the desired final length. In this case, dropping functions would need to be aware of length, so this would weaken the separation betweendropLeft
andvisibleLength
.
I continue to think that option 1 is the best approach.
Thank you for the summary. I feel that I do not fully understand everything and first need to dive back into the code. I have some vacation soon, so I hope to get this review done at around Christmas.
I found this issue to be complex so I am trying to structure my response into sections.
General
There are a few ways I can see out of this problem:
1. Don't add padding at the `buildCellView` stage, but instead keep track of how much padding would be needed to get the desired width. Existing functions can be converted to this approach fairly easily by using the `Monad` instance of `CellView`. This is the approach taken by `buildCellViewTight`. 2. Make the `CellView` typeclass aware of `CutMark`s, and integrate higher-level functions like `trim` as typeclass methods. 3. For functions like `dropLeft`, instead of taking an argument specifying how much to drop, take an argument specifying the desired final length. In this case, dropping functions would need to be aware of length, so this would weaken the separation between `dropLeft` and `visibleLength`.
I continue to think that option 1 is the best approach.
I think neither 2. or 3. can handle the actual complexity. In general, I agree with your assessment, that adding padding or trimming in buildCell
is a bad idea. We should integrate the minimum amount of logic necessary into the Cell
typeclass. It is just about cells themselves.
Issue
In my opinion, the crux of this issue lies in ColumnModifier
. If we can somehow integrate the column headers into the related functions, I agree in getting rid of it. The reason why I think this, is because we now need more than one iteration to find the actual width of a column. I.e. dropping more than one width unit may change the whole width to a smaller width (even below the minimum threshold, e.g., in ExpandBetween
or FixedUntil
).
Approach
I try to describe how I imagine it. It is very likely to overlap with what you already implemented or wrote. Thus, I don't want to claim this is something completely new or fully my idea.
- Measurement phase
- Measure initial width of columns (including headers).
- Use the column modification functions to compute the adjustment for each cell (may be represented as
CellView
). -
Optional: If, for an instance,
visibleLength c == length c
, we could even skip the remaining steps. - For each adjustment we can simply subtract the minimum of the column.
- Ensure that it fits the
LenSpec
.
- Construction phase: Apply the computed adjustments to the cells (i.e. padding, trimming, applying cut marks).
Details
- I would like to keep using
StringBuilder
out of the measurement phase. -
Cell
should receive new functionsdropLeftLengthUnits :: Cell c => c -> Int -> (Int, DropAction)
and and equivalentdropRightLengthUnits :: Cell c => c -> Int -> (Int, DropAction)
. The first result is the minimum amount of width units that are necessary to be dropped to achieve the given width units. The second result is an instance-specific type that describes the action necessary to drop from the cell. This enables us to avoid the visible length calculation a second time when we want to build the cell.- For
WideString
this could be simply an integer that specifies how many actual characters are dropped. - For
String
this is the same as the visible length. - Other types can use arbitrarily complex data types.
- For
- Functions that are used in interpreting column modifiers are now applied directly instead of interpreting
ColumnModifier
("column modification functions"). However, they will now compute the adjustment for a cell. - We need include cut marks in the computations.
- We need to replace
ensureWidthOfCMI
.
How to move forward
- Please let me know whether you agree with my assessment of the situation or see any issues.
- Move unrelated commits into a separate MR so that we can get them out of the way and into
master
:-
ffb3456f4dccfcf2627d257286530c749aba57f4 Simplify and generalise helpers for buildCell
-
70d15259cf51f1b0d377c2ed8d4c98239d84a7ca Clean up compiler warnings
-
- Check what we can reuse of the functionality of the other commits.
- I don't see any use case for
buildCellViewTight
. I.e. building is a step that should happen at the end. - Some of the work that applies to column modification functions may be reused in the construction phase.
- I don't see any use case for
- Check how that might interfere with
ElidableList
or other types in general.
If we agree on this proposal, then we can talk about who does the work or where we could collaborate. Technically, I wouldn't mind working on it and I still have a few days of vacation.
I will probably start with minor refactorings:
- Split column modification functions into measurement and building phase parts.
- Prepare for removing
ColumnModifier
.
This is a very detailed analysis and proposal. At first read-over it seems reasonable to me, but I'll need to think a bit more deeply (and probably try to implement bits of it) to know for certain. In the meantime, I'll factor out the unrelated commits and submit a separate PR.
How should we proceed? Are you intending to implement this, or should we divide up the work in a reasonable way?
How should we proceed? Are you intending to implement this, or should we divide up the work in a reasonable way?
If you are willing to help, what would probably make the most sense for you is adding the new Cell
type class functions dropLeftLengthUnits
and dropRightLengthUnits
, as described above, and implement all instances. Before you start please check if it makes sense to you and report back if you see any issues. If you feel more comfortable with something else, let me know.
What I did so far is introduce a type that tracks the length instead of building it instantly and adapt the code. This may or may not be the best representation but we can change that easily later on.
> trimOrPad left ellipsisCutMark 5 (WideString "孫德明")
CellMod {baseCellCM = WideString "\23403\24503\26126", leftAdjustmentCM = 0, rightAdjustmentCM = -2, leftCutMarkLenCM = 0, rightCutMarkLenCM = 1}
> trimOrPad left ellipsisCutMark 5 (WideString "Sue")
CellMod {baseCellCM = WideString "Sue", leftAdjustmentCM = 0, rightAdjustmentCM = 2, leftCutMarkLenCM = 0, rightCutMarkLenCM = 0}
The following parts are still missing.
- The logic to adjust the width (i.e. to eat up any superfluous space): Find unnecessary padding of all
CellMod
s in a column and then building. This may be done in one iteration with laziness. - The code it is embedded in needs to be changed a bit to get to the actual columns. I will have to look at it again.
- Adapt the code to use the new type class functions.
I'm looking into this now, but my brain is having a bit of trouble wrapping itself around this. Could you give a quick example of how you'd implement and use this?
Cell should receive new functions dropLeftLengthUnits :: Cell c => c -> Int -> (Int, DropAction) and and equivalent dropRightLengthUnits :: Cell c => c -> Int -> (Int, DropAction). The first result is the minimum amount of width units that are necessary to be dropped to achieve the given width units. The second result is an instance-specific type that describes the action necessary to drop from the cell. This enables us to avoid the visible length calculation a second time when we want to build the cell.
For WideString this could be simply an integer that specifies how many actual characters are dropped. For String this is the same as the visible length. Other types can use arbitrarily complex data types.
I'm looking into this now, but my brain is having a bit of trouble wrapping itself around this. Could you give a quick example of how you'd implement and use this?
Cell should receive new functions dropLeftLengthUnits :: Cell c => c -> Int -> (Int, DropAction) and and equivalent dropRightLengthUnits :: Cell c => c -> Int -> (Int, DropAction). The first result is the minimum amount of width units that are necessary to be dropped to achieve the given width units. The second result is an instance-specific type that describes the action necessary to drop from the cell. This enables us to avoid the visible length calculation a second time when we want to build the cell.
For WideString this could be simply an integer that specifies how many actual characters are dropped. For String this is the same as the visible length. Other types can use arbitrarily complex data types.
I hope the following makes sense.
instance (Cell String) where
type DropAction String = Int
dropLeftLengthUnits c n =
-- Actual characters are visible characters.
let n' = max 0 (visibleLength c - n) in (n', n')
visibleLength = length
instance Cell c => Cell WideString where
type DropAction WideString = Int
dropLeftLengthUnits c n =
-- Here you determine how many actual characters need to be dropped
-- to get to the desired visible length, and how many visible characters
-- can actually be dropped.
-- Later on, the DropAction may be used to do the dropping without measuring again.
instance Cell c => Cell (Formatted c) where
type DropAction (Formatted c) =
-- type that describes how to drop in a tree