Drasil LaTeX: Long expressions fall out of page boundaries

(Similar in purpose to #346)

The breqn package was removed from the import list for the LaTeX files (the reasoning is discussed in #701).

We need to implement formatting for long equations so that they stay within the page boundaries.

Specific places where this is necessary:

SSP

Terminology and definitions

Data definitions

Instance models

Jun 27 '18 16:06 niazim3

From the meeting: for the text, we might be able to use \parbox. For the equations, we will need to look into it a bit more to understand what is going on (it wraps at the "=", but not at the "+"!).

Jul 29 '22 18:07 balacij

I don't remember what in today's meeting reminded me of this ticket, but something did. Another possible way to wrap "block long expressions" (long expressions that aren't inlined) is to format them as a table, splitting at each operator-precedence level and tabbing as needed. For example, if we had the equation $a = b \cdot{} c \cdot{} d + (e + f - g) / h + \log(i) \cdot{} j - k + \dots{}$, where $a, b, c, \dots{}$ are long expressions as well, we might write it as:

$$
\begin{aligned}
a = {} & b \cdot{} c \cdot{} d \\
& + (e + f - g) / h \\
& + \log(i) \cdot{} j \\
& - k + \dots{}
\end{aligned}
$$

(the formatting is rough because I couldn't use the tabular environment here on GitHub)

Then, if the terms moved to the next line are also too long, we can move their own terms to further lines in turn, following the same pattern of breaking at operators according to precedence. However, each nested level should get an extra tab (or similar indentation) for readability. Since our expressions already carry precedence information, this should already be possible.
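A minimal Python sketch of the precedence-based splitting described above follows. It assumes a hypothetical expression tree (`Atom`, `Chain`) rather than Drasil's real expression type. Each `Chain` holds the operators of one precedence level, and nesting encodes precedence. The sketch also uses character count of the LaTeX source as a crude stand-in for rendered width, since real font metrics are not available. It emits an amsmath `aligned` environment rather than a tabular one, which is only one possible choice.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Atom:
    """An operand we never split (already-rendered LaTeX)."""
    tex: str


@dataclass
class Chain:
    """Operators of one precedence level, e.g. b * c * d or x + y - z.
    Chains nested inside Chains encode precedence."""
    first: Expr
    rest: List[Tuple[str, Expr]]  # (operator, operand) pairs


Expr = Union[Atom, Chain]


def render(e: Expr) -> str:
    """Flat, single-line rendering."""
    if isinstance(e, Atom):
        return e.tex
    return render(e.first) + "".join(f" {op} {render(t)}" for op, t in e.rest)


def split_rows(e: Expr, width: int, depth: int = 0) -> List[str]:
    """Split e at its own operators if it is wider than `width` characters,
    recursing into operands that are still too wide; continuation rows get
    one \\quad of indentation per nesting depth."""
    if isinstance(e, Atom) or len(render(e)) <= width:
        return [render(e)]
    indent = "\\quad " * depth
    rows: List[str] = []
    for i, (op, term) in enumerate([("", e.first)] + e.rest):
        sub = split_rows(term, width, depth + 1)
        head = f"{op} {sub[0]}" if op else sub[0]
        rows.append(head if i == 0 else indent + head)
        rows.extend(sub[1:])  # deeper rows already carry their own indentation
    return rows


def to_aligned(lhs: str, rhs: Expr, width: int) -> str:
    """Wrap the rows in an aligned environment, aligned after the "=" sign."""
    rows = split_rows(rhs, width)
    lines = [f"{lhs} = {{}} & {rows[0]}"] + [f"& {r}" for r in rows[1:]]
    return "\\begin{aligned}\n" + " \\\\\n".join(lines) + "\n\\end{aligned}"


# a = b*c*d + (e + f - g)/h + log(i)*j - k, the example equation from this comment
rhs = Chain(
    Chain(Atom("b"), [("\\cdot", Atom("c")), ("\\cdot", Atom("d"))]),
    [("+", Atom("(e + f - g) / h")),
     ("+", Chain(Atom("\\log(i)"), [("\\cdot", Atom("j"))])),
     ("-", Atom("k"))],
)
print(to_aligned("a", rhs, width=20))
```

With a width of 20, the sum is split at its `+` and `-` operators but each product fits on its row. With a smaller width, the products are split too, and each of their continuation rows gets an extra `\quad`.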

For inlined expressions (and text) on the other hand, the "parbox" solution should work fine.

Of course, with both of these ideas, we might still need to gather font and font-metric information to truly guarantee that overflows never happen.

Aug 24 '22 15:08 balacij

@balacij, my guess is that you were reminded of this issue because of the equation length problem @cd155 has had with the double pendulum example (see page 34, IM:calOfAngularAcceleration1).

I like the sound of your approach to long expressions, @balacij. I think it would work in most cases. Interestingly, I'm not sure it would work in the double pendulum case, because the long expression is a fraction. We can't really split a long numerator (or denominator) of a fraction at its operators. Looking at it again, I actually think we could split the double pendulum equation at the equals sign. I think the numerator is short enough to fit on one line if it had the full line to itself. At some point, with a really long numerator (or denominator), we would have to add some local variables to temporarily name subexpressions so that the whole thing fits. I don't know if we would automate that or leave it to the user.

Aug 25 '22 16:08 smiths

Ah, I think that was it. Thank you!

The fraction is a problem for this approach. In a somewhat unconventional printing style, we could use a plain ÷ symbol (perhaps this should be a printing option anyway?). Splitting at the equals sign is also a good option. Once #1154 is completed, we should be able to introduce variables to move long subexpressions out, as you mentioned. It would be really nice to have this fully automated, so that we don't need to burden scientists with formatting. However, I think that would require us to emulate more of how LaTeX compilers process the .tex source.
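To illustrate the printing-option idea, here is a hypothetical Python sketch. `DivStyle` and `render_div` are invented names, not part of Drasil. The only point is that the inline ÷ form sits on a single baseline, so the same operator splitting used for sums could break a line at `\div`. A stacked `\frac` cannot be broken that way.

```python
import re
from enum import Enum


class DivStyle(Enum):
    FRAC = "frac"      # stacked \frac{n}{d}: cannot be broken across lines
    OBELUS = "obelus"  # inline n \div d: one baseline, so a line can break at \div


def _wrap(operand: str) -> str:
    # Conservative: parenthesize anything that is not a single identifier or number.
    return operand if re.fullmatch(r"[\w.]+", operand) else f"({operand})"


def render_div(num: str, den: str, style: DivStyle = DivStyle.FRAC) -> str:
    """Render num / den in the chosen style; FRAC stays the default."""
    if style is DivStyle.FRAC:
        return f"\\frac{{{num}}}{{{den}}}"
    return f"{_wrap(num)} \\div {_wrap(den)}"


print(render_div("x + y", "2"))
print(render_div("x + y", "2", DivStyle.OBELUS))
```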

Aug 25 '22 22:08 balacij

The problem with a principled solution to line breaking is indeed the (non-) availability of paper width and font information. Deep inside TeX, this is available, but it's just too hard to work with 'externally'. And it's obviously not available at all in HTML since width is dynamic!

As far as somewhat satisfactory hacks go, what you propose for sums is good; for fractions, my temptation is to go with

let s = ...
    d = ...
in  s / d

as the representation.
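A minimal Python sketch of that representation follows. It assumes operands are already-rendered strings and uses character count as a crude width proxy. The names `lift_fraction`, `fresh` and `show` are hypothetical. The numerator or denominator is lifted into a `let` binding only when it is too long. Fresh names (`s`, `s_1`, ...) avoid clashing with symbols already in use, which matters once several fractions are lifted in one document.

```python
from typing import List, Set, Tuple

Binding = Tuple[str, str]


def fresh(base: str, used: Set[str]) -> str:
    """Return `base`, or base_1, base_2, ... if taken; record the choice in `used`."""
    name, i = base, 1
    while name in used:
        name, i = f"{base}_{i}", i + 1
    used.add(name)
    return name


def lift_fraction(num: str, den: str, width: int,
                  used: Set[str]) -> Tuple[List[Binding], str]:
    """Name the numerator and/or denominator if longer than `width` characters."""
    bindings: List[Binding] = []
    parts: List[str] = []
    for base, operand in (("s", num), ("d", den)):
        if len(operand) > width:
            name = fresh(base, used)
            bindings.append((name, operand))
            parts.append(name)
        else:
            parts.append(operand)  # short enough to stay inline
    return bindings, f"{parts[0]} / {parts[1]}"


def show(bindings: List[Binding], body: str) -> str:
    """Print in the let/in layout: one `let`, aligned bindings, then `in`."""
    if not bindings:
        return body
    (n0, e0), *rest = bindings
    lines = [f"let {n0} = {e0}"] + [f"    {n} = {e}" for n, e in rest]
    return "\n".join(lines + [f"in  {body}"])


b, body = lift_fraction("a_1 + a_2 + a_3 + a_4", "c_1 + c_2 + c_3 + c_4", 12, set())
print(show(b, body))
```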

Aug 26 '22 12:08 JacquesCarette

I like let ... in ... syntax too. It's a nice way to name subexpressions, and we wouldn't need to continuously expand to the right as we add more definitions.

Regarding the automatic line breaking, that's exactly the issue. I think that some amount of information would be good to bring in, but I'm not sure how much. Should we list this under the possible projects on the wiki? Would it be worth the effort if someone were to build it out? It might be something that could also be applied to typesetting, in general, in Drasil. Some kind of a Doc overhaul!

Aug 26 '22 14:08 balacij

Unfortunately I think that line-breaking is way beyond Drasil. It requires information that is simply not available to Drasil.

Aug 26 '22 15:08 JacquesCarette

I just wonder whether let ... in ... can handle nested syntax. By nested syntax, I mean writing a let ... in ... inside another let ... in ....

Aug 26 '22 15:08 cd155

I imagine it would be fine, as long as we are mindful of variable names to avoid collisions. However, I've usually seen nested lets merged together, which I think improves readability.

For example, instead of writing:

let
    a = ....
in 
    let
        b = ....
    in
        ....

We would probably prefer to write:

let a = ....
    b = ....
in
    ....

(or some similar form)
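A hypothetical Python sketch of that merge follows. It assumes the printed notation is read sequentially: each binding may refer to the ones before it, as with nested lets. Under that reading, folding nested lets into one block preserves meaning. Following the caution about collisions, it stops merging as soon as an inner name repeats an outer one, rather than renaming anything. `Let`, `merge_lets` and `show` are invented names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Let:
    bindings: List[Tuple[str, Term]]
    body: Term


Term = Union[str, Let]  # a str is an opaque, already-rendered expression


def merge_lets(t: Term) -> Term:
    """Fold `let a = .. in let b = .. in e` into one `let a = ..; b = .. in e`,
    stopping wherever an inner name would collide with an outer one."""
    if isinstance(t, str):
        return t
    bindings = [(n, merge_lets(e)) for n, e in t.bindings]
    body = merge_lets(t.body)
    names = {n for n, _ in bindings}
    while isinstance(body, Let) and names.isdisjoint(n for n, _ in body.bindings):
        bindings += body.bindings
        names.update(n for n, _ in body.bindings)
        body = body.body
    return Let(bindings, body)


def show(t: Term, indent: int = 0) -> str:
    """Print with a single `let`, aligned bindings, then `in` (the merged layout)."""
    pad = " " * indent
    if isinstance(t, str):
        return pad + t
    lines = []
    for i, (name, e) in enumerate(t.bindings):
        keyword = "let " if i == 0 else "    "
        # a let nested inside a binding is flattened onto one line
        rhs = e if isinstance(e, str) else "(" + show(e).replace("\n", " ") + ")"
        lines.append(f"{pad}{keyword}{name} = {rhs}")
    lines += [pad + "in", show(t.body, indent + 4)]
    return "\n".join(lines)


nested = Let([("a", "x + 1")], Let([("b", "a * 2")], "a + b"))
print(show(merge_lets(nested)))
```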

Aug 26 '22 16:08 balacij

@balacij, the idea of using the ÷ symbol is interesting, but probably not that aesthetically pleasing. Past high school you don't really see ÷ used all that often. It would likely look odd to many readers.

I agree with @JacquesCarette that full automation isn't feasible, since Drasil doesn't have access to the full information it would need, like page width, font size, etc. However, I think we could have a project for partial automation. I've created a project for partial automation on our potential projects wiki. By default, Drasil would format all equations the way it currently does. However, if the user didn't like the look of any equations, they could enable "long expression mode" to use the line breaking techniques mentioned above. The human user would detect that there is a problem, but they wouldn't have to figure out how to solve the problem; they could let the generator automatically split the equation. There is an analogy for how LaTeX handles floating figures. You can go with the default, but if you don't like it, you can tell LaTeX your preference by adding directives like !h. LaTeX still decides how to place the figure, but it has some preference information from the user.

I don't know if we would have one option for splitting an equation, or if we would have levels for how aggressively we want the equation to be split.

The techniques for splitting could be those that are discussed above.
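A hypothetical Python sketch of the opt-in idea follows. Every equation renders as it does today unless the user sets a split level for that equation, loosely like adding `!h` to a LaTeX float. `SplitLevel`, `level_for` and `layout` are invented names. The three levels shown are only one possible ordering of how aggressively to split.

```python
from enum import IntEnum
from typing import Dict, List


class SplitLevel(IntEnum):
    OFF = 0        # default: one line, as equations are rendered today
    AT_EQUALS = 1  # give the right-hand side a line of its own
    AT_TERMS = 2   # additionally put each top-level term on its own line


def level_for(label: str, overrides: Dict[str, SplitLevel]) -> SplitLevel:
    """Equations keep the default unless the user opts in for that label."""
    return overrides.get(label, SplitLevel.OFF)


def layout(lhs: str, terms: List[str], level: SplitLevel) -> List[str]:
    """terms[0] is the first top-level term; later ones carry their operator, e.g. "+ c"."""
    if level == SplitLevel.OFF:
        return [f"{lhs} = {' '.join(terms)}"]
    if level == SplitLevel.AT_EQUALS:
        return [f"{lhs} =", "\\quad " + " ".join(terms)]
    return [f"{lhs} ="] + ["\\quad " + t for t in terms]


# Hypothetical user override for the double pendulum instance model
overrides = {"IM:calOfAngularAcceleration1": SplitLevel.AT_EQUALS}
lvl = level_for("IM:calOfAngularAcceleration1", overrides)
print(layout("a", ["b", "+ c", "- d"], lvl))
```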

Aug 26 '22 18:08 smiths
