sile
sile copied to clipboard
Paragraph indentation is lost after commands
This can be seen on this document:
\begin[class=book,papersize=129mm x 198mm]{document}
\script[src=packages/rules]
I have seldom heard him mention her under any other name.
\hrule[width=100mm,height=0.25mm]
More text here
\end{document}
More text here
continues on the same line as the horizontal rule, but should start on a new line.
I've bisected it to this commit 269963ad, more specifically to lineEndLineStartSpace
in command =((P("\\")-P("\\begin")) * Cg(myID, "tag") * Cg(parameters,"attr") * V"bracketed_stuff"^0 * lineEndLineStartSpace)-P("\\end{")
.
Urgh. Working out which spaces should and shouldn't be ignored after commands seems like it is beyond me. :-(
A\relax␠B
should render as A B
. But
A\relax
B
should render as AB
, and (presumably) so should
A\relax
B
and possibly even
A\relax␠
(additional space after\relax
)B
But
A\relax
␠
B
should render as
A
B
(I think.) If this is correct, and someone could put the rule into words, we can write tests for it. At the moment I'm just too confused. :-(
I think the logic should be absolutely simple: \command\s*EOL
should be removed. This means that for the purpose of formatting, if the line ends with command and some optional spaces, the command is removed (along with the spaces and the end-of-line) as if it never existed in the text.
I think it should handle nicely all the test cases you have in your example, but I tried to change lineEndLineStartSpace
to (lpeg.S(" ")^0 * lpeg.S("\r\n"))^-1
, which expresses what I'm looking for, but it still doesn't work. Any idea why?
I think I got it; this is the diff against the current master:
diff --git a/core/inputs-texlike.lua b/core/inputs-texlike.lua
index 4805208..fe7eb11 100644
--- a/core/inputs-texlike.lua
+++ b/core/inputs-texlike.lua
@@ -14,7 +14,7 @@ SILE.inputs.TeXlike.parser = function (_ENV)
local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset)
local parameters = (P("[") * list * P("]")) ^-1 / function (a) return type(a)=="table" and a or {} end
local anything = C( (1-lpeg.S("\\{}%\r\n")) ^1)
- local lineEndLineStartSpace = (lpeg.S(" ")^0 * lpeg.S("\r\n")^1 * lpeg.S(" ")^0)^-1
+ local lineEndLineStartSpace = (lpeg.S(" ")^0 * lpeg.S("\r\n"))^-1
START "document";
document = V("stuff") * (-1 + E("Unexpected character at end of input"))
@@ -22,10 +22,10 @@ SILE.inputs.TeXlike.parser = function (_ENV)
stuff = Cg(V"environment" +
((P("%") * (1-lpeg.S("\r\n"))^0 * lpeg.S("\r\n")^-1) /function () return "" end) -- Don't bother telling me about comments
+ V("text") + V"bracketed_stuff" + V"command")^0
- bracketed_stuff = P"{" * V"stuff" * (P"}" + E("} expected"))
- command =((P("\\")-P("\\begin")) * Cg(myID, "tag") * Cg(parameters,"attr") * V"bracketed_stuff"^0 * lineEndLineStartSpace)-P("\\end{")
+ bracketed_stuff = P"{" * V"stuff" * (P"}" + E("} expected")) * lineEndLineStartSpace
+ command =((P("\\")-P("\\begin")) * Cg(myID, "tag") * Cg(parameters,"attr") * V"bracketed_stuff"^0)-P("\\end{")
environment =
- P("\\begin") * Cg(parameters, "attr") * P("{") * Cg(myID, "tag") * P("}") * lineEndLineStartSpace
+ P("\\begin") * Cg(parameters, "attr") * P("{") * Cg(myID, "tag") * P("}")
* V("stuff")
* (P("\\end{") * (
Cmt(myID * Cb("tag"), function(s,i,a,b) return a==b end) +
It isn't just paragraph breaks that are a problem here. There is a problem with trailing commands on all lines:
two
words
…typesets as "two words" whether or not you include a trailing space on the first line. On the other hand:
\em{two}
words
…typesets crammed together as "_two_words", again whether or not you manually include any spaces after the command.
See one of the Greek reader page samples from @jtauber for an example of this being a problem.
Good point. @simoncozens, is it possible to identify those commands that produce any output in the document? Maybe, as a post-processing step, drop empty lines that had commands on them?
The logic would have only 3 steps:
- Drop any trailing whitespaces after a command (but not the new line)
- Capture the fact that the line has a command
- Delete any empty line if it had a command
It can be simplified further by removing not just empty lines, but also whitespace only lines that include commands, but I'm not sure about all implications. One good aspect is that it would make _\command
, \command
, and \command_
work the same, but I'm not sure if there are any cases where the difference may be significant.
@alerque, I'm still thinking about this and it may be the case that \em{two}\nwords
behavior is fine. It's difficult for me to say, but the patch that I shown earlier handles the cases that Simon listed exactly as he wanted to.
There are several easy workarounds: (1) add a space before words
, (2) put those on one line, or (3) add a comment after \em
: \em{two} %
. Each of those should render two
and words
separately.
I'd prefer to handle this case "automatically", but it may not be possible without deeper analysis.
@simoncozens, thoughts?
Right now my thoughts are I would like to rip out the changes I made and return to the old, you-need-to-add-percent-signs-everywhere parser. It was less user-friendly but at least it was consistent and unsurprising...
:cry: Having more %
s floating around than even LaTeX needed was not a happy state of affairs.
It was less user-friendly but at least it was consistent and unsurprising...
I wouldn't call it unsurprising as I was quickly bitten by text\n\command\nmore text
not being in the same paragraph (and I'm quite familiar with LaTeX). I have a habit of putting commands (especially with a list of parameters on a separate line), which breaks paragraphs with the current logic.
Having more %s floating around than even LaTeX needed was not a happy state of affairs.
I'm with Caleb on this one. This is one of the things I'd prefer to see less in my text/code.
@simoncozens, I'm proposing a very simple rule that should (1) handle the examples you listed, (2) handle those examples that Caleb mentioned, and (3) be simple to implement:
When a line starts with a command without any leading spaces, treat the previous new line as a space unless it's an empty line (a line without non-whitespace characters). That's it.
This means that text \command
and text\n\command
will be rendered the same, which is what the user would want in most (all?) cases. The "unless it's an empty line" clause is needed to avoid merging the line into the first paragraph in the following:
Some text
\hrule[width=100mm,height=0.25mm]
More text
I've spent half a day today trying to figure out what exactly the problem is and why the current solution is not working.
After trying the various examples and fixes, I came full circle. I agree with @simoncozens that the earlier functionality needs to be restored (revert 269963ad and 069205f2).
If the functionality is restored, there are several "features" that (ideally) needs to be fixed:
- several commands before a paragraph change indentation of that paragraph. This is the issue described in #54.
- several commands with an empty line produce a standalone (empty) paragraph.
- several commands at the top of the document with an empty line produce a standalone (empty) paragraph (this and the previous item are the same issue, just caused by different scenarios).
Examples:
\begin[class=book,papersize=129mm x 198mm]{document}
\script[src=packages/color]
\script[src=packages/rules]%\hrule[width=10mm,height=0.25mm]
\script[src=packages/image]
First word, first paragraph.
\raggedright
\raggedright
Last paragraph
\end{document}
I've tried various fixes, but I don't think any of them are going to work simply because there are commands that need to ignore spaces and other commands that don't need to and the parser doesn't have that information. For example, in the example above \hrule
may need to be rendered and if there are two \hrule
commands, then the whitespaces between them become significant.
Even the rendering of that \hrule
is not without flaws as the previous command includes a whitespace, so the rule is rendered with a small indent. Adding %
to the second line (with \script[src=packages/color]
) fixes the issue.
If it's not possible to come up with a rule that is going to work automatically, I think we may need to introduce a new command that will mark a set of commands and "eat" all whitespaces. This is still better than %
as it's going to be applied to all inserted commands and everything else in between.
I'm proposing \cmd
and \cmd{}
. The former will ignore all whitespaces from this command to the end of the paragraph or the first non-empty text (whichever comes first). The latter will ignore whitespaces inside its body.
I've tried to implement this command based on font
command, but I couldn't figure out how to walk content
to remove/ignore whitespaces inside:
SILE.registerCommand("cmd", function(options, content)
if (type(content)=="function" or content[1]) then
SILE.settings.pushState()
SILE.process(content)
SILE.settings.popState()
end
end, "Processes commands while ignoring whitespaces")
Any help?
there are commands that need to ignore spaces and other commands that don't need to and the parser doesn't have that information
I've been playing with scenarios myself and arrived at much the same conclusion you did. The parser needs to be fed more information. It's a crazy hard problem, but however it is solved I want a syntax I can actually teach to non-programmers and have them understand what the deal is. Trailing comment operators to solve a white space parsing issue seems like it will be a never ending debacle to explain.
The \cmd{}
(or perhaps a more descriptive name like\silent{}
or \nooutput{}
) makes a bit of sense and I agree is an improvement over trailing %
. That would also allow for block syntax to be used, ala <head>
it HTML or everything before \begin{document}
in LaTeX without worrying about throwing off the document layout with some extra spaces before the content.
\begin{cmdonly}
\stuff
\here
\end{cmdonly}
(Still trying on command names for size there.)
Another possibility that occurred to be is simply an alternate syntax for all commands. If the idea is that the parser doesn't have enough information, some structural part of the syntax could act as a flag for whether or not any given command should or should not (and perhaps how) eat whitespace.
The \cmd{} (or perhaps a more descriptive name like\silent{} or \nooutput{}) makes a bit of sense and I agree is an improvement over trailing %. That would also allow for block syntax to be used, ala
it HTML or everything before \begin{document} in LaTeX without worrying about throwing off the document layout with some extra spaces before the content.
Right; I also played with \quiet
, \preamble
and few others; I prefer \cmd
because (1) it's similar to \script
and other SILE tags (it's a container for commands) and (2) it's short.
I didn't like \quiet
, \silent
, or \nooutput
, because this command doesn't suppress output from commands; it only removes whitespaces between commands.
\trim
is another option as it trims the result ("to make (something) neat by cutting it"), but \cmd
may be more similar to the philosophy behind SILE tags.
Another possibility that occurred to be is simply an alternate syntax for all commands. If the idea is that the parser doesn't have enough information, some structural part of the syntax could act as a flag for whether or not any given command should or should not (and perhaps how) eat whitespace.
I prefer a specific tag, simply because it limits the number of commands and because it eats whitespaces between commands, which would still be difficult to explain by having alternative syntax.
I got it working as a container \cmd{}
, but I also want to get it working as \cmd
(to the beginning of the text), similar to how \font
works and can't figure out how to do it properly yet.
@simoncozens, @alerque, I've implemented \cmd
command in both \cmd
and \cmd{}
variants; pushed to https://github.com/pkulchenko/sile/tree/command-friendly-whitespace. This branch also includes a revert of the two WIP commits in SILE.
Simon, I simply hacked massage_ast
as I couldn't make it work with command handlers for \cmd
(as it needs to be applied for subsequent elements that are not included in content
passed to the command handler). I'm sure you have some mechanism for traversing the tree, but I couldn't find it, so for ended up using the AST for testing purposes.
It achieves all the effects I wanted to; the following example works as I expect it:
\begin[class=book,papersize=129mm x 198mm]{document}\cmd
\script[src=packages/color]
\script[src=packages/rules]%\hrule[width=10mm,height=0.25mm]
\script[src=packages/image]
First word, first paragraph.
\cmd{
\raggedright
\raggedright}
Last paragraph
\end{document}
The first \cmd
(on the first line) eliminates incorrect vertical spacing and indentation for the first paragraph. The second \cmd{}
eliminates vertical spacing between paragraphs.
If you want to see the "original/current" behavior, simply comment out line 97 in inputs-texlike.lua
.
I played with making it enabled by default, but don't think it's a good idea.
Simon, what do you think about the \cmd
proposal?
@simoncozens, @alerque, any further thoughts on this? The current state is sort of the middle of not-quite-tex-like and not-quite-working-in-some-cases. The \cmd
proposal would revert this to the original behavior (which is expected result for those familiar with TeX), but provide a way to achieve proper formatting around commands without %
galore.
Hi Paul, sorry about the delay; I've been prioritizing a lot of the internationalisation work and stuff with discretionaries - now that's sorted and the tests pass again I'd like to come back to this. I think the \cmd
proposal is the way forward - revert the misguided changes to the parser and then introduce a new process
function which doesn't pass stuff to the typesetter at top level.
This sounds good. You can check my attempt to do this (https://github.com/pkulchenko/sile/tree/command-friendly-whitespace) as it includes the revert of the changes (one commit), plus the new command (another commit); I'm sure you can find a proper way to implement the needed logic ;). I've tried to make it work with both \cmd
and \cmd{}
and would like to have both variants available; the former is a shortcut that works for one paragraph (as far as I remember).
I've come up with another way to do this which I think works OK. Basically we revert the two erroneous changes and introduce a new rule: text which consists entirely of newlines is dropped. Please have a look at the latest commit, especially the new test tests/space-after-command.sil
. If this works the way everyone expects, I will fix up the regression test expectations.
Hmmmm, that's very interesting. So far all the random tests I through at it did the thing I expected.
Seeing as how you've jinxed me twice today I'm asking: @simoncozens are you working on the tests or would you like me to fix them up? I have a previous test related commit to send and can send it with or without updated regression tests for this issue.
I wasn't going to fix up the tests for this yet - was looking for a third opinion from @pkulchenko before declaring this done.
Alright well I have a fixed up set (all but one that I'm not sure about) that I'll post in a minute but you can wait to merge it until we decide on this. If it turn out we do something different I'll split out the non-whitespace related changes from the set.
One case where this handling is broken (relative to previous behavior) is in the \pagetemplate
command. Indenting such as:
\pagetemplate[…]{
\frame[…]
}
…leaves blank space at the top of a document where it didn't used to while
\pagetemplate[…]{
\frame[…]
}
…works as expected.
I toyed with the idea of never typesetting anything which is purely whitespace, but then you have the (very useful) \book:section:post
and similar macros which are defined as being a space. \book:sectioning\book:section:post{}Section name
should insert a space. So that wouldn't work.
I think the way around this is to give \pagetemplate
and other purely structural commands their own version of SILE.process
that never sends any text to the typesetter.
@simoncozens, this is an interesting change. I should be able to test it on my examples shortly, but one clarification first; you said "we revert the two erroneous changes", but I only see one change to remove * _
. Is it as intended or do you plan more changes?
See also 247ad19.
@pkulchenko I interpreted that as a reference to the fact that what got reverted was originally spread out over two commits.
Also if you have any example cases that are affected by this behavior I think we should flesh out the test to cover all the cases where this could make a difference to the output.
@simoncozens, right, missed the first commit.
@alerque, absolutely; I'll run on few examples I have and post them here if anything interesting.
@simoncozens, I tried to test, but I'm getting core/inputs-common.lua:13: attempt to index field 'masterFilename' (a nil value)
on the latest master. This doesn't seem to be related to the changes we've been discussing; I bisected it to this @alerque's commit: 1688beadec49db3f043b915ff62c. Is it just me or do you see the same error? I'm simply running lua sile examples\simple.sil
.
Did you try rerunning ./configure to get a new sile
executable? That bit me.