LaTeX reader: ignore \noindent and flush(left|right) environments.

Open bucklereed opened this issue 8 years ago • 10 comments

bucklereed avatar Oct 06 '17 22:10 bucklereed

I don't understand why we'd want to omit \noindent (even as a raw latex inline) even when raw_tex is enabled. I've certainly had occasions where I wanted to put a \noindent in there and pass it through...

jgm avatar Oct 07 '17 03:10 jgm

Okay, fair enough. I'm trying to cut down on the 'skipped content' warning noise, since they can indicate possibly-important things getting lost; I'd figured that \noindent was presentational and so droppable.

How about dropping \noindent if raw_tex is off, and preserving it as a raw inline if it's on? The only issue with that is that I will need to figure out how to test the absence of warnings.

bucklereed avatar Oct 07 '17 09:10 bucklereed

+++ bucklereed [Oct 07 17 02:19 ]:

How about dropping \noindent if raw_tex is off, and preserving it as a raw inline if it's on? The only issue with that is that I will need to figure out how to test the absence of warnings.

No, I think that when raw_tex is off it's particularly important to warn about skipped content, since in many cases people may be expecting something not to be skipped.

jgm avatar Oct 08 '17 01:10 jgm

Though, to be sure, we do ignore some things without warning, like \strut.

Another thing to consider is whether these warnings for skipped content should be INFO rather than WARNING level. That would reduce noise.

jgm avatar Oct 08 '17 01:10 jgm

No, I think that when raw_tex is off it's particularly important to warn about skipped content, since in many cases people may be expecting something not to be skipped.

I was thinking of having a whitelist of presentational-ish things that are OK to skip if raw_tex is disabled, and warning if it's not on the whitelist. \noindent would be there, plus probably some sundry other things that are already being ignored on an ad-hoc basis. It would have all the problems of whitelists, but I think that, after a few iterations, it'd be a good 90% solution.
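For concreteness, a minimal sketch of that whitelist check. Every name here is hypothetical (nothing below is pandoc's actual API), and the list holds only the two commands mentioned in this thread:

```haskell
-- Hypothetical sketch: none of these names come from pandoc itself.

-- | Commands assumed to be purely presentational. Only \noindent and
-- \strut are named in this thread; a real list would grow over a few
-- iterations.
presentationalCommands :: [String]
presentationalCommands = ["noindent", "strut"]

-- | What the reader does with a command it cannot represent in the AST.
data SkipAction = SkipSilently | SkipWithWarning
  deriving (Eq, Show)

-- | Skip whitelisted commands quietly; warn about everything else.
skipAction :: String -> SkipAction
skipAction name
  | name `elem` presentationalCommands = SkipSilently
  | otherwise                          = SkipWithWarning
```

With this, `skipAction "noindent"` gives `SkipSilently`, while an unknown command such as `skipAction "lettrine"` gives `SkipWithWarning`.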

I am not sure about knocking the warning level down. The warnings are good; there are just a lot of them, and I suspect that a lot of that lot will be presentational stuff that can be dropped if the aim isn't to round-trip back to the input format.

So, what I'd like is for pandoc to ignore stuff that it knows won't survive a trip through the AST without using raw blobs.

Here's another straw proposal: pandoc -f latex will warn on \noindent (and anything else on the whitelist, plus everything that it doesn't explicitly know about). pandoc -f latex-raw_tex squelches warnings for the whitelist but not for entirely unknown things. pandoc -f latex+raw_tex will keep everything. Or maybe a separate extension makes more sense at this point--something like keep_raw_presentational, on by default for the latex reader?
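Spelled out as a decision table, the straw proposal might look like the sketch below. The types and the function are made up for illustration, and treating "extension not mentioned" as a third state distinct from on and off is an assumption of the proposal, not something pandoc's extension handling is known to support:

```haskell
-- Hypothetical sketch of the straw proposal; not pandoc code.

-- | How the user specified raw_tex on the command line.
data RawTexSetting
  = Unspecified    -- pandoc -f latex
  | ExplicitlyOff  -- pandoc -f latex-raw_tex
  | ExplicitlyOn   -- pandoc -f latex+raw_tex
  deriving (Eq, Show)

-- | What happens to a command that has no AST representation.
data Outcome = DropWithWarning | DropSilently | KeepAsRaw
  deriving (Eq, Show)

-- | The Bool says whether the command is on the presentational whitelist.
outcome :: RawTexSetting -> Bool -> Outcome
outcome ExplicitlyOn  _     = KeepAsRaw        -- keep everything
outcome ExplicitlyOff True  = DropSilently     -- whitelisted: no warning
outcome ExplicitlyOff False = DropWithWarning  -- unknown: still warn
outcome Unspecified   _     = DropWithWarning  -- warn on everything
```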

bucklereed avatar Oct 08 '17 13:10 bucklereed

I was thinking of having a whitelist of presentational-ish things that are OK to skip

I think that's a great idea. It's hard to debug conversion of a huge document if it's full of "skipped \noindent" warnings. Maybe we could just reduce the logger level for the things in the whitelist from WARNING to INFO, as proposed by jgm? I'd do that regardless of the raw_tex switch, since tying it to the switch would further complicate things (both for the user and for us).

mb21 avatar Oct 23 '17 08:10 mb21

+++ Mauro Bieg [Oct 23 17 08:10 ]:

I was thinking of having a whitelist of presentational-ish things that are OK to skip

I think that's a great idea. It's hard to debug conversion of a huge document if it's full of "skipped \noindent" warnings. Maybe we could just reduce the logger level for the things in the whitelist from WARNING to INFO?

That's not currently possible, since this type of LogMessage is assigned WARNING log level. (We could introduce another message like InnocuousSkippedContent, but it seems a bit ugly.)

One idea would be to reduce the log level for all skipped content to INFO. Not sure about that, but it might make sense.

jgm avatar Oct 23 '17 16:10 jgm

One idea would be to reduce the log level for all skipped content to INFO.

Not sure about that... in the use case under discussion, if pandoc skips something it doesn't understand, I'd like to know about it (because, e.g., in the case of \lettrine, I needed to fix either pandoc or the LaTeX source to recover that part of the text). On the other hand, if pandoc is just skipping a known presentational command (like \noindent), then I would like that to be somehow separable from the former (e.g. the former WARNING, the latter INFO).

I haven't looked at Logging.hs yet, but in the reader, wouldn't we just want to call something like SkippedContentLevel INFO raw pos, with SkippedContentLevel :: Verbosity -> String -> SourcePos -> LogMessage?
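Roughly what I have in mind, as a self-contained sketch. The types below are simplified stand-ins, not the real definitions from Logging.hs: SourcePos is reduced to a line/column pair, and SkippedContentLevel and shouldReport are hypothetical names:

```haskell
-- Stand-in types for illustration only; the real Verbosity and
-- LogMessage are defined in pandoc's Logging module.

-- | Log levels, ordered from most to least severe.
data Verbosity = ERROR | WARNING | INFO
  deriving (Eq, Ord, Show)

-- | Simplified stand-in for a parser source position (line, column).
type SourcePos = (Int, Int)

data LogMessage
  = SkippedContent String SourcePos                 -- always WARNING
  | SkippedContentLevel Verbosity String SourcePos  -- hypothetical: caller picks the level
  deriving (Eq, Show)

-- | The level at which a message is reported.
messageVerbosity :: LogMessage -> Verbosity
messageVerbosity (SkippedContent _ _)            = WARNING
messageVerbosity (SkippedContentLevel level _ _) = level

-- | A message is shown when it is at least as severe as the threshold.
shouldReport :: Verbosity -> LogMessage -> Bool
shouldReport threshold msg = messageVerbosity msg <= threshold
```

With a WARNING threshold, `SkippedContentLevel INFO "\\noindent" (1, 1)` would be hidden and `SkippedContent "\\lettrine" (2, 1)` would still be shown; with an INFO threshold both are reported.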

mb21 avatar Oct 23 '17 18:10 mb21

+++ Mauro Bieg [Oct 23 17 18:57 ]:

One idea would be to reduce the log level for all skipped content to INFO.

Not sure about that... in the use case under discussion, if pandoc skips something it doesn't understand, I'd like to know about it (because, e.g., in the case of \lettrine, I needed to fix either pandoc or the LaTeX source to recover that part of the text). On the other hand, if pandoc is just skipping a known presentational command (like \noindent), then I would like that to be somehow separable from the former (e.g. the former WARNING, the latter INFO).

Given that pandoc will typically skip quite a bit in any real-world HTML or LaTeX document, I'm now thinking that it makes more sense to make this an INFO level message. Warnings are things that almost certainly require a fix, while with the majority of these, you'll just want to ignore them.

Of course, sometimes they're important, but I'd prefer to let the author, rather than pandoc, figure out which are important and which aren't (using --verbose to get the output).

jgm avatar Oct 23 '17 21:10 jgm

I think this PR should be closed, and we should perhaps open an issue suggesting changing the log level of the "skipped content" messages to INFO. Though that is something that certainly needs more discussion.

jgm avatar Mar 18 '18 04:03 jgm