organice icon indicating copy to clipboard operation
organice copied to clipboard

Too eager parsing of italic markup

Open munen opened this issue 3 years ago • 6 comments

*** TODO organice parser error: Foo/Bar/Baz
Foo/Bar/Baz

Is shown as:

image

munen avatar May 08 '21 12:05 munen

I think that's a known bug and applies for all kind of markup. I think I remember that, once I worked on the regex parser, I decided that it's too hard to fix and postponed it to the future clojure parser :) But maybe a fresh look on the regex parser allows you to fix it.

schoettl avatar May 08 '21 17:05 schoettl

See also #94 and #198

schoettl avatar May 08 '21 17:05 schoettl

I don’t think it’s a duplicate. The other issues are about the parser choking on lines containing multiple inline markup statements.

This issue is much more simple: Foo/Bar/Baz is just text. It shouldn’t be italicized at all^^

munen avatar May 08 '21 19:05 munen

Yep, the issues linked above are only related. It's probably the same regex.

Anyway, the problem applies to all kind of markup, e.g. Foo+Bar+Baz

schoettl avatar May 08 '21 20:05 schoettl

You are correct, it does apply to all kind of inline markup. I've written a test and looked at the Regexp. But I don't understand why it doesn't work as written at the moment(;

munen avatar May 09 '21 06:05 munen

Worse example with only one slash:

“organice supports parsing and preserving the minimum/maximum range timestamps.” becomes:

C144C10B-A184-45F9-92DB-725D3927B7FE

munen avatar May 17 '21 10:05 munen

I did not look at the proper Org syntax, but what if we just required a whitespace before inline markup?

Thus, replace [*/~=_+] with \s[*/~=_+]?

lechten avatar Dec 03 '22 10:12 lechten

@lechten Your proposal fixes some terms like foo/bar and foo/bar/baz, but it breaks terms like *bold*. I tried adding to your proposal with [^\s][*/~=_+], but that's not working as expected. I must make a trivial mistake here^^

munen avatar Dec 03 '22 10:12 munen

Partial update: The ^ negates the \s, of course(; But [\s^] and [\^\s] also don't work.

munen avatar Dec 03 '22 11:12 munen

The ^ does not work inside square brackets (where it is just another character). We might add (^|\s) but this adds another group, which seems to ask for lots of subsequent changes...

lechten avatar Dec 03 '22 11:12 lechten

hihi, we're looking at the same options currently. I'm checking if there's a way for a match group to basically be ignored. But that's probably against the whole idea of match groups.

munen avatar Dec 03 '22 11:12 munen

There is a way! ((?:\s|^)?[*/~=_+]) looks promising.

munen avatar Dec 03 '22 11:12 munen

Looked promising and passes all tests, but it breaks foo/bar/baz, again-_-

munen avatar Dec 03 '22 11:12 munen

((?:^|\s+)[*/~=_+]) is most promising atm, but it fails *bold*. (There's blanks before the *bold* term, GH just opts to not render them).

munen avatar Dec 03 '22 11:12 munen

I think ((?:^|\s+)[*/~=_+]) would actually be a good solution, however the match group just direct before (([\s({'"]?)((?:^|\s+)[*/~=_+]) also has a \s which interferes...

Welcome to regexp based parsing hell(;

munen avatar Dec 03 '22 11:12 munen

I hope that PR #910 fixes this now (one more commit not mentioning this, sorry).

lechten avatar Dec 03 '22 15:12 lechten

Closed by the amazing work of @lechten in #910 :fireworks: :confetti_ball:

munen avatar Dec 03 '22 16:12 munen