peg-markdown
peg-markdown copied to clipboard
Bold/Emphasis regression
I apologize if this is a repeat --- I thought I emailed you about this, but now cannot find that email. So perhaps I never sent it.
A while back you fixed a problem with the strong/emph parsing that could cause the program to either fail to parse or to take a long time to parse. That worked great, however it caused a regression where the following no longer parses correctly (example sent by one of my users):
The ***quick*** brown ***fox*** jumped
The problem appears to be introduced in the 2a19a38d075356650c59206069fa925028182522 commit, and remains in the latest commit of peg-markdown.
Thanks!
Fletcher
It's very unlikely that the problem is from 2a19a38, which doesn't have anything to do with emphasis and simply removes a clause that should be redundant.
Have you confirmed that the version immediately prior to that commit lacks the problem, and the version with that commit has it?
John
+++ Fletcher T. Penney [Jun 12 13 05:01 ]:
I apologize if this is a repeat --- I thought I emailed you about this, but now cannot find that email. So perhaps I never sent it.
A while back you fixed a problem with the strong/emph parsing that could cause the program to either fail to parse or to take a long time to parse. That worked great, however it caused a regression where the following no longer parses correctly (example sent by one of my users): The _quick_ brown _fox_ jumped
The problem appears to be introduced in the [1]2a19a38 commit, and remains in the latest commit of peg-markdown.
Thanks!
Fletcher
— Reply to this email directly or [2]view it on GitHub. [xJAuenYDiIoVt3LF3y68493iILMdls1w80D0BaSbtW8iiW09MXg2qoOe6rS_8Q_z.gif]
References
- https://github.com/jgm/peg-markdown/commit/2a19a38d075356650c59206069fa925028182522
- https://github.com/jgm/peg-markdown/issues/25
Yes. That's why I wrote in and included the builds. :)
The change you made to strong/emphasis fixed the problem you wanted to fix, but broke this.
F
Sent from my iPhone
On Jun 12, 2013, at 12:57 PM, John MacFarlane [email protected] wrote:
It's very unlikely that the problem is from 2a19a38, which doesn't have anything to do with emphasis and simply removes a clause that should be redundant.
Have you confirmed that the version immediately prior to that commit lacks the problem, and the version with that commit has it?
John
+++ Fletcher T. Penney [Jun 12 13 05:01 ]:
I apologize if this is a repeat --- I thought I emailed you about this, but now cannot find that email. So perhaps I never sent it.
A while back you fixed a problem with the strong/emph parsing that could cause the program to either fail to parse or to take a long time to parse. That worked great, however it caused a regression where the following no longer parses correctly (example sent by one of my users): The _quick_ brown _fox_ jumped
The problem appears to be introduced in the [1]2a19a38 commit, and remains in the latest commit of peg-markdown.
Thanks!
Fletcher
— Reply to this email directly or [2]view it on GitHub. [xJAuenYDiIoVt3LF3y68493iILMdls1w80D0BaSbtW8iiW09MXg2qoOe6rS_8Q_z.gif]
References
https://github.com/jgm/peg-markdown/commit/2a19a38d075356650c59206069fa925028182522 2. https://github.com/jgm/peg-markdown/issues/25
— Reply to this email directly or view it on GitHubhttps://github.com/jgm/peg-markdown/issues/25#issuecomment-19339446 .
Did you look at the commit you referenced? It has to do with links, not strong/emph.
+++ Fletcher T. Penney [Jun 12 13 10:00 ]:
Yes. That's why I wrote in and included the builds. :) The change you made to strong/emphasis fixed the problem you wanted to fix, but broke this. F Sent from my iPhone On Jun 12, 2013, at 12:57 PM, John MacFarlane [email protected] wrote: It's very unlikely that the problem is from 2a19a38, which doesn't have anything to do with emphasis and simply removes a clause that should be redundant. Have you confirmed that the version immediately prior to that commit lacks the problem, and the version with that commit has it? John +++ Fletcher T. Penney [Jun 12 13 05:01 ]:
I apologize if this is a repeat --- I thought I emailed you about this, but now cannot find that email. So perhaps I never sent it.
A while back you fixed a problem with the strong/emph parsing that could cause the program to either fail to parse or to take a long time to parse. That worked great, however it caused a regression where the following no longer parses correctly (example sent by one of my users): The _quick_ brown _fox_ jumped
The problem appears to be introduced in the [1]2a19a38 commit, and remains in the latest commit of peg-markdown.
Thanks!
Fletcher
— Reply to this email directly or [2]view it on GitHub.
[xJAuenYDiIoVt3LF3y68493iILMdls1w80D0BaSbtW8iiW09MXg2qoOe6rS_8Q_z.gif]
References
- https://github.com/jgm/peg-markdown/commit/2a19a38d075356650c59206069fa 925028182522
- https://github.com/jgm/peg-markdown/issues/25 — Reply to this email directly or view it on GitHub<https://github.com/jgm/peg-markdown/issues/25#issuecomment-19339 446> .
— Reply to this email directly or [1]view it on GitHub. [xJAuenYDiIoVt3LF3y68493iILMdls1w80D0BaSbtW8iiW09MXg2qoOe6rS_8Q_z.gif]
References
- https://github.com/jgm/peg-markdown/issues/25#issuecomment-19339673
I'll rephrase.
There is a bug in peg-markdown with the following being parsed incorrectly:
The ***quick*** brown ***fox*** jumped
In trying to be helpful and save you some trouble, I wanted to point out that this bug is actually a regression. Older versions handled it properly. Newer ones don't.
In trying to be more helpful, I wanted to point out that this regression appears to have occurred when you last "fixed" strong/emph parsing. Your fix did patch the problem you wanted to solve (in the examples I had noticed anyways), but caused this regression.
In trying to be even more helpful, I pasted the commit number related to your previous fix (though you certainly are familiar with your own code and probably would have found it on your own when you looked). I inadvertently selected one row higher than intended when copy/pasting. So you had to look one commit earlier. I apologize for being off by one row.
So that you don't have any further difficulty:
- The commit that introduced the regression: 39dfc1ce8e57cea66b13ff7b3db32ca7d17b0f59
- One commit earlier, where things still work: 482a0ba2ccb58d90953d9ef00a853a4ea24e05ef
- One commit later, where things are still broken: 2a19a38d075356650c59206069fa925028182522
Hopefully that is more clear.
Thanks!
Fletcher
I don't have much time to work on peg-markdown now, and I don't use it for anything myself. So you have to understand that I have very little motivation to look into this. I'm glad you're being as helpful as possible by identifying the exact commit to look at. There's no need to apologize, I was just pointing out that you had the wrong commit.
I've had a look. I see why it's happening, but I don't see an easy fix, other than reverting the change, which would reintroduce #19 (this would be very bad for people using peg-markdown in web apps).
I could fix this if I had more time, but I don't right now.
Perhaps for now, you should create a local fork and revert this change in your version; #19 shouldn't be a big issue for your users.
So does that mean peg-markdown is heading into "end-of-life"?
I'll throw my opinion out for whatever it's worth --- I think the PEG parser you've created is fantastic. The performance is great, and it allows for a lot of flexibility. Obviously, there are some challenges when trying to match a grammar like Markdown that is slightly more "intuitive", "ambiguous", or choose your adjective here. But I'm very impressed with it. (Which should be obvious since the last two versions of MultiMarkdown have been built around this.)
It's sad to hear that it might not be maintained moving forward, as you've been more successful at optimizing performance in some of these parts of the parser than I might have been (e.g. strong/emph, quotes, etc.). But I also understand that if you don't use it any more, you don't use it any more.
In any event, I do thank you for all the effort you've put into peg-markdown!
F-
Actually, the problem in #19 was a big problem for a few of my users. (It never ceases to amaze me that people are surprised when programs don't work well on strange/invalid input; that said, I do agree that my app shouldn't break because someone types the wrong thing into it... ;)
I saw your comments in the thread about #19. I'll try to dig in and see if I can understand your thought process at the time and see if I can come up with anything.
What is needed is a way to avoid parsing Emph when you're inside an Emph environment (and similarly for Strong). But (unless I'm mistaken) the code parts of the PEG are not executed unless the parser matches. That means that we can't set a variable that says "I'm inside an Emph inline" that affects the Inline parsers inside the Emph.
At first glance, I thought this seemed trivial, but now I think I understand the difficulty you mentioned. I'll have to experiment, but I think the "recipe" matches "left to right", but the included c code then effectively runs "right to left" or "inside to outside" depending on your perspective. I'm not doing a good job describing this, but perhaps it makes sense to you since you lived in this PEG for so long. So we would be setting the flag indicating that we're inside an emph after the parser processes the inside of the emph and looks for the flag. Is that roughly correct?
I think the best approach is to do something like what I do here: https://github.com/jgm/Markdown/blob/master/Markdown.hs#L1168
I think this could be done in PEG. It parses nested emph / strong with no backtracking at all, and thereby avoids exponential blowup. This has been pretty well tested on edge cases.
I'll take a look. Thanks.
F-
On Jun 12, 2013, at 4:10 PM, John MacFarlane [email protected] wrote:
I think the best approach is to do something like what I do here: https://github.com/jgm/Markdown/blob/master/Markdown.hs#L1168
I think this could be done in PEG. It parses nested emph / strong with no backtracking at all, and thereby avoids exponential blowup. This has been pretty well tested on edge cases.
— Reply to this email directly or view it on GitHub.
Fletcher T. Penney [email protected]
I tried digging back into this today. Haskell is fairly inscrutable if you don't speak it. So I'm still struggling with understanding what you did there, so I just went back to the original code and the issue that was discovered with this:
The ***quick*** brown ***fox*** jumped
A few basic tests suggests that the current peg-markdown approach does well, except when strong and emph are paired like the example. Could this be solved by simply treating "***" and "___" as a single unit when they occur together and test for this before Strong or Emph? Am I missing something obvious?
I added this, and it seems to do fine in my tests. But I'm probably missing something...
F-
PS> Hope all is well!
On Jun 12, 2013, at 4:10 PM, John MacFarlane [email protected] wrote:
I think the best approach is to do something like what I do here: https://github.com/jgm/Markdown/blob/master/Markdown.hs#L1168
I think this could be done in PEG. It parses nested emph / strong with no backtracking at all, and thereby avoids exponential blowup. This has been pretty well tested on edge cases.
— Reply to this email directly or view it on GitHub.
Fletcher T. Penney [email protected]
+++ Fletcher T. Penney [Jul 10 13 14:46 ]:
I tried digging back into this today. Haskell is fairly inscrutable if you don't speak it. So I'm still struggling with understanding what you did there, so I just went back to the original code and the issue that was discovered with this: The _quick_ brown _fox_ jumped A few basic tests suggests that the current peg-markdown approach does well, except when strong and emph are paired like the example. Could this be solved by simply treating "***" and "___" as a single unit when they occur together and test for this before Strong or Emph?
Well, yes, this is what I do in the Haskell code. But it's complex.
After "_" you could have a number of different things, and which
you have changes the meaning of "". For example, if it continues "hello**"
or "hello_ there**", then "**" means "
John
The simple approach seems to fix the original problem, but in adding some more edge cases I found another one that fails:
The _quick_ brown _fox_ jumped (this one fails in peg-markdown but not my test version)
The _quick_ brown fox jumped
The **quick* brown fox* jumped
The quick brown fox jumped
The *quick brown fox* jumped
The **quick* brown fox* jumped (this one fails in peg-markdown and my test version)
I haven't found a counter example that is worse with my approach, but I guess it would be best to go ahead and try to solve the next issue as well..... It's always something.... ;)
F-
On Jul 10, 2013, at 6:04 PM, John MacFarlane [email protected] wrote:
Well, yes, this is what I do in the Haskell code. But it's complex. After "_" you could have a number of different things, and which you have changes the meaning of "". For example, if it continues "hello**" or "hello_ there**", then "**" means "
". But if it continues "hello * there_", then "" means "". And if it continues "hello (end of paragraph)", then " " means "_**".John
Fletcher T. Penney [email protected]
John,
I had some time a week or so ago, and implemented a fix for this in MMD-4. ( ad69c0d1bc )
It's a separate project, so can't be pulled into peg-markdown, and I know you are no longer supporting that project. But should you decide you're interested, the MMD-4 strong/emph parser now seems to pass all the tests I have come up with so far.
Hope all is well!
F-
Fletcher T. Penney [email protected]
On Jul 10, 2013, at 6:31 PM, Fletcher Penney [email protected] wrote:
The simple approach seems to fix the original problem, but in adding some more edge cases I found another one that fails:
The _quick_ brown _fox_ jumped (this one fails in peg-markdown but not my test version)
The _quick_ brown fox jumped
The **quick* brown fox* jumped
The quick brown fox jumped
The *quick brown fox* jumped
The **quick* brown fox* jumped (this one fails in peg-markdown and my test version)
I haven't found a counter example that is worse with my approach, but I guess it would be best to go ahead and try to solve the next issue as well..... It's always something.... ;)
F-
On Jul 10, 2013, at 6:04 PM, John MacFarlane [email protected] wrote:
Well, yes, this is what I do in the Haskell code. But it's complex. After "_" you could have a number of different things, and which you have changes the meaning of "". For example, if it continues "hello**" or "hello_ there**", then "**" means "
". But if it continues "hello * there_", then "" means "". And if it continues "hello (end of paragraph)", then " " means "_**".John
Fletcher T. Penney [email protected]
Fletcher, good to hear. No time to look at this now...