commonmark-java

Implement methods to go backwards and look backwards inside Scanner

Open Emplexx opened this issue 6 months ago • 2 comments

Is your feature request related to a problem? Please describe.

I'm trying to implement an InlineContentParser for images from a custom markdown spec used by a certain website. Once the opener has been parsed, the spec requires me to look for the first whitespace character or the end of the text and then, going backwards from there, look for the first ). Everything between the opener and that ) is considered the image link.

This is a bit annoying to implement with the limited set of methods Scanner currently has. I also imagine performance would be worse when, instead of going backwards from the closer, you go forward from the opener via scanner.find(')') and perform an additional find each time to check whether you just found the last ).

Describe the solution you'd like

I would like methods similar to next and find that instead look back from the current Scanner position, i.e. prev and findPrev. Then I could just do the following (sorry that the code is in Kotlin, as I don't regularly write Java; I hope the intent is clear):

val start = scanner.position() // position after the opener has been parsed
// Move the scanner to the first whitespace, or to the very end if there is none
scanner.find { char -> char.isWhitespace() }
// Move the scanner backwards until it is positioned at ')'. Ideally there would also be an
// overload taking a position to stop at, so I could pass `start` and it would exit early
// if no ')' is found by then
scanner.findPrev(')')
val end = scanner.position()
val imageLink = scanner.getSource(start, end).getContent()
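To pin down the intended semantics, here is a minimal sketch of the same steps in Java on a plain String instead of a Scanner. The class `BackwardCloserSearch` and its methods, including this `findPrev` with an explicit lower bound, are hypothetical stand-ins for the proposed API and are not part of commonmark-java.

```java
class BackwardCloserSearch {
    // Index of the first whitespace at or after `from`, or text.length() if there is none.
    static int findWhitespaceOrEnd(String text, int from) {
        int i = from;
        while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        return i;
    }

    // Hypothetical stand-in for the proposed findPrev: walks backwards from just
    // before `from` down to `limit` (inclusive) and returns the index of `c`,
    // or -1 if the limit is reached without finding it (the early exit).
    static int findPrev(String text, int from, int limit, char c) {
        for (int i = from - 1; i >= limit; i--) {
            if (text.charAt(i) == c) {
                return i;
            }
        }
        return -1;
    }

    // Returns the text between `start` (just after the opener) and the last ')'
    // before the first whitespace or end of text, or null if there is no such ')'.
    static String imageLink(String text, int start) {
        int stop = findWhitespaceOrEnd(text, start);
        int end = findPrev(text, stop, start, ')');
        return end < 0 ? null : text.substring(start, end);
    }

    public static void main(String[] args) {
        // Index 4 is the position right after "img(" in each sample.
        System.out.println(imageLink("img(https://example.com/image.jpg)", 4));
        System.out.println(imageLink("img(not-a-link-)more-text) more text", 4));
        System.out.println(imageLink("img(no-closer here)", 4)); // no ')' before the whitespace
    }
}
```

Passing `start` as the lower bound is what keeps the backward walk from running past the opener when no closer exists.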

Describe alternatives you've considered

As stated above, I have considered iterating forward from the current position to find the next ) and then checking whether there is another one, to see if I had found the last one.
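For comparison, a forward-only variant does not necessarily need a second find per candidate: a single pass can remember the most recent ) until it reaches whitespace or the end. The sketch below shows this on a plain String; `ForwardCloserScan` and `findLastCloser` are hypothetical names for illustration. Whether the same loop is convenient to write against Scanner (for example by recording a position at each ) and restoring it afterwards) is an assumption that would need checking against the actual API.

```java
class ForwardCloserScan {
    // Scans forward from `start` and returns the index of the last ')' seen
    // before the first whitespace or the end of the text, or -1 if there is none.
    static int findLastCloser(String text, int start) {
        int last = -1;
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                break; // the link cannot extend past the first whitespace
            }
            if (c == ')') {
                last = i; // remember this candidate; a later ')' overrides it
            }
        }
        return last;
    }

    public static void main(String[] args) {
        String text = "img(not-a-link-)more-text) more text";
        int start = text.indexOf('(') + 1; // position right after the opener
        int end = findLastCloser(text, start);
        System.out.println(text.substring(start, end)); // prints "not-a-link-)more-text"
    }
}
```

This visits each character once, though it always scans up to the whitespace, whereas a backward search can stop as soon as it meets the closer.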

Emplexx, Aug 18 '25 19:08

Can you provide some examples of the syntax you're trying to parse?

robinst, Sep 10 '25 12:09

The syntax I'm trying to parse, and was using as the example, is as follows:

img(https://example.com/image.jpg)
img200( https://example.com/image.jpg)
IMG 50%(https://example.com/image.jpg)
IMG 200 (https://example.com/image.jpg)

Broken syntax that should still be parsed: img(not-a-link-)more-text) more text produces not-a-link-)more-text for the image link, because the parser looks for the last ) before the first whitespace or end of line. As you can see, the first ) is included in the link.

I don't really have any issues parsing the opening tokens; it's the closing token logic that I found hard to implement with the methods the scanner currently exposes.

Emplexx, Sep 10 '25 13:09