tm4e icon indicating copy to clipboard operation
tm4e copied to clipboard

joni onigurama library does not fully implement oniguruma syntax

Open sebthom opened this issue 1 year ago • 10 comments

TextMate uses the oniguruma syntax for regular expressions.

TM4E currently uses jruby/joni as onigurama implementation. Unfortunately joni does not implement the the full onigurama syntax, e.g. variable length look-behind. This means that not all TextMate language files can be parsed with TM4E and loading fails with invalid pattern in look-behind. Since the goal of the joni library is to emulate Ruby's behavior and Ruby does not support variable length look-behinds, joni won't support them anytime soon.

vscode uses the original oniguruma C library from https://github.com/kkos/oniguruma/

It might make sense to investigate if we can do the same, e.g. using kkos/oniguruma via https://github.com/jnr/jnr-ffi/

sebthom avatar Jan 29 '24 20:01 sebthom

I would strongly recommend going for https://openjdk.org/jeps/442 (Java 21 requirement) as jnr-ffi is not fun with Mac OS signing and the whole Orbit fuzz (https://github.com/eclipse-orbit/orbit-simrel/blob/6926eeed8ffaf2174b0fd42bc3c4e94208bda216/maven-sign/tp/MavenSign.target#L6)

akurtakov avatar Jan 29 '24 20:01 akurtakov

Well, I have no experience with JNI/JNA/JNR/FFM so I hope someone else picks this up :-)

sebthom avatar Jan 29 '24 20:01 sebthom

FWIW, moving away from Joni to embedding upstream Oniguruma could be a good improvement for the ecosystem.

mickaelistria avatar Jan 30 '24 09:01 mickaelistria

@mickaelistria any idea when we can expect eclipse to be bundled with justj 21 by default? I don't think moving forward with this via FFM is a good idea as long as Eclipse still uses justj 17

sebthom avatar Jan 30 '24 10:01 sebthom

@sebthom As per https://github.com/eclipse-simrel/.github/blob/main/wiki/SimRel/Simultaneous_Release_Requirements.md#execution-environment-partially-tested it should be fine for after 2024-03 release.

akurtakov avatar Jan 30 '24 10:01 akurtakov

It should be amazing to consume original oniguruma C but as so many plugins uses tm4e (I m so happy to have spent so many time to create this plugin and so happy that you continue to improve it dragsticly, thanks for that!) we need to be carefull.

IMHO I think tm4e should try to parse the textmate with the original oniguruma C and if it fails it should fallback to use joni like today.

angelozerr avatar Jan 30 '24 11:01 angelozerr

but as so many plugins uses tm4e

Those consumers will have to contribute back is they want to keep some Joni support. But at the moment, it seems like keeping Joni is not in the best interest of anyone. If APIs are broken, this would be worse moving to a new major release.

mickaelistria avatar Jan 30 '24 11:01 mickaelistria

Tm4e is working well with most textmates since several years.

User will not understand why their textmate will crash if oniguramm1 C using have a bug or if some OS doesnt support it correctly.

And users will havr bad feelings with tm4e

angelozerr avatar Jan 30 '24 11:01 angelozerr

Btw. IntelliJ has the same issue: https://youtrack.jetbrains.com/issue/IDEA-336274

The main issue atm are variable length look-behind. we could try to dynamically rewrite such unparseable regex to some alternative syntax as suggested here https://stackoverflow.com/a/24591663/5116073 - if that is possible.

Using the C lib directly could also improve overall parsing speed as discussed here https://github.com/jruby/joni/issues/43

sebthom avatar Jan 30 '24 12:01 sebthom

Tm4e is working well with most textmates since several years.

And it's likely to work even better if we can use upstream/more used/better maintained/... oniguruma instead of Joni.

User will not understand why their textmate will crash if oniguramm1 C using have a bug or if some OS doesnt support it correctly.

I don't think using upstream oniguruma is more error-prone than using Joni. Usually, the less indirections there are are, and the less room there is for bugs, the higher quality is.

And users will havr bad feelings with tm4e

Moving to upstream is more likely to increase the quality of the stack than to decrease it.

mickaelistria avatar Jan 30 '24 12:01 mickaelistria