
Question about Oniguruma implementation

Open Truncated opened this issue 1 year ago • 3 comments

Flavor Request

Apologies, as I'm honestly not sure how to ask this question, but here's what I'm trying to figure out:

I'm trying to write a TextMate grammar-based language for VSCode, and I'm having a heck of a time figuring out the regex implementation differences.

I found that VSCode uses its own mod of TextMate grammars and pulls in Oniguruma directly via a dedicated binding. This is updated, but not frequently. It's at 6.9.8 right now, which is really only one point release behind the latest.

So Ruby and PHP both use Oniguruma, but they tend to be a lot further back in versions. https://rubular.com/ uses Ruby 2.5.9, which I believe has Oniguruma v6.1.3. PCRE is in the ballpark, but it has enough differences to be problematic.

I'd like to see a raw version of Oniguruma implemented directly, but selfishly I'd like to see the one most compatible with the VSCode implementation. Is this even a flavor request, or does it need to be some kind of implementation through an intermediary?

Truncated avatar Jun 19 '24 22:06 Truncated

Yes, this is a flavor request. Oniguruma is a powerful regex engine/flavor but has quite a few differences from PCRE, including lots of edge cases that work differently.

Note that, although you're right that TextMate grammars use Oniguruma, Ruby doesn't. Or rather, Ruby 1.9 did. Ruby 1.8 used its own/different flavor, and Ruby 2.0+ uses Onigmo by default. Onigmo is an Oniguruma fork that is very similar in its syntax and behavior, but has made enough changes/extensions (plus fallen behind compared to newer versions of Oniguruma) to consider it a different flavor.

slevithan avatar Sep 05 '24 18:09 slevithan

@firasdib regex101 could support Oniguruma using only JavaScript (without running a new server backend for it) by using either one of the following libraries:

  • oniguruma-to-es, an incredibly robust/accurate Oniguruma to native JavaScript RegExp transpiler that I maintain.
  • vscode-oniguruma, which gives access to the real Oniguruma C library compiled to WASM.
    • Although there are upsides to this, the downside is it requires downloading a 450+KB WASM file. Additionally, vscode-oniguruma (at least as of v2.0.1) doesn't offer access to named subpattern matches on match results (only subpattern start/end positions by subpattern index).
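To illustrate the named-subpattern limitation mentioned above, here is a minimal sketch of how a caller could rebuild named results from index-based positions. Everything in it is an assumption for illustration: the `CaptureSpan` shape only mirrors the idea of start/end positions per subpattern index, `namedCaptures` is a hypothetical helper (not part of vscode-oniguruma), and the name-to-index table has to be supplied by the caller.

```typescript
// Hypothetical shape: one start/end pair per subpattern index,
// where index 0 is the whole match.
interface CaptureSpan {
  start: number;
  end: number;
}

// Hypothetical helper: maps subpattern names to matched text.
// Assumption: the caller already knows which index each name refers to.
function namedCaptures(
  input: string,
  spans: CaptureSpan[],
  names: Record<string, number>,
): Record<string, string | undefined> {
  const out: Record<string, string | undefined> = {};
  for (const name of Object.keys(names)) {
    const span = spans[names[name]];
    // An index with no span is reported as undefined.
    out[name] = span ? input.slice(span.start, span.end) : undefined;
  }
  return out;
}

// Usage with hand-written positions for the input "key = value".
const spans: CaptureSpan[] = [
  { start: 0, end: 11 }, // whole match
  { start: 0, end: 3 },  // subpattern 1
  { start: 6, end: 11 }, // subpattern 2
];
const result = namedCaptures("key = value", spans, { key: 1, value: 2 });
console.log(result);
```

Building the name-to-index table is the hard part in practice, since it means parsing the pattern source, so this only covers the last step.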

slevithan avatar Dec 29 '24 04:12 slevithan

Yeah, there are a couple of different versions in play, with multiple notable differences that cause issues, often "lookbehind assertion is not fixed length": https://github.com/github-linguist/linguist/issues/3924

  • VSCode: Oniguruma v6.9.8
  • TextMate: Onigmo v5.13.5 (Oniguruma v5.9.2)
  • GitHub: PCRE v8.36 (not PCRE2)

https://github.com/RedCMD/TmLanguage-Syntax-Highlighter/blob/main/documentation/README.md#regex

@Truncated here's my VSCode extension to help with authoring TextMate grammars: https://marketplace.visualstudio.com/items?itemName=RedCMD.tmlanguage-syntax-highlighter

RedCMD avatar Aug 15 '25 07:08 RedCMD