typescript-go icon indicating copy to clipboard operation
typescript-go copied to clipboard

Implement regexp literal syntax checking

Open jakebailey opened this issue 1 month ago • 6 comments

This PR implements regexp literal syntax checking. The main difference is that it is no longer a scanning step, as that would make it a parser error, which would mean we would have to go back to having separate ASTs depending on the language version being used (something we worked so hard to stop doing).

Instead, the regex checks are grammar checks that happen during checking. Additionally, the code doesn't live in the scanner package at all, as it is entirely self-contained (much to the original code's credit).

This is a really complicated piece of code largely due to the sheer scale of these checks, but also the annoying edge cases due to the old compiler's reliance of TS being in JS already and therefore getting identical string behavior when decoding. This required just a load of fiddling to get working.

The bulk of this is copilot ported, with a bunch of cleanups and fixes. The result is that every regexp related test matches, except for regularExpressionCharacterClassRangeOrder.errors.txt.diff and regularExpressionWithNonBMPFlags.errors.txt.diff, which I believe are improving because of better handling of positioning in weird characters, e.g.:

     // See https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols
     const regexes: RegExp[] = [
     	/[𝘈-𝘡][𝘡-𝘈]/,
-    	   ~~~
+    	  ~~~
 !!! error TS1517: Range out of order in character class.
-    	          ~~~
+    	       ~~~
 !!! error TS1517: Range out of order in character class.
     	/[𝘈-𝘡][𝘡-𝘈]/u,
-    	         ~~~~~
+    	       ~~~
 !!! error TS1517: Range out of order in character class.
     	/[𝘈-𝘡][𝘡-𝘈]/v,
-    	         ~~~~~
+    	       ~~~
 !!! error TS1517: Range out of order in character class.
     
     	/[\u{1D608}-\u{1D621}][\u{1D621}-\u{1D608}]/,

But probably aren't really, it's just that the offsets are UTF-8 now and these are weird chars. tsserver in the editor shows similar ranges so this was probably just a bug in the old baseline squiggler.

jakebailey avatar Nov 05 '25 21:11 jakebailey

Based on the little differences I've noticed, the testing of this feature is not entirely complete...

jakebailey avatar Nov 05 '25 22:11 jakebailey

@graphemecluster I'm not sure if you are around and willing to look at this, but since you are looking at regex stuff again maybe you're willing to look at this again.

Not 100% confident in it, and I don't like the duplication due to the fact that we need some weird scanning stuff to deal with things the new scanner does not much care about...

jakebailey avatar Nov 06 '25 22:11 jakebailey

Let me see if I can get some time this weekend :D

graphemecluster avatar Nov 10 '25 09:11 graphemecluster

@graphemecluster Thanks for looking, and for the PR!

I have a bad feeling my approach is bad in general in that I have managed to duplicate functionality of the scanner itself, such that this code needs to move again...

I need some time to think harder about this (and your other PR on the main repo; microsoft/TypeScript#61042)...

jakebailey avatar Nov 19 '25 22:11 jakebailey

CharacterEscape and EscapeSequence are separate productions in the spec after all, it was just that the they are "similar" enough that I combined them together in my old implementation. Only HexEscapeSequence and LegacyOctalEscapeSequence are real overlaps within the two productions. So I am not sure if it is really worthwhile abstracting the method out. But well, I must not ignore the fact that they do account for a large portion of scanEscapeSequence

graphemecluster avatar Nov 20 '25 21:11 graphemecluster

Ah, you mean the identifier name part. Well obviously RegExpIdentifierName and IdentifierName` are distinct productions, and they do have quite some subtle differences. As for the utils dealing with surrogate pair, I would definitely suggest moving them to core or perhaps a standalone text manipulation folder.

graphemecluster avatar Nov 23 '25 01:11 graphemecluster