assemblyscript-regex

Strategy for ensuring functional correctness?

Open ColinEberhardt opened this issue 3 years ago • 4 comments

I'd consider this library a usable proof of concept (POC) in its current state. The implementation is based on the description of regex provided by the MDN regex cheat sheet. However, that cheat sheet is not a detailed description of the regex language, and in some places it deviates from other descriptions (see #9).

At this point it probably makes sense to consider a few points:

  1. What regex specification should this library use for the purposes of implementation?
  2. How should this library be tested in order to meet this specification?

ColinEberhardt avatar Jan 25 '21 10:01 ColinEberhardt

My initial thoughts

  1. The regular-expressions.info website has very detailed (and easy to understand) descriptions of the syntax and functionality. It also describes language-specific variations, including JavaScript. See, for example, the page on character classes.
  2. TC39 / Test262 is probably a good test suite - although I'm not sure whether the focus is on the correctness of the RegExp API, or the regular expression language itself.

ColinEberhardt avatar Jan 25 '21 10:01 ColinEberhardt

As far as I know, and I know little, Irregexp, the regular expression engine from V8, became the benchmark for browser RegExp, with both Chrome and Firefox using it nowadays. Some random comments around the web even indicated that it's more standard in practice now than the specification itself, but I'm not sure if that's actually the case. Perhaps a strategy could be to aim for compatibility with what the ES specification defines, and in case of doubt look at what Irregexp does? Hmm. Regarding testing, this makes me wonder whether there are tests somewhere in Chrome/FF that one could run in addition.

dcodeIO avatar Jan 25 '21 10:01 dcodeIO

Thanks @dcodeIO - I was not aware of Irregexp.

ColinEberhardt avatar Jan 26 '21 14:01 ColinEberhardt

Another possible solution could be fuzz testing: generate valid regexps with different inputs and flags, then compare each result with what JavaScript's RegExp returns for the same parameters.

This library is probably a great starting point: https://github.com/codenote/regex-fuzz/blob/master/main.py
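
To make the idea concrete, here is a minimal sketch of such a differential fuzzer in TypeScript. Every name in it (`Matcher`, `genPattern`, `fuzz`, and so on) is hypothetical and not part of this library or of regex-fuzz. It assumes the engine under test can be wrapped in a function that returns the first match text or `null`, and it only draws patterns from a tiny grammar that is valid by construction.

```typescript
// A matcher returns the text of the first match, or null when nothing matches.
// (Hypothetical interface: wrap the engine under test to fit this shape.)
type Matcher = (pattern: string, flags: string, input: string) => string | null;

interface Mismatch {
  pattern: string;
  flags: string;
  input: string;
  expected: string | null;
  actual: string | null;
}

// Small seeded PRNG so that any failure can be reproduced from its seed.
function makeRng(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const ATOMS: readonly string[] = ["a", "b", "c", ".", "[ab]", "[^c]", "\\d"];
const QUANTIFIERS: readonly string[] = ["", "", "*", "+", "?"];
const FLAGS: readonly string[] = ["", "i", "s", "m"];
const INPUT_CHARS = "abc1 ";

function pick<T>(rng: () => number, items: readonly T[]): T {
  return items[Math.floor(rng() * items.length)];
}

// Builds a pattern from atoms, quantifiers, groups and alternation.
// Nesting is capped at depth 2 to keep patterns small.
function genPattern(rng: () => number, depth: number = 0): string {
  const termCount = 1 + Math.floor(rng() * 3);
  let out = "";
  for (let i = 0; i < termCount; i++) {
    const atom =
      depth < 2 && rng() < 0.2
        ? "(" + genPattern(rng, depth + 1) + ")"
        : pick(rng, ATOMS);
    out += atom + pick(rng, QUANTIFIERS);
  }
  if (depth < 2 && rng() < 0.2) {
    out += "|" + genPattern(rng, depth + 1);
  }
  return out;
}

// Random input of length 0 to 6 over a small alphabet.
function genInput(rng: () => number): string {
  const length = Math.floor(rng() * 7);
  let out = "";
  for (let i = 0; i < length; i++) {
    out += INPUT_CHARS[Math.floor(rng() * INPUT_CHARS.length)];
  }
  return out;
}

// Reference behaviour: the host's built-in JavaScript RegExp.
const referenceMatch: Matcher = (pattern, flags, input) => {
  const match = new RegExp(pattern, flags).exec(input);
  return match === null ? null : match[0];
};

// Runs the candidate against the reference and collects every disagreement.
function fuzz(candidate: Matcher, iterations: number, seed: number): Mismatch[] {
  const rng = makeRng(seed);
  const mismatches: Mismatch[] = [];
  for (let i = 0; i < iterations; i++) {
    const pattern = genPattern(rng);
    const flags = pick(rng, FLAGS);
    const input = genInput(rng);
    const expected = referenceMatch(pattern, flags, input);
    const actual = candidate(pattern, flags, input);
    if (expected !== actual) {
      mismatches.push({ pattern, flags, input, expected, actual });
    }
  }
  return mismatches;
}

// Self-check: the reference compared with itself must never disagree.
const report = fuzz(referenceMatch, 1000, 42);
console.log(`${report.length} mismatches`);
```

In practice the `candidate` argument would be a wrapper around this library's compiled module, and the comparison could be extended to capture groups and match indices. Restricting the flags and atoms to what the library currently supports keeps the report focused on real deviations rather than on missing features.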

MaxGraey avatar Jan 27 '21 00:01 MaxGraey