effekt icon indicating copy to clipboard operation
effekt copied to clipboard

Enhance Regex Support in Standard Library for JS Backend

Open leonfuss opened this issue 1 year ago • 21 comments

This pull request significantly improves regex support in Effekt's standard library for the JavaScript backend. It introduces capture groups, group indices, and regex flags, addressing current limitations and providing a more powerful regex interface.

Key Enhancements:

  1. Support for capture groups and their indices
  2. Implementation of regex flags
  3. Stateful regex object for sequential matching

Current Limitations: Effekt's regex support is currently limited, lacking support for capture groups and advanced regex features. This implementation bridges this gap for JavaScript environments.

New Features:

  • Capture Groups: Ability to use and access capture groups in regex patterns
  • Group Indices: Retrieve the starting index of each capture group
  • Regex Flags: Support for standard regex flags (e.g., 'i' for case-insensitive matching)
  • Stateful Matching: Global flag ('g') set by default for sequential matching

Backwards compatibility This change is not backwards compatible.

Feedback on the API design and usability from experienced Effekt developers is highly appreciated.

leonfuss avatar Sep 16 '24 14:09 leonfuss

The tests are failing since examples/casestudies/lexer.effekt.md uses text/regex. Here are some more modules importing text/regex: https://github.com/search?q=repo%3Aeffekt-lang%2Feffekt+%22text%2Fregex%22+path%3Aexamples%2F**%2F*.effekt&type=code

jiribenes avatar Sep 16 '24 15:09 jiribenes

The tests are failing since examples/casestudies/lexer.effekt.md uses text/regex. Here are some more modules importing text/regex: https://github.com/search?q=repo%3Aeffekt-lang%2Feffekt+%22text%2Fregex%22+path%3Aexamples%2F**%2F*.effekt&type=code

I refactored the lexer example to account for the changes. The other occurrences shouldn't be affected since I'm only changing the JS backend? Please correct me if I oversaw anything.

leonfuss avatar Sep 17 '24 12:09 leonfuss

Thank you for your thorough review. I've implemented your suggestions, making the code more idiomatic. The addition of typed flags should notably enhance usability. I plan to add the test cases tomorrow.

leonfuss avatar Sep 17 '24 12:09 leonfuss

I ran into some problems designing the test. While the LSP server and I cannot find any error in the code, the compiler fails with the error [error] Cannot typecheck call. Unfortunately it doesn't specify the failed type check. I experimented now for one hour and could not get it working.

@jiribenes If you see any obvious problems why it doesn't type check I really would appreciate your help :) Otherwise I will just experiment around until I get a more expressive error :/

leonfuss avatar Sep 18 '24 08:09 leonfuss

I'll take a look. This error usually means that UFCS (foo.bar(baz) standing for bar(foo, baz)) failed.

jiribenes avatar Sep 18 '24 08:09 jiribenes

Is there a way to run certain tests only for the JS-Backend? I would like to add test for the new features, but it seems that the chez-backend is not happy with that. At least I have no trouble running the test succesful on my machine using only the JS-Backend.

leonfuss avatar Sep 18 '24 09:09 leonfuss

It looks like this test even fails on JS:

https://github.com/effekt-lang/effekt/actions/runs/10919082132/job/30306075187?pr=592#step:11:402

b-studios avatar Sep 18 '24 09:09 b-studios

Is there a way to run certain tests only for the JS-Backend? I would like to add test for the new features, but it seems that the chez-backend is not happy with that. At least I have no trouble running the test succesful on my machine using only the JS-Backend.

You can add the file into ignored here for the backends it should not run on together with a comment why it's disabled: https://github.com/effekt-lang/effekt/blob/master/effekt/jvm/src/test/scala/effekt/StdlibTests.scala

So for example for Chez-$something backends:

abstract class StdlibChezTests extends StdlibTests {
  override def ignored: List[File] = List(
    // Not implemented yet
    examplesDir / "stdlib" / "bytes",
    examplesDir / "stdlib" / "io",

    // Not implemented: advanced regex features
    examplesDir / "stdlib" / "string" / "regex.effekt",
  )
}

jiribenes avatar Sep 18 '24 09:09 jiribenes

It looks like this test even fails on JS:

https://github.com/effekt-lang/effekt/actions/runs/10919082132/job/30306075187?pr=592#step:11:402

It seems to work for me locally so it might be a NodeJS version issue? Did the API for regexes change there?

EDIT: it seems that the last part of the test labelled "capture groups" throws some error somewhere.

EDIT 2: I think that NodeJS 12.x does not support the d (GenerateIndices()) flag since it was only introduced in ES9 and implemented in NodeJS 16.x according to MDN.

jiribenes avatar Sep 18 '24 09:09 jiribenes

Don't know how to resolve this if we want to keep the functionality. We could update NodeJS: that might be a wise move anyway since even the extended support for NodeJS 12.x ended 2 years and 4 months ago (30 Apr 2022). Current oldest somewhat supported LTS version is NodeJS 18.x (where security support ends in 7 months). [I'd also be fine with updating to "just" NodeJS 16.x]

jiribenes avatar Sep 18 '24 09:09 jiribenes

EDIT 2: I think that NodeJS 12.x does not support the d (GenerateIndices()) flag since it was only introduced in ES9 and implemented in NodeJS 16.x according to MDN.

I just tested it locally and can confirm that. I will try to add a condition that the d flag will only works if the node version is >16.x and change the test to only run if node is >16.x

Do you see any problems with that?

leonfuss avatar Sep 18 '24 09:09 leonfuss

We can just require a newer node version, that shouldn't be a problem.

b-studios avatar Sep 23 '24 12:09 b-studios

Thanks for the PR! I think it is great to have improved regex support in the language.

I have one major concern, which is the compatibility with the LLVM backend. In general, I do not care so much anymore about the Chez and ML backends. However, we should aim for feature completeness of the LLVM backend.

Adding JS-only features now will make this more and more difficult in the future.

@phischu what is your opinion when viewing this from the LLVM perspective?

b-studios avatar Sep 23 '24 12:09 b-studios

Which of the extensions that are proposed in this PR are compatible with the regular expressions in POSIX?

(https://www.man7.org/linux/man-pages/man3/regex.3.html)

b-studios avatar Sep 23 '24 12:09 b-studios

master is updated to NodeJS 16.x via #599, you can just revert the last commit and rebase :)

jiribenes avatar Sep 23 '24 12:09 jiribenes

To clarify: I am not proposing that you should implement the LLVM part. I just want to know what the intersection between posix regex and your proposed features is.

b-studios avatar Sep 23 '24 13:09 b-studios

Which of the extensions that are proposed in this PR are compatible with the regular expressions in POSIX?

(https://www.man7.org/linux/man-pages/man3/regex.3.html)

The POSIX standard for Regex does only include basic support for capture groups. Only the extraction of the content is possible here, but generating indices for those is not. Stateful matching is definitely not known in the POSIX world.

Flags are a bit more difficult in compatibility. Usually flags are passed as additional command line arguments which makes it dependent on the used program if they are compatible or not.

Does that answer the question?

leonfuss avatar Oct 28 '24 14:10 leonfuss

u can just revert the last commit and rebase

Thank you :)

leonfuss avatar Oct 28 '24 14:10 leonfuss

I have implemented the following changes as suggested previously:

  1. The Global Flag is now handled directly. The exec function returns a Regex object that captures the string in the capture field. This allows subsequent calls on the returned object to iterate (lazily) through all matches. A shorthand for map on the Regex Object is provided, which maps directly onto the captured Match for an ergonomic API. This also eliminates the need for a global or io modifier to prevent inlining. Both attributes previously prevented the use of regex queries in areas where these effects are not allowed.
  2. The examples have been added to the tests and verified to work as expected.
  3. The GenerateIndices flag has been removed as an option and is now enabled by default.

Please feel free to share your thoughts on the current design.

leonfuss avatar Nov 06 '24 15:11 leonfuss

Currently, the tests fail for both LLVM and the Chez backend because they lack the flags interface. To proceed, I will disable the regex tests on those backends. As long as the regex engine of a backend supports flags, implementing those should be straightforward, as the only stateful Flag (Global) is now directly expressed in Effekt.

leonfuss avatar Nov 06 '24 15:11 leonfuss

I'll unassign myself, but let's keep this PR open for now.

b-studios avatar May 06 '25 12:05 b-studios