Added documentation for Java8 and Java9 unimplemented features.
This is basically some documentation (and a bit of code to obtain the lists) I wrote. There's no alteration to the other code, so I didn't touch the changelog.
Hey, thanks! This seems really useful, and I like the direction you are going with this, but we need to be more fine-grained here. If I am reading this right then you are starting from a Java regex, round-tripping it, then running it through the generators, and then checking the values against the regex. This could fail for a number of reasons, so it's too simplistic to conclude that something is "not implemented".
For example, we definitely do support vertical whitespace
(re/pattern :vertical-whitespace)
;;=> "\\v"
(re-parse/parse #"\v")
;;=> :vertical-whitespace
(re-gen/sample :vertical-whitespace 10)
;;=> ("\r" "" "
" "
" "
" "\r" "\n" "
" "\f" "
")
But maybe the generator isn't accurate, or something else is up. If so then that would be great to find out so we can improve it. The biggest challenge with this library is getting all the different implementations to line up, which is also why we are currently not adding any new regex features, because we are still in the stage of solidifying the features we have. So any information about (lack of) parity between generating regex, parsing regex, and generating generators would be really appreciated.
Did you find the test_cases.edn? Seems like you have a bunch more cases that could be added there, that would help a lot, since we run a bunch of different tests off of these to get fine-grained reporting on what's not working.
Some things are deliberate, we don't "support" POSIX character classes because they are really just shorthands or aliases, so you can get the same result by being explicit, and that way it's easier to get things to work cross-platform. Generally the starting point for us is always the Regal syntax. Any valid regal expression should work the same way across implementations and target platforms.
Cross-platform support is a really important goal for this library, so we don't care much about the specifics of Java's regex engine, we look at features that are common across regex implementations, and figure out how to support them on all the different targets. Supporting every feature of a specific platform is not a goal.
That all said I think this mostly can go in, it's useful information, but it would be great if you could go over the list and have a closer look at the ones that are nominally supported, like :vertical-whitespace, so we can find out what's up and maybe improve the library.
- Lookahead/lookbehind is supported, but not for generators
Should this be reported as a bug/issue, or is this not something we can realistically do in the generators?
Thanks!
Hi @cons-dev, there's not a rush on this, but do you think you'll be able to work on this again soon?