problem-solving
Remove P5 regexes from Raku
Should we finally deprecate and eventually drop support for Perl regexes from Raku? I wasn't involved in the early stages of language design, but I can guess that supporting them was part of the plan for a smooth Perl5->Perl6 migration. But that matter doesn't seem to be on the table anymore.
This issue could be considered as continuation of rakudo/rakudo#2624
A quick greppable search over the ecosystem turned up 5 modules using :P5 and one entry in the documentation for :Perl5. Those should not be a big deal to rewrite with Raku regexes.
Why does this matter? People ask questions like rakudo/rakudo#3316. I don't think we have enough resources to implement every Perl addition when our own regexes need a lot of improvement in the first place.
Also, removal would improve NQP/Rakudo compile times and reduce the overall compiler code size, speeding up startup.
Removing stuff that almost nobody is using? :) Yes!
For the record: related IRC discussion. Looks like a vote to me...
:P5 is Raku's more-or-less-PCRE library. Many languages have something like that, although they don't have to maintain it themselves (and it's PCRE not Perl regexes). And there are more resources about PCRE than Raku regexes. Removing the feature, even if it is incomplete, would remove the basic possibility for people to just copy a regex they find online and add an adverb.
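As a sketch of that workflow, assuming a Rakudo release that still accepts the :P5 adverb, here is a pattern of the kind people find online, used unchanged with :P5, next to a hand-translated native Raku regex. The date string is only an illustration.

```raku
# Assumes a Rakudo version that still accepts the :P5 (a.k.a. :Perl5) adverb.
my $date = '2019-11-27';

# A Perl/PCRE-style pattern pasted in unchanged, with only the adverb added.
my $p5-ok = so $date ~~ m:P5/^(\d{4})-(\d\d)-(\d\d)$/;

# The same pattern in native Raku syntax: counted repetition uses **,
# and a literal "-" has to be quoted.
my $raku-ok = so $date ~~ / ^ (\d ** 4) '-' (\d ** 2) '-' (\d ** 2) $ /;

say "P5: $p5-ok, Raku: $raku-ok";
```

The translation is mechanical for a pattern this small, but it still requires knowing the Raku syntax, which is the barrier the :P5 adverb removes.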
From a pragmatic or (regex) newbie point of view, that would be a loss. A negligible one perhaps, as greppable / Blin would show, but IIRC those tools only go through the ecosystem. I'm one of the unnumbered people who use this adverb in non-public code (yes, a whole three times) or have used it in a blog post or anything not indexed by those tools.
I would appreciate it if someone who knows the internals could assess the feasibility of a slang replicating the exact current :P5 adverb, so that support can continue to live or rot in module space, keeping code working up to a hypothetical use Regex::P5.
Bonus question: Is it possible to delegate compilation and matching of Perl regexes to Inline::Perl5?
I would appreciate it if someone who knows the internals could assess the feasibility of a slang replicating the exact current :P5 adverb, so that support can continue to live or rot in module space, keeping code working up to a hypothetical use Regex::P5.
I was under the impression that the P5 regex support was not complete. While there is certainly a good use case for PCRE and/or Perl regex support in core, I don't think that holds if it doesn't work as advertised.
Bonus question: Is it possible to delegate compilation and matching of Perl regexes to Inline::Perl5?
This may be a good idea if technically possible, but I'd rather see something like that as a module, outside of core. You don't want to depend on the runtime of another language for core functionality.
Such a functionality would depend on Inline::Perl5 being installed (not loaded), just like use Benchmark:from<Perl5>; works. Having that module installed implies that the other language is available.
I'd rather see something like that as a module, outside of core. You don't want to depend on the runtime of another language for core functionality.
I agree it would certainly be nicer as a module, mandatory if Inline::Perl5 gets involved. The more modules can look like they do core the merrier, I think. But I wanted to see a bit more discussion about ways to retain backwards-compatibility in this ticket, so I asked.
I will look into lizmat's comment. I don't grok it yet but the "not" sounds tough...
@taboege a quick look into the Raku Grammar tells me that :p5 is handled by the P5Regex slang. So I actually see no problem in plugging in whatever handler is desired for that slang.
Moreover, I think it's even possible to extend the adverb handling of the regex keywords so that things like rx:myRXengine// would be feasible. Like rx:pcre//, rx:dosglob//, etc. :)
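The adverbs rx:pcre// and rx:dosglob// do not exist today, and a real implementation would hook into the grammar as a slang. Purely as a runtime analogue of the idea, here is a sketch in which "engines" are registered under a name and each one turns a pattern written in its own syntax into a Raku Regex. The routines register-engine, match-with and glob-to-raku are hypothetical, not part of Rakudo.

```raku
# Hypothetical runtime analogue of pluggable regex "engines": each engine is
# a routine that compiles a pattern in its own syntax into a Raku Regex.
# Nothing here is a Rakudo API; real rx:pcre// style adverbs would need
# grammar-level (slang) support instead.
my %engines;

sub register-engine(Str $name, &compile) { %engines{$name} = &compile }

sub match-with(Str $engine, Str $pattern, Str $subject --> Bool) {
    die "no regex engine registered as '$engine'" unless %engines{$engine}:exists;
    my $rx = %engines{$engine}.($pattern);
    so $subject ~~ $rx;
}

# Native Raku syntax: interpolate the string as a regex.
register-engine 'raku', -> Str $p { rx/ <$p> / };

# DOS-style globs: * is any run of characters, ? is exactly one character,
# everything else is literal, and the glob must cover the whole string.
sub glob-to-raku(Str $glob --> Str) {
    $glob.comb.map(-> $c {
        $c eq '*'     ?? '.*'
        !! $c eq '?'  ?? '.'
        !! $c ~~ /\w/ ?? $c
        !!               "\\$c"    # backslash makes a non-word character literal
    }).join;
}
register-engine 'dosglob', -> Str $g { my $src = glob-to-raku($g); rx/ ^ <$src> $ / };

say match-with('dosglob', '*.txt', 'notes.txt');
```

A slang-based version would do this translation at compile time rather than on every call, which is one reason the grammar-level hook discussed here is the more attractive design.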
I feel that any kind of regex other than the official Raku-specific grammar does not belong in core and should definitely be the domain of optionally installable modules. Anything for Perl 5 syntax or PCRE or any other regex engine should not be in core and should be a third-party install. But we'd also want to make it easy to install those optional components.
When did we start removing (extensively) spec tested features that are provably used in the wild? What about our promises to stay backwards compatible? That has always been one of the core strengths of Perl and I thought this was a cultural thing that lives on in Raku.
It is onerous to stay 100% backwards compatible, and it is reasonable to break some things when that gives valuable gains in return. Besides which, the main backwards-compatibility promise concerns code that declares what version of Raku it requires. If we deprecate P5 in 6e and remove it in 6f, code written for 6d will still run on implementations supporting 6d. Another key point is that Raku grammars can do everything the P5 ones can, so developers have a path: code that uses Raku grammars instead of P5 will run unchanged on all Perl 6 or Raku implementations so far and for the foreseeable future, and thus backwards compatibility is preserved. The problem arises only when there is no reasonable code a developer can write that runs on both older and newer Raku implementations, and in this case there is: Raku grammars.
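To illustrate that migration path, a typical Perl date regex such as ^(\d{4})-(\d\d)-(\d\d)$ can be rewritten as a small Raku grammar. The grammar name ISODate and its token names are illustrative only.

```raku
# The Perl pattern ^(\d{4})-(\d\d)-(\d\d)$ expressed as a Raku grammar,
# with named parts instead of numbered capture groups.
grammar ISODate {
    token TOP   { <year> '-' <month> '-' <day> }
    token year  { \d ** 4 }
    token month { \d ** 2 }
    token day   { \d ** 2 }
}

# .parse only succeeds if TOP matches the entire string,
# which plays the role of the ^...$ anchors in the Perl version.
my $m = ISODate.parse('2019-11-27');
say "year={$m<year>} month={$m<month>} day={$m<day>}";
```

Code written this way does not depend on :P5 at all, so it keeps working whatever happens to the adverb.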
Sorry, but in my eyes backwards compatibility means something completely different. It doesn't mean "the new version can do something to the same effect", instead it means "stuff keeps working as it is". So far we have gone to great lengths to keep that promise. I personally have invested great amounts of time and effort to keep that promise. We sometimes even keep implementation bugs, so existing code keeps working, like in https://github.com/rakudo/rakudo/commit/7d37f9aaf0 It certainly is onerous to stay backwards compatible, but our motto is and has always been "torture the implementor for the sake of the users".
I have yet to see any good arguments about the benefits this would bring besides some vague talk about compilation speed and startup times, none of which match my understanding of rakudo's performance characteristics.
I think the effects on compilation and startup times would be negligible (but would like to be proven wrong).
I think this is more about expectations: the promise that :p5 provides Perl-compatible regexes that just work, an expectation that was proven wrong by https://github.com/rakudo/rakudo/issues/3316 . Rather than spending core developer time on adding these features to the core, I think a more fruitful route is to put it into a module, and spend time on making other regex modules pluggable.
FWIW, a similar issue exists with pack/unpack, which, BTW, does have a more complete module implementation already: https://modules.raku.org/dist/P5pack:cpan:ELIZABETH
@niner As it was said, we would keep support for older revisions. Newer ones can have it through a third-party module. This way back-compat would be kept up until we drop 6.c/6.d/6.e support altogether (which will eventually happen anyway).
And, yes, the main purpose of the removal would be about broken expectations, as @lizmat stated.
Bringing in pluggable regexes shouldn't be such a big deal, unless I'm missing something in the grammar.
What I don't see in here is any mention of how the :P5 regex support is implemented, and that does have an impact on what we do, and when it's practical to do it.
The P5 regex support is actually just a compiler frontend, alongside the P6 regex frontend. Both of them produce the same set of AST nodes, and the code that handles those - optimization, compilation, etc. - is identical. So removing it is not removing a second regex engine, but rather just a grammar and set of action methods - which is the simplest component of a regex engine. The savings - especially considering lazy deserialization - would be meager; ditto for compile time.
Moving that code out to a module is certainly more work than leaving it where it is now. And it would also mean we get a module that is coupled to QAST - an implementation detail - which makes it more costly to manage changes to that. Of course, one might be able to instead write something that transpiles into the Raku regex syntax, which is something I've done for another regex dialect. Alas, it's a lot less nice than being able to produce a tree, but that isn't really going to happen in a good way until we're at the point of having macros and the proper compiler API that goes with them. And that's probably the point at which moving P5 support out in a maintainable and architecturally nice way would be possible too.
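A minimal sketch of the transpiling approach, assuming only a tiny subset of Perl 5 syntax: the hypothetical p5-to-raku routine below rewrites a pattern into Raku regex source, which is then interpolated with <$...>. It does not handle character classes, (?...) groups or modifiers, and it ignores semantic differences such as Raku's . also matching a newline and Raku's $ matching only at the very end of the string, unlike Perl without modifiers.

```raku
# Hypothetical toy translator from a tiny subset of Perl 5 regex syntax to
# Raku regex source. Not a Rakudo API. Handles escapes that mean the same in
# both dialects (\d, \w, \s, \.), {n} / {n,} / {n,m} quantifiers, and literal
# characters. Character classes, (?...) groups and modifiers are NOT handled.
sub p5-to-raku(Str $p5 --> Str) {
    my $out = '';
    my $i = 0;
    while $i < $p5.chars {
        my $c = $p5.substr($i, 1);
        if $c eq '\\' {
            # copy the two-character escape sequence verbatim
            $out ~= $p5.substr($i, 2);
            $i += 2;
        }
        elsif $c eq '{' {
            # counted quantifier: {n} -> ** n, {n,} -> ** n..*, {n,m} -> ** n..m
            my $end = $p5.index('}', $i);
            my ($min, $max) = $p5.substr($i + 1, $end - $i - 1).split(',');
            $out ~= !$max.defined ?? " ** $min"
                !! $max eq ''     ?? " ** {$min}..*"
                !!                   " ** {$min}..{$max}";
            $i = $end + 1;
        }
        elsif $c ~~ /\w/ || $c eq any(< ^ $ . ( ) | * + ? >) {
            # word characters, anchors, grouping, alternation and the simple
            # quantifiers are written the same way in both dialects
            $out ~= $c;
            $i++;
        }
        else {
            # any other character is a literal in Perl 5 but may be a
            # metacharacter (or ignored whitespace) in Raku, so quote it
            $out ~= $c eq "'" ?? "\\'" !! "'$c'";
            $i++;
        }
    }
    $out;
}

my $raku-src = p5-to-raku('^(\d{4})-(\d\d)-(\d\d)$');
say $raku-src;
say so '2019-11-27' ~~ / <$raku-src> /;
```

Working on the pattern text like this is fragile compared with building the regex tree directly, which is the trade-off described in the comment above.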
I'm sympathetic to the argument that it's far easier for folks to just use regexes they find online if they have a PCRE-alike to hand, and that should be a consideration (though that does maybe mean "P5" is not the best name for it). As to "doesn't do all of Perl 5 regexes", that's true, but it'll still be true if it's stuck out in a module. It's also a moving target. We could, however, document what it does support, or something like "includes everything supported by Perl 5.8" or some such.
We could, however, document what it does support, or something like "includes everything supported by Perl 5.8" or some such.
That's the honest truth, but it also means "support for regexes as implemented by the Perl version released in... 2002".
On Wed, 27 Nov 2019 at 17:42, nxadm wrote:
We could, however, document what it does support, or something like "includes everything supported by Perl 5.8" or some such.
That's the honest truth, but it also means "support for regexes as implemented by the Perl version released in... 2002".
Well, as mentioned above, it's PCRE, not the current Perl implementation. This would then become just a documentation issue. I'm OK with that.
That's the honest truth, but it also means "support for regexes as implemented by the Perl version released in... 2002".
Well, thus my note that maybe "P5" isn't the best name, and what's really valuable here is that they're "Perl compatible", which is what many languages have.
On Wed, 27 Nov 2019 at 17:46, Jonathan Worthington wrote:
That's the honest truth, but it also means "support for regexes as implemented by the Perl version released in... 2002".
Well, thus my note that maybe "P5" isn't the best name, and what's really valuable here is that they're "Perl compatible", which is what many languages have.
Just "P" for "PCRE"
If you want it to do PCRE then just call it PCRE.
OK, my personal bottom line for this would be: :P5 stays intact, but we must explicitly document what compatibility it provides so as not to create false expectations.
It is also possible to provide support for pluggable regex engines, and it's comparatively easy to do. Side note: basically, it looks to me like supporting extra adverbs and passing them into a method which would activate the corresponding slang is all that's needed in the grammar. A slang module would then just override the method and activate itself when necessary.
One thing I'm not certain about is whether the Raku standard has to impose the obligation of supporting :p5 on other implementations. I.e., wouldn't it make sense to move all :p5 tests from roast into rakudo?
I wonder how this discussion affects the rakuast branch: it probably just means additional work porting the P5 slang to rakuast, which we might not want to do. But then we have an issue, as that would mean :P5 would be dropped from 6.c and 6.d as well. :-(
So we are actually facing two options here: either the hard decision of removing P5 from the 6.c and 6.d specs, or marking all P5 roast tests as TODO in the hope of a future implementation of Perl regexes. In either case, switching to RakuAST will mean breaking backward compatibility.
I'd choose the second path. It's rather hypocritical but is better than breaking the spec itself.
I think there is a lesson to be learned here. There is such a strong pushback and reluctance against removing anything that now when there is a surprising need to pull the trigger we can hardly provide any decent deprecation cycle. The idea of removing P5 regexes has been floating around for a long time, with some good arguments mentioned here, and all this time users could've had a warning saying that it'd be better to rewrite any P5 regex you have (which most users don't).
Anyway, may I share a somewhat unrelated but awesome talk for inspiration? :) https://www.youtube.com/watch?v=BzX4aTRPzno
The problem now is not the undefined deprecation procedures, though we certainly will need to do something about them anyway. Even if we had them, I don't think they could cover this case, as RakuAST could land in master much sooner than any deprecation period would allow.
Also, the need to deprecate something in already released language versions is what really disturbs me. Still, the scant use of P5 in real-world code tempts me to keep 6.c and 6.d unchanged and simply declare that Rakudo will be broken in this area. After all, many specced 6.c features are not implemented yet. The only unpleasant difference is that none of them were intentionally dropped.
I think there is a lesson to be learned here.
I think the lesson has been learned.
with some good arguments mentioned here
None of these arguments even envisioned a RakuAST approach.
all this time users could've had a warning saying that it'd be better to rewrite any P5 regex you have
Possibly. Then again, maybe porting the P5 regex syntax is less troublesome than I thought. I don't think it would need any additional RakuAST::Regex classes. So it would just be a matter of hooking the RakuAST calls into the P5 regex slang, just as it needs to be hooked up into the Raku regex slang.
And since we cannot have a deprecation cycle if we want to land rakuast in 6.e (or whatever we will call it), it means that we will have to port it. Simple as that.
Would it have been easier if we could drop the P5 regex slang? Yes. Does it mean that we now have to invent a time-machine first to make sure we drop the P5 regex slang earlier? No, I think just adding support for the P5 regex slang in the rakuast branch, will be easier to do.
@AlexDaniel This discussion came up AFTER 6.d got released. So we would have been screwed anyway if we wanted to drop support in 6.e.
My preference would be to keep :P5 and ideally to make it perfectly compatible with perl, but I can certainly see how that could be impractical, and I would be okay with removing it. Second best would be to keep it and carefully document the ways it deviates from perl, though realistically even that could be quite a bit of work. Third best would be to keep it and add a vague warning to the documentation alerting users not to expect very good compatibility.
The worst of all worlds is to let it ride, and let new users trip over the problem before learning about it.
I don't see anything in the existing documentation to warn people off from using it:
The :Perl5 or :P5 adverb switch the Regex parsing and matching to the way Perl regexes behave: ... ... the :Perl5 adverb can be useful when compatibility with Perl is required.
Closing, as this discussion is now outdated. It will no doubt be continued in #378.