ParseXS: build an AST for each XSUB
This PR is part of my ParseXS refactoring work. I don't intend for it to be merged until 5.43.1.
This series of about 150 commits changes ExtUtils::ParseXS so that, instead of intermixing parsing and code-emitting for each XSUB, it now parses each XSUB into an Abstract Syntax Tree, and then walks the tree to emit all the C code for that XSUB.
This makes the code generally more understandable and maintainable.
For now it still just discards each tree after the XSUB is parsed; in future work, the AST will be extended so that it holds the whole file (including all the XSUBs) rather than just the current XSUB.
This branch contains six types of commit.
-
For terminal AST nodes for keywords such as FOO, the old
ExtUtils::ParseXS::handle_FOO()
method is removed and a new ExtUtils::ParseXS::Node::FOO class is added with parse() and as_code() methods which copy over the parsing and code-emitting parts of the handle_FOO() method. For a few keywords like INPUT which have values per line, both a Node::FOO and Node::FOO_line class are created, with several FOO_line nodes being children of the FOO node.
Note that doing the modifications for a single keyword often consists in fact of several commits in sequence.
-
For higher-level nodes, a Node::foo class is created with parse() and as_code() methods as before, but the contents of these methods are typically populated by moving the relevant bits of code over from the big ExtUtils::ParseXS::process_file() method.
-
Most of the state fields of the ExtUtils::ParseXS class (especially all the xsub_foo ones) are removed and similar fields added to the various Node subclasses instead.
-
Fixups to ensure that all parse-time code is in parse() methods or associated helper functions, and similarly for as_code().
-
Various bug fixes related to state that should be per-CASE rather than per-XSUB. Some of these bugs were pre-existing, some were introduced during this branch.
-
General tidying-up, fixing code comments, adding POD etc.
On Tue, May 06, 2025 at 10:00:14PM -0700, Tony Cook wrote:
The old code was lax, but as far as I can tell this changes the behaviour when the keyword isn't exactly "ENABLE" or "DISABLE".
That's a good point. I've looked into this a bit further. Apart from the PROTOTYPES keyword, all the other 'ENABLE'-type keywords seem to be used 'properly' in CPAN. So I propose that for those keywords, I make it so that anything other than this following the keyword is an error:
/\s* (ENABLE|DISABLE) \s* $/x
I further propose that we special-case the PROTOTYPES keyword for backwards-compatibility. I suspect that the main reason that people have added this keyword is to silence the "Please specify prototyping behavior" warning. They were then inadvertently relying on the fact that any old unrecognised garbage following this keyword is interpreted as DISABLE, which is the default anyway.
Based on a CPAN grep, this is a count of how many distros contain at least one match of each of these patterns in *.xs files:
595 /^\sPROTOTYPES:\sENABLE\s*$/ 980 /^\sPROTOTYPES:\sDISABLE\s*$/ 12 /^\sPROTOTYPES:\sENABLED\s*$/ 49 /^\sPROTOTYPES:\sDISABLED\s*$/ 11 /^\sPROTOTYPES:\s[Ee]nabled?\s*$/ 31 /^\sPROTOTYPES:\s[Dd]isabled?\s*$/
Only the first pattern above is actually interpreted as enabling prototypes.
There are only 4 instances of a PROTOTYPES line which don't match
/: \s* (ENABLE|DISABLE)D? \s* $/xi
which are:
PerlIO-via/via.xs: PROTOTYPES: ENABLE;
Win32-FindFile: PROTOTYPES: DISABLE;
X-Osd: PROTOTYPES: DISABLES
Net-LibNFS: PROTOTYPES: DISABLESTATVFS64_OFFSET
So I propose that
-
Anything not matching that last regex fails with an error, and we supply fixes for those 4 now-failing distros.
-
Anything other than 'ENABLE' is interpreted as DISABLE (thus maintaining backcompat).
-
anything other than /\s*(ENABLE|DISABLE)\s*$/ issues a warning to the effect that only ENABLE/DISABLE are valid keywords and that the unrecognised value has been interpreted as DISABLE.
In total, this means that we break 4 CPAN distros and make another 103 start to emit a warning during build (but would otherwise continue to work as before). Note that this change will be applied early in the development cycle, so we would have nearly a year for distros to be updated or for us to revert this change.
-- "Foul and greedy Dwarf - you have eaten the last candle." -- "Hordes of the Things", BBC Radio.
PerlIO-via/via.xs: PROTOTYPES: ENABLE;
That's a core module, so we might as well fix it immediately.
As of Sun May 11 14:47:18 UTC 2025, the GH interface for this p.r. is stating, "This branch cannot be rebased due to conflicts."
I decided to explore this. I did a local checkout of the p.r., rebased it on blead, and proceeded to configure and build. For unknown reasons, my machine slowed to a crawl during the compilation of utf8.c. I had to do a hard reboot.
When I recovered, I did a new checkout of the p.r. but did not rebase it on blead. I then configured, built and tested without incident. I called git clean; git rebase blead. I then configured, built and again retested without incident.
I'm not sure what's happening, but I suggest we figure out why this ticket is saying that the branch cannot be rebased.
On Sun, May 11, 2025 at 07:53:59AM -0700, James E Keenan wrote:
I decided to explore this. I did a local checkout of the p.r., rebased it on blead, and proceeded to configure and build. For unknown reasons, my machine slowed to a crawl during the compilation of utf8.c. I had to do a hard reboot.
It rebased and built fine for me.
I'm not sure what's happening, but I suggest we figure out why this ticket is saying that the branch cannot be rebased.
I've no idea why. My branch only touches files under dist/ExtUtils-ParseXS/, and nothing recent in blead does. I intend to just ignore gitgub's opinion, and merge it manually when the time comes.
-- No man treats a motor car as foolishly as he treats another human being. When the car will not go, he does not attribute its annoying behaviour to sin, he does not say, You are a wicked motorcar, and I shall not give you any more petrol until you go. He attempts to find out what is wrong and set it right. -- Bertrand Russell, Has Religion Made Useful Contributions to Civilization?
So I propose that
Anything not matching that last regex fails with an error, and we supply fixes for those 4 now-failing distros.
Anything other than 'ENABLE' is interpreted as DISABLE (thus maintaining backcompat).
anything other than /\s*(ENABLE|DISABLE)\s*$/ issues a warning to the effect that only ENABLE/DISABLE are valid keywords and that the unrecognised value has been interpreted as DISABLE.
In total, this means that we break 4 CPAN distros and make another 103 start to emit a warning during build (but would otherwise continue to work as before). Note that this change will be applied early in the development cycle, so we would have nearly a year for distros to be updated or for us to revert this change.
This is fine for me.
I'm not sure what's happening, but I suggest we figure out why this ticket is saying that the branch cannot be rebased.
It rebased fine for me too, and the automation isn't adding the "hasConficts" label, so I think we're fine.
I'm done going through this, except for the issues above which are mostly cosmetic, it looks good to me.
I've rebased and pushed an updated version. All the existing commits are unchanged apart from some fixed typos in their commit messages, as pointed out in Tony C's various comments. There are also 7 new commits at the head which address the various issues which Tony raised.
One commit message says:
Unfortunately this string (stored as $xsub->{xsub_decl}{class}) is also
used to generate an autocall line, which could lead to nonsense such as:
RETVAL = new const X::Y();
While combining the const with the class name in a field called class triggered me, the above syntax is valid, assuming RETVAL is declared as const X::Y *, since new takes a type, and a cv-qualifier is part of the type.
It doesn't strike me as particularly useful in an XS context though.
(I don't think any changes are needed here)
Since this pull request affects a library under dist/, it will presumably at some point be uploaded to CPAN for installation against older versions of perl. Do we have a way of testing whether or not this EUPXS will then pass its own test suite?
Since this pull request affects a library under
dist/, it will presumably at some point be uploaded to CPAN for installation against older versions ofperl. Do we have a way of testing whether or not this EUPXS will then pass its own test suite?
CI tests all dist modules against 5.38, 5.24, 5.18, 5.10 threaded and non-threaded builds on Linux, and against the system perl on Mac OS. See https://github.com/Perl/perl5/actions/runs/16097800911/job/45423184974 for example