perl5
Finish making ParseXS build an AST and fix some bugs
This big branch finishes off the work done in my earlier merge commit v5.43.0-169-g195fee3008 from July 2025, which refactored ParseXS so that each XSUB was parsed into an Abstract Syntax Tree (AST).
This branch extends that work so that the whole XS file is now compiled into a single AST (with all XSUBs embedded somewhere in that tree), and any code generation takes place after all parsing is done. This opens the possibility in the future of, for example, varying what boilerplate C is added to the start of the generated C file based on the nature of the parsed XSUBs: previously, all possible boilerplate had to be emitted, because it couldn't be known in advance what would be needed.
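As a rough illustration of why deferring code generation helps, here is a minimal sketch of a parse-then-generate pipeline. It is not the ParseXS code: the toy input format, the node fields (`type`, `kids`, `needs_helper`) and the function names are all assumptions made up for this example.

```perl
use strict;
use warnings;

# Phase 1: parse every line into one tree before emitting anything.
# Toy input format (an assumption for this sketch): one XSUB name per line,
# with a trailing '!' marking an XSUB that needs extra helper boilerplate.
sub parse_file {
    my (@lines) = @_;
    my $root = { type => 'file', kids => [] };
    for my $line (@lines) {
        next unless $line =~ /^(\w+)(!?)$/;
        push @{ $root->{kids} },
            { type => 'xsub', name => $1, needs_helper => ($2 ? 1 : 0) };
    }
    return $root;
}

# Phase 2: generate output. Because the whole tree already exists, the
# generator can decide whether the helper boilerplate is needed at all.
sub generate {
    my ($root) = @_;
    my @out;
    push @out, '/* helper boilerplate */'
        if grep { $_->{needs_helper} } @{ $root->{kids} };
    push @out, "/* XSUB $_->{name} */" for @{ $root->{kids} };
    return @out;
}

my @with    = generate(parse_file('foo', 'bar!'));
my @without = generate(parse_file('foo', 'baz'));
```

Here `@with` starts with the boilerplate line and `@without` omits it, which a generator that emits output while still parsing could not decide up front.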
Part of this refactoring has involved moving any non-low-level processing out of fetch_para(). For example, TYPEMAP keywords were formerly entirely processed by fetch_para(); now it has just enough logic to find the matching "EOF" heredoc line and return the TYPEMAP block as a single paragraph, which is then processed just like any other keyword. In fact fetch_para() has been heavily refactored to make it simpler and more regular.
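The "find the matching heredoc line and hand back one paragraph" idea can be sketched roughly as follows. This is a simplified stand-in, not the real fetch_para(): the helper name `read_typemap_para` is hypothetical, and the regex only accepts a bare-word delimiter, which is an assumption made to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: if the first line of @$lines starts a TYPEMAP
# heredoc, remove the whole block (keyword line through terminator line)
# from the front of @$lines and return it as one paragraph (an array ref).
# Returns nothing if the first line is not a TYPEMAP heredoc line.
sub read_typemap_para {
    my ($lines) = @_;
    return unless @$lines;
    my ($terminator) = $lines->[0] =~ /^TYPEMAP\s*:\s*<<\s*(\w+)\s*$/
        or return;

    my @para = (shift @$lines);          # the "TYPEMAP: <<EOF" line itself
    while (@$lines) {
        my $line = shift @$lines;
        push @para, $line;
        return \@para if $line eq $terminator;
    }
    die "Unterminated TYPEMAP section: no '$terminator' line found\n";
}

my @input = ('TYPEMAP: <<EOF', 'my_type T_PTROBJ', 'EOF', 'int', 'foo(x)');
my $para  = read_typemap_para(\@input);
```

After the call, `$para` holds the three TYPEMAP lines and `@input` holds only the lines that follow the block, so the interpretation of the typemap contents is left to whatever processes the keyword later.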
The old parse stack, @{$pxs->{XS_parse_stack}}, has been eliminated. The state it used to maintain (mainly concerned with #if/#else/#endif nesting for handling duplicate XSUBs) is now represented by Node::cpp_scope nodes in the AST, each of which delineates a sequence of nodes that all lie within the same branch of such an #if.
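To make the scope-node idea concrete, here is a small recursive sketch that turns a flat list of lines into nested scope nodes, one per branch of a conditional. The node layout (`type`, `kids`, `name`) and the function names are assumptions for illustration only, and every non-conditional line is treated as if it were an XSUB name.

```perl
use strict;
use warnings;

# Parse lines (consumed from the front of @$lines) into one scope node.
# Parsing stops, without consuming, at a line that ends the current
# branch: #else, #elif or #endif.
sub parse_scope {
    my ($lines) = @_;
    my $scope = { type => 'cpp_scope', kids => [] };
    while (@$lines) {
        last if $lines->[0] =~ /^#\s*(?:else|elif|endif)\b/;
        my $line = shift @$lines;
        if ($line =~ /^#\s*if/) {        # #if, #ifdef, #ifndef
            # One child scope per branch of this conditional.
            while (1) {
                push @{ $scope->{kids} }, parse_scope($lines);
                die "missing #endif\n" unless @$lines;
                last if shift(@$lines) =~ /^#\s*endif\b/;
                # Otherwise it was #else or #elif: parse the next branch.
            }
        }
        else {
            push @{ $scope->{kids} }, { type => 'xsub', name => $line };
        }
    }
    return $scope;
}

sub parse_file {
    my (@lines) = @_;
    my $root = parse_scope(\@lines);
    die "#else/#elif/#endif without #if\n" if @lines;
    return $root;
}

my $root = parse_file(
    'foo', '#if A', 'bar', '#elif B', 'baz', '#else', 'qux', '#endif', 'quux',
);
```

In this sketch the nesting lives entirely in the returned tree: `$root` ends up with five children (foo, three sibling scopes for the `#if`, `#elif` and `#else` branches, and quux), and no separate stack survives the parse.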
This branch also fixes some bugs along the way. Some fixes were just by-products of the new way of parsing things, while others were spotted while trying to understand what the old code did.
- The "duplicate XSUB" analysis alluded to above didn't handle #elif correctly.
- False positives have been eliminated for the "duplicate function definition" warning. This fixes GH #19661. Now the warning is only given for XSUBs with the same name appearing strictly within the same branch of an #if/#else/#endif. This might generate some false negatives (i.e. it doesn't warn when it should), but the problem will be detected by the C compiler eventually anyway.
- POD is now handled correctly if it extends to the last line of the file.
- Line continuations (a backslash immediately before the newline) are now correctly handled if they occur on the line immediately after a POD or TYPEMAP section.
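The "same branch only" rule for the duplicate warning can be sketched as a walk over scope nodes in which each scope keeps its own table of seen names. This is one possible reading of "strictly within the same branch", not the actual implementation; the hand-built tree and the function name `find_duplicates` are hypothetical.

```perl
use strict;
use warnings;

# Return the names of XSUBs that appear more than once as direct children
# of the same scope. Names repeated across different branches, or between
# a branch and its enclosing scope, are deliberately not reported.
sub find_duplicates {
    my ($scope) = @_;
    my (%seen, @dups);
    for my $kid (@{ $scope->{kids} }) {
        if ($kid->{type} eq 'cpp_scope') {
            push @dups, find_duplicates($kid);   # fresh %seen per branch
        }
        elsif ($seen{ $kid->{name} }++) {
            push @dups, $kid->{name};
        }
    }
    return @dups;
}

# Hand-built tree: "foo" is defined at top level and once in each branch
# of a conditional; "bar" is defined twice at top level.
my $tree = {
    type => 'cpp_scope',
    kids => [
        { type => 'xsub', name => 'foo' },
        { type => 'cpp_scope', kids => [ { type => 'xsub', name => 'foo' } ] },
        { type => 'cpp_scope', kids => [ { type => 'xsub', name => 'foo' } ] },
        { type => 'xsub', name => 'bar' },
        { type => 'xsub', name => 'bar' },
    ],
};

my @dups = find_duplicates($tree);
```

Only `bar` is reported. The top-level `foo` colliding with the `foo` inside a branch is the kind of false negative described above, which would be left for the C compiler to catch.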
There are also some visible changes in behaviour:
- The sequencing of the XSubPPtmpAAA, XSubPPtmpAAB, etc. guard defines may change. These are defines added to indicate different branches of an #if/#else/#endif. The functionality hasn't changed. In theory some XS code could break if it was testing for particular defines, but since these are an internal, undocumented implementation detail, XS code shouldn't be relying on them.
- Similarly, the exact placement of the '#define XSubPPtmpAAA 1' may alter slightly.
- The "#else/elif/endif without #if in this function" warning no longer includes the hint "(precede it with a blank line...)" because the condition that was used to determine whether to add the hint is no longer easy to calculate.
- The newXS() boot calls for per-package 'Foo::Bar::()' methods are emitted slightly later in the boot code now. This should make no functional difference.
- Syntax errors in MODULE and TYPEMAP keyword lines are now detected, so bad MODULE/TYPEMAP lines are now reported as such. Previously they would be silently interpreted as non-keyword lines and thus generate some weird error message about a bad XSUB start or similar.
- This set of changes requires a perldelta entry, and I'll write one later.
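For readers unfamiliar with the guard defines mentioned in the list of visible changes, the sketch below shows one way such a sequence of names can be produced and used, relying on Perl's string auto-increment. The helper names are hypothetical, and the emitted C text (including the idea of wrapping a boot-time newXS() call in `#if <guard>`) is an assumption for illustration; the exact text ParseXS emits will differ.

```perl
use strict;
use warnings;

# Starting name taken from the text above. Perl's ++ on a purely
# alphabetic string increments it "alphabetically".
my $next_define = 'XSubPPtmpAAA';

# Hypothetical helper: hand out the next unused guard name.
sub next_guard { return $next_define++ }

# Hypothetical helper: the define emitted inside one branch of an #if.
sub guard_define {
    my ($guard) = @_;
    return "#define $guard 1";
}

# Hypothetical helper: boot-time registration that only takes effect if
# the branch defining the XSUB was actually compiled.
sub guarded_boot_call {
    my ($guard, $perl_name, $c_name) = @_;
    return (
        "#if $guard",
        qq{    newXS("$perl_name", $c_name, file);},
        "#endif",
    );
}

my @guards = map { next_guard() } 1 .. 3;
my @boot   = guarded_boot_call($guards[0], 'Foo::bar', 'XS_Foo_bar');
```

Because the names are just handed out in the order branches are encountered, any change to the order in which the parser visits branches changes which name a given branch receives, which is why code should not test for particular names.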