antlr-php-runtime Allowed memory size of 134217728 bytes exhausted

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 4096 bytes) in /home/parser-generator/vendor/antlr/antlr4-php-runtime/src/Atn/ParserATNSimulator.php on line 2036

Hi, I am trying to use the PHP target, but it throws the above error while parsing a file using a grammar. The same file and grammar work fine on the Java target. If I remove the PHP process memory limit, it succeeds after a few seconds, but that's not right, as the file is very simple. Also, only one particular rule causes this memory overuse; every other rule seems to parse fine.

I tried to make the error relatively easy to reproduce. The repo is https://github.com/npgeorgiou/say, and the PHP target experiment is in the parser-generator directory, which also has instructions on how to reproduce the error: https://github.com/npgeorgiou/say/tree/main/parser-generator

Let me know if I can do anything else to make it easier for you.

Apr 18 '21 15:04 npgeorgiou-zz

Ok, I have more info. This memory bug happens when a subrule starts with itself. For example, here I have a section of my grammar:

expression:
    |SAY expression           # Say
    |EXCLAMATION expression   # Prefix
    |expression EXCLAMATION   # Postfix
;

say !foo will be parsed without issues. say foo! will run out of memory or, if I give it unlimited memory, will be very, very slow. Of course, this cannot work for big files that contain a few of these expressions.

Apr 18 '21 20:04 npgeorgiou-zz

Hi @npgeorgiou!

Thank you for reporting the issue. It seems like a recursion-related bug. I don't have time to track down the problem right now. If you have time to investigate this, PRs are always welcome.

Apr 18 '21 20:04 marcospassos

I would love to, but I am not smart enough for that. What I can offer to any brave adventurer is these:

A grammar:

expression:
    IDENTIFIER                                    # Identifier
    |param_list? ARROW expression+                # Function_literal
    |EXCLAMATION expression                       # Prefix
    |expression EXCLAMATION                       # Postfix
    |SAY expression                               # Say
    |<assoc=right> expression ASSIGN expression   # Assignment
    |PO expression PC                             # P_expression
;

param_list: params+=function_param (COMMA params+=function_param)* COMMA?;
function_param: IDENTIFIER (ASSIGN defaultValue=expression)?;

A file:

is_leap << (year ->
    say foo!
    say foo!
    say foo!
    say foo!
    say foo!
    say foo!
    say foo!
    say foo!
    say foo!
    say foo!
    say foo!
)

Traveller, observe how the postfix **foo!**s are creating this memory error. The more of them, the slower it goes until it runs out of memory.

Also, observe how changing the postfixes to prefixes (!foos) eliminates the bug. Then, observe how taking the **foo!**s out of the is_leap function body eliminates the bug as well. Finally, observe how, weirdly enough, changing the

function_param: IDENTIFIER (ASSIGN defaultValue=expression)?;
to
function_param: IDENTIFIER;

improves the situation, although with enough **foo!**s it appears again.

May the gods of the crossroads and the in-betweens be with you.

Apr 18 '21 21:04 npgeorgiou-zz

It seems I ran into a similar bug with the aql grammar with input for.aql. I'm trying to track it down, but also ran into #33, which makes the tracing impossible to use.

Dec 17 '22 15:12 kaby76

In the driver, I turned on the new "tracing" features in v4.11.2 (current dev tip). I also edited the source to output a newline for the standard trace parser visitor.

PHP:

(before parse call)
$parser->setTrace(true);
Antlr\Antlr4\Runtime\Atn\ParserATNSimulator::$traceAtnSimulation = true;

CSharp:

(before parse call)
parser.Trace = true;
ParserATNSimulator.trace_atn_sim = true;

With tracing set, from the command line, the parse completes for PHP! Without trace set, the first call to AdaptivePredict() does not complete, and leads to an out-of-memory error. Note, regardless of whether trace is set or not, in the XDebug/PHPStorm debugger, the run does not terminate! From the command line, with trace, there are ATN set differences between CSharp, which works fine, and PHP, which terminates only because it's run at the command line.

As for the diffs, it happens on the first "addDFAState".

PHP:

addDFAState new 0:[(297,1,[71 68 $]), (363,1,[225 126 74 68 $]), (226,1,[126 74 68 $]), (363,1,[290 121 68 $]), (291,1,[121 68 $]), (79,1,[68 $]), (297,2,[155 68 $]), (363,2,[225 126 158 68 $]), (226,2,[126 158 68 $]), (162,2,[68 $]), (169,2,[68 $]), (181,2,[68 $]), (193,2,[68 $]), (204,2,[68 $])]

CSharp:

addDFAState new 0:[(297,1,[71 68 $]), (363,1,[[225 126 74 68 $, 290 121 68 $]]), (226,1,[126 74 68 $]), (291,1,[121 68 $]), (79,1,[68 $]), (297,2,[155 68 $]), (363,2,[225 126 158 68 $]), (226,2,[126 158 68 $]), (162,2,[68 $]), (169,2,[68 $]), (181,2,[68 $]), (193,2,[68 $]), (204,2,[68 $])]

I don't know enough about the output to understand this, but it seems one is an aggregate because it contains two '[' in CSharp, but only one '[' for PHP. This sounds like the runtime is written with a misinterpretation of the data structure.
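One way to find such differences mechanically, rather than by eye, is to split each addDFAState line into its top-level (state,alt,[context]) entries and compare the two lists. The following is only a minimal sketch in Python: the helper name split_configs is hypothetical, and the two sample strings are shortened from the PHP and CSharp lines above.

```python
def split_configs(trace_line):
    """Return the top-level '(...)' entries found in one trace line."""
    configs, depth, start = [], 0, None
    for i, ch in enumerate(trace_line):
        if ch == "(":
            if depth == 0:
                start = i          # beginning of a new config entry
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and start is not None:
                configs.append(trace_line[start:i + 1])
                start = None
    return configs


# Shortened samples taken from the traces quoted in the thread.
php = "[(297,1,[71 68 $]), (363,1,[225 126 74 68 $]), (226,1,[126 74 68 $]), (363,1,[290 121 68 $])]"
csharp = "[(297,1,[71 68 $]), (363,1,[[225 126 74 68 $, 290 121 68 $]]), (226,1,[126 74 68 $])]"

php_configs = split_configs(php)
cs_configs = split_configs(csharp)

# Entries that appear on one side only are where the targets diverge.
only_php = [c for c in php_configs if c not in cs_configs]
only_cs = [c for c in cs_configs if c not in php_configs]

print(len(php_configs), len(cs_configs))
print("only in PHP:", only_php)
print("only in C#: ", only_cs)
```

On these samples the PHP side keeps two separate (363,1,...) entries where the CSharp side has a single entry holding both contexts.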

@parrt Is there a detailed description of this output?

I plan to add to the output the name of the type it is printing out (ATN, ATNSet, etc.). Again, my guess is that there is a data structure that isn't conforming to the expected type.

All of this points to some serious problems with PHP:

The ATN sets differ when executed from the command line.
PHP in the debugger vs command line differ in termination.

Dec 17 '22 17:12 kaby76

Since I can't tell what a [ opens for a type (ATNConfig, ArrayPredicateConfig, ....), I decided to add identifiers to the "ToString()" output methods to tell me the object type it is trying to print out. I now am starting to see WTH is going on.

PHP:

addDFAState new 0:[atncs(ac297,1,[ac71 68 $]ac)ac, (ac363,1,[ac225 126 74 68 $]ac)ac, (ac226,1,[ac126 74 68 $]ac)ac, (ac363,1,[ac290 121 68 $]ac)ac, (ac291,1,[ac121 68 $]ac)ac, (ac79,1,[ac68 $]ac)ac, (ac297,2,[ac155 68 $]ac)ac, (ac363,2,[ac225 126 158 68 $]ac)ac, (ac226,2,[ac126 158 68 $]ac)ac, (ac162,2,[ac68 $]ac)ac, (ac169,2,[ac68 $]ac)ac, (ac181,2,[ac68 $]ac)ac, (ac193,2,[ac68 $]ac)ac, (ac204,2,[ac68 $]ac)ac]atncs

CSharp:

addDFAState new 0:[atncs(ac297,1,[ac71 68 $]ac)ac, (ac363,1,[ac[apc225 126 74 68 $, 290 121 68 $]apc]ac)ac, (ac226,1,[ac126 74 68 $]ac)ac, (ac291,1,[ac121 68 $]ac)ac, (ac79,1,[ac68 $]ac)ac, (ac297,2,[ac155 68 $]ac)ac, (ac363,2,[ac225 126 158 68 $]ac)ac, (ac226,2,[ac126 158 68 $]ac)ac, (ac162,2,[ac68 $]ac)ac, (ac169,2,[ac68 $]ac)ac, (ac181,2,[ac68 $]ac)ac, (ac193,2,[ac68 $]ac)ac, (ac204,2,[ac68 $]ac)ac]atncs

Notice the missing "[apc" tag. Whatever object PHP is printing, it is NOT an ArrayPredictionContext in PHP (but it is in CSharp) because I specifically modified the toString() method with tags. I will now debug toString() and see what the heck the object is.

@parrt Please, please, please add some kind of tagging system to note what type of object is being printed in the ATN trace output. I cannot tell what '[' opens.

Dec 17 '22 18:12 kaby76

hi @kaby76 thanks for the heads up. As usual, an excellent analysis. The issue is that there are lots of different types that represent the same abstract concept of context. Not a bad idea, but the real issue here is that we don't have fine enough granularity in the simulation trace. Is all of the output perfect up until it adds the first DFA state? If so, then we need to add more output to the targets so that they identify why it is not generating the right stuff. Given the grammar, I can see that it is the left recursive stuff that's the problem. That will involve the precedence semantic predicates.

My head is stuck in something else at the moment so I don't have time to dig into this but maybe this gives you a bit of a clue? There's definitely a flaw in the ATN sim here. You might try reducing the offending rule to have one left recursive call and one non-recursive call. Also try using the recursive rule as the start symbol and then have a symbol above it. That could give a clue or at least a smaller test set.

Dec 17 '22 19:12 parrt

The first DFAState added between C# and PHP may be different.

C# in VS2022, for "D" at this line, first time hit, for dev branch, for.aql file input. The "configSet.configs" field doesn't even contain the same number of items. In C#, it's 13 elements.

PHP in PHPStorm, at this line, the configSet.configs field has 14 items. This is bad.

Dec 17 '22 22:12 kaby76

CSharp and PHP code look completely different--missing "else" in PHP code. But, this is where the ArrayPredictionContext is created in CSharp, but it's never called in PHP.

So, there's more than one error likely.

Dec 17 '22 23:12 kaby76

Found the problem. Or, in all likelihood, it may be only one of several.

In ATNConfigSet, there is a table called configLookup. It is a Dictionary<ATNConfig, ATNConfig> in C#, but a Set of ATNConfig in PHP. (Note, we've seen this difference in implementation before, between other targets.)

In C#, the code calls "GetOrAdd()", which is here. That table has a special comparer and hash function set for the class. The hash function that is executed is in class ConfigEqualityComparer, which has code that uses three fields of the ATNConfig.

Over in PHP, the code uses a generic Set implementation for $configLookup. That set is allocated here. Notice the hash function defined in this anonymous class here. That calls the standard hash function for ATNConfig. That computes the hash value using four fields--which is wrong!

I've verified that the call stack is indeed calling the wrong hash function for ATNConfig for this table in ParserATNSimulator. This is quite serious.
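To illustrate why the number of hashed fields matters, here is a minimal Python sketch, not the runtime's code. It assumes the three fields are state, alt and semantic context, and that the fourth is the prediction context; Config, lookup_key, full_key and add_all are hypothetical names, and the tuple union is only a stand-in for the real context merge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    state: int
    alt: int
    context: tuple          # stand-in for the prediction context
    semantic: str = "none"  # stand-in for the semantic context


def lookup_key(c):
    # Three fields: the context is deliberately left out.
    return (c.state, c.alt, c.semantic)


def full_key(c):
    # Four fields: the context is part of the key.
    return (c.state, c.alt, c.context, c.semantic)


def add_all(configs, key):
    table = {}
    for c in configs:
        k = key(c)
        if k in table:
            old = table[k]
            # Configs sharing a key get their contexts merged into one entry.
            merged = tuple(sorted(set(old.context) | set(c.context)))
            table[k] = Config(old.state, old.alt, merged, old.semantic)
        else:
            table[k] = c
    return list(table.values())


configs = [
    Config(363, 1, ("225 126 74 68 $",)),
    Config(226, 1, ("126 74 68 $",)),
    Config(363, 1, ("290 121 68 $",)),
]

by_three = add_all(configs, lookup_key)
by_four = add_all(configs, full_key)
print(len(by_three), len(by_four))
print(by_three[0].context)
```

With the three-field key the two (363,1) configs collapse into one entry carrying both contexts. With the four-field key they never collide, so nothing is merged and the set ends up one element larger, which matches the 13 versus 14 items seen in the debuggers.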

Dec 18 '22 04:12 kaby76

hahah. it's ALWAYS the hash function. @marcospassos looks like there might be an issue. We need a map from X->X not a set so we can reuse the same instance.

Dec 18 '22 04:12 parrt

I have an initial set of changes that seems to get past some of the parser tracing diffs. diffs.txt

Essentially, as per note by @parrt I replaced the Set type for $configLookup in ATNConfigSet.php with Map as done with Java and CSharp. I also corrected the hash function/comparison class for the map to be identical to CSharp. The Map interface needed some changes in the API for getOrAdd() and isEmpty()--which is just a hack to move past the first problem. It also contains some "print()'s" that end trace output with a new line ("echo()" does not write a newline!!). https://github.com/antlr/antlr-php-runtime/issues/33. People can wordsmith the "correct" design.

The trace is starting to now look better, getting past the diff with "addDFAState" in the output. But I still notice diffs. There are more problems.

Dec 19 '22 03:12 kaby76

The trace output for "for.aql" now diverges with the first "mergeArrays" line.

CSharp:

mergeArrays a=[385 232 126 74 68 $, 467 396 232 126 74 68 $],b=[311 232 126 74 68 $] -> [311 232 126 74 68 $, 385 232 126 74 68 $, 467 396 232 126 74 68 $]

PHP:

mergeArrays a=[385 232 126 74 68 $, 467 396 232 126 74 68 $],b=[311 232 126 74 68 $] -> M

"M". Really? The code seems wrong. That formatting code should be {M} not M. But, it should even be here. This line fails..

Dec 19 '22 09:12 kaby76

I fixed the missing "else" problem (https://github.com/antlr/antlr4/blob/539ffaf63d312d38c98eb57099a4b6a735233fb8/runtime/CSharp/src/Atn/PredictionContext.cs#L212) and the string interpolation formatting problem in PHP, but that didn't fix the diff. There's something more wrong with merging of the prediction cache arrays.

Dec 19 '22 10:12 kaby76

Found it. This code in PHP is wrong. In CSharp and Java, these two arrays are allocated with nulls of the expected size, which is 3. After running through the code that assigns to these arrays, this test in PHP fails because k=2 but the size of $mergeParents is 2, not 3. The length of the array should be 3 as in Java and CSharp. The solution is to allocate the array with the correct number of nulls. Fixed this, but there are still more problems.
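A minimal Python sketch of why the preallocation matters (an illustration only, with a hypothetical merge_sorted helper working on plain integers rather than the runtime's contexts): the "did we fill fewer slots than allocated?" test can only ever be true when the array was created at the combined size up front.

```python
def merge_sorted(a, b):
    """Merge two sorted lists of return states, dropping duplicates."""
    merged = [None] * (len(a) + len(b))   # fixed size, filled with nulls
    i = j = k = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            merged[k] = a[i]              # same value on both sides: keep one
            i += 1
            j += 1
        elif a[i] < b[j]:
            merged[k] = a[i]
            i += 1
        else:
            merged[k] = b[j]
            j += 1
        k += 1
    for rest in (a[i:], b[j:]):           # copy whatever is left over
        for value in rest:
            merged[k] = value
            k += 1
    trimmed = k < len(merged)             # only meaningful with preallocation
    return merged[:k], trimmed


no_dup = merge_sorted([385, 467], [311])
with_dup = merge_sorted([385, 467], [385])
print(no_dup)
print(with_dup)
```

In the second call k ends at 2 while the allocated length is 3, so the trim branch is taken. If the list were grown by appending instead, its length would always equal k and that branch could never run.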

Sorry for the following flaming, but.... This is endless. People really need to go side by side with multiple debuggers/multiple targets, single step, and check their code. Granted, PHPStorm is terrible. Just setting up PHP for debugging required installing Xdebug with a manual .dll copy then modifying php.ini. I guess there's no global cache like in C#, or classpath as in Java, for PHP. Terrible. In PHPStorm, there doesn't seem to be a "Watch" pane as in VS2022, which automatically computes an expression when a breakpoint is reached. I have to manually evaluate "toString()" on all these data structures. Really slows everything down. Does anyone know if there's a "Watch" pane in PHPStorm?

Dec 19 '22 11:12 kaby76

Hi @kaby76, great debugging work!

Most of these problems seem to be related to the PHP target being ported from JavaScript by the developer who started it. Therefore, these problems may exist there too.

Do you intend to open a PR with the fixes? I have an extensive test suite here that I want to test against. Also, I want to run some benchmarks to understand the impact on performance (positive or negative).

Granted, PHPStorm is terrible. Just setting up PHP for debugging required installing Xdebug with a manual .dll copy then modifying php.ini. I guess there's no global cache like in C#, or classpath as in Java, for PHP. Terrible. In PHPStorm, there doesn't seem to be a "Watch" pane as in VS2022, which automatically computes an expression when a breakpoint is reached. I have to manually evaluate "toString()" on all these data structures. Really slows everything down. Does anyone know if there's a "Watch" pane in PHPStorm?

I use both IDEs and they have feature parity. Here's how you add a watch:

https://user-images.githubusercontent.com/943036/208450505-368e7580-e0ac-4937-ba96-ee43eaddbadc.mov

Dec 19 '22 14:12 marcospassos

Excellent! That watch works.

Not sure on the PR yet. Let's find all the bugs till the traces are exactly the same. I'll get the diffs on my mods, and post them here.

Looks like there's another problem here. This code is a Dictionary<PredictionContext, PredictionContext> over in C#. That means it calls the GetHashCode() and Equals() methods per PredictionContext class, which there's a whole bunch of them. This code in PHP doesn't call any of the hash() or equals() methods. I verified that nothing is called in PHP using the debugger. It must just do a pointer comparison for "contains()". The PHP doc for contains() doesn't say a thing about hash values or equality. Not good.

Dec 19 '22 14:12 kaby76

The PHP doc for contains() doesn't say a thing about hash values or equality. Not good.

The SplObjectStorage does not rely on value equality but on identity. So two objects are only considered the same if they are the same instance.
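The practical consequence can be shown with a small Python sketch (for illustration only; Ctx and the two stores are hypothetical stand-ins, not runtime classes). A container keyed on instance identity cannot find a second object with the same contents, while one keyed on hash and equality can.

```python
class Ctx:
    """Toy prediction context compared by its contents."""

    def __init__(self, return_states):
        self.return_states = tuple(return_states)

    def __eq__(self, other):
        return isinstance(other, Ctx) and self.return_states == other.return_states

    def __hash__(self):
        return hash(self.return_states)


a = Ctx([126, 74, 68])
b = Ctx([126, 74, 68])        # distinct instance, same contents

identity_store = {id(a): a}   # behaves like an identity-based container
value_store = {a: a}          # uses __hash__ and __eq__

found_by_identity = id(b) in identity_store
found_by_value = b in value_store
print(found_by_identity, found_by_value)
```

A cache built on the identity-based store misses every structurally equal context that happens to be a new instance, so equal contexts are never shared.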

Dec 19 '22 18:12 marcospassos

Lots of wrong. Someone got pointer comparisons vs equals() vs hash value compares all wrong. What a mess.

Incorrect pointer compare, should be equals(). PHP CSharp
(ditto). PHP CSharp
Missing hash value check. Not here in PHP, but is here in CSharp
Comparing arrays of ints, but if equal, returns FALSE WHEN IT SHOULD NOT RETURN. IF THE ARRAYS ARE DIFFERENT, RETURN FALSE. JFC.! PHP
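The last item can be sketched in Python as follows. This is a hypothetical reconstruction of the pattern described, not the PHP source: ArrayContext, equals_buggy and equals_fixed are illustrative names.

```python
class ArrayContext:
    """Toy context with a list of return states and a list of parents."""

    def __init__(self, return_states, parents):
        self.return_states = list(return_states)
        self.parents = list(parents)

    def equals_buggy(self, other):
        if self.return_states == other.return_states:
            return False      # inverted: gives up exactly when the arrays match
        return self.parents == other.parents

    def equals_fixed(self, other):
        if self is other:
            return True
        if self.return_states != other.return_states:
            return False      # different arrays: definitely not equal
        return self.parents == other.parents   # otherwise keep comparing


x = ArrayContext([225, 290], ["p", "q"])
y = ArrayContext([225, 290], ["p", "q"])
z = ArrayContext([225, 311], ["p", "q"])

print(x.equals_buggy(y), x.equals_fixed(y), x.equals_fixed(z))
```

With the inverted test, two contexts with identical contents are never reported as equal, so any map that depends on equals() treats them as different entries.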

Dec 19 '22 18:12 kaby76

The SplObjectStorage does not rely on value equality but on identity. So two objects are only considered the same if they are the same instance.

It does not work because the comparison does not work. It does not compare ArrayPredictionContext correctly. But, you can debug this and prove to yourself this does not work: The ATN computation diverges between CSharp/Java and PHP.

Dec 19 '22 18:12 kaby76

I'm not saying that is right or wrong. I'm just contributing to your understanding of how it currently works. As soon as you open a PR or report all these differences I'll help fix the bugs.

Dec 19 '22 18:12 marcospassos

Also, keep in mind that the evolution of the targets is not in sync. Targets have different numbers of contributors and development time. The CS target is much more mature and battle-tested than PHP.

Anyway, I'm glad you're finding these bugs. I'd bet it's affecting the performance as well.

Dec 19 '22 18:12 marcospassos

OK, thanks @marcospassos .

OK, inching closer!! I'm up to line ~1208 in the ATN parser trace comparison. Lots more now working, and it's starting to get exciting, I guess.

Divergence here PHP:

computeReachSet [(505,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (457,1,[396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (458,1,[396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (503,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (468,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (474,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (462,1,[396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (369,1,[[385 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $], 467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]]), (464,1,[396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (371,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (375,1,[452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (382,1,[452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (389,1,[452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (505,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (457,2,[396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (458,2,[396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (503,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (468,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (474,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (462,2,[396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (369,2,[[385 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $], 467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]]), (464,2,[396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 
232 126 158 68 $]]), (371,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (375,2,[452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (382,2,[452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (389,2,[452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]])] -> [(398,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (401,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (404,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (407,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (410,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (414,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (420,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (423,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (426,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (429,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (432,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (435,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (438,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (445,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (472,1,[467 396 232 126 74 68 $]), (472,1,[467 396 232 126 74 68 $]), (398,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (401,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (404,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (407,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (410,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (414,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (420,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), 
(423,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (426,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (429,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (432,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (435,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (438,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (445,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (472,2,[467 396 232 126 158 68 $]), (472,2,[467 396 232 126 158 68 $])]

CSharp:

computeReachSet [(505,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (457,1,[396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (458,1,[396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (503,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (468,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (474,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (462,1,[396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (369,1,[[385 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $], 467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]]), (464,1,[396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (371,1,[467 396 452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (375,1,[452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (382,1,[452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (389,1,[452 [448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (505,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (457,2,[396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (458,2,[396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (503,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (468,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (474,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (462,2,[396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (369,2,[[385 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $], 467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]]), (464,2,[396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 
232 126 158 68 $]]), (371,2,[467 396 452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (375,2,[452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (382,2,[452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (389,2,[452 [448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]])] -> [(398,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (401,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (404,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (407,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (410,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (414,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (420,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (423,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (426,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (429,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (432,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (435,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (438,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (445,1,[[448 471 467 396 232 126 74 68 $, 471 467 396 232 126 74 68 $]]), (472,1,[467 396 232 126 74 68 $]), (398,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (401,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (404,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (407,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (410,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (414,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (420,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (423,2,[[448 471 467 396 232 126 158 68 $, 
471 467 396 232 126 158 68 $]]), (426,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (429,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (432,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (435,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (438,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (445,2,[[448 471 467 396 232 126 158 68 $, 471 467 396 232 126 158 68 $]]), (472,2,[467 396 232 126 158 68 $])]

Here's the current source code diffs. diffs.txt

Dec 19 '22 18:12 kaby76

The parse ATN trace debugging output has taken me far, but not far enough. However, the feature has been absolutely invaluable in getting this far. Thank you @parrt !!

I added more debugging output and have traced the computation to here. The after value of "configs" from the add() is not the same across targets. I suspect that the mergeCache structure isn't working. That's a little surprising because it does use Map and not Set.

Dec 20 '22 12:12 kaby76

Ah, this test isn't right:

	if ($existing->equals($config)) { <<<<<<<
		$this->cachedHashCode = null;

Over in C#, the code is this:

	if (existing == config)
	{ // we added this new one
		cachedHashCode = -1;

In the C# code, the ==-operator is not overridden and the default just checks whether the references (aka pointers) are the same.

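So, calling equals() is wrong. It should be '===', which in PHP checks that both sides are the same instance ('==' on objects compares property values instead).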
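Why the kind of comparison matters after a get-or-add can be seen in this minimal Python sketch. It is an illustration under assumptions: Config, add, by_identity and by_equals are hypothetical names, and a set union stands in for the real context merge.

```python
class Config:
    """Toy config: equality ignores the context, as the lookup table does."""

    def __init__(self, state, alt, context):
        self.state, self.alt, self.context = state, alt, set(context)

    def __eq__(self, other):
        return (self.state, self.alt) == (other.state, other.alt)

    def __hash__(self):
        return hash((self.state, self.alt))


def add(table, config, same_instance):
    existing = table.setdefault(config, config)   # get-or-add
    if same_instance(existing, config):
        return "added"                            # nothing was there before
    existing.context |= config.context            # merge into the stored one
    return "merged"


def by_identity(x, y):
    return x is y


def by_equals(x, y):
    return x == y


t1, first1 = {}, Config(363, 1, ["225"])
r1 = [add(t1, first1, by_identity), add(t1, Config(363, 1, ["290"]), by_identity)]

t2, first2 = {}, Config(363, 1, ["225"])
r2 = [add(t2, first2, by_equals), add(t2, Config(363, 1, ["290"]), by_equals)]

print(r1, sorted(first1.context))
print(r2, sorted(first2.context))
```

With the identity test the second config is merged into the stored one. With the equality test the stored config always compares equal to the incoming one, so the code believes it just added a new entry and the merge is skipped.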

And, it looks like it's now working! Trees look good.

Code changes: diffs.txt

Dec 20 '22 22:12 kaby76

Can you open a PR with the fixes?

Dec 21 '22 00:12 marcospassos

I can certainly do that. I would like to spend some time and test it on every grammar in grammars-v4. I want to really check out the parser ATN tracing and all.

Dec 21 '22 00:12 kaby76

The trees are the same, but there's some unusual diffs in the parser ATN traces with some numbers.

addDFAState new 0:[(398,1,[$]), (401,1,[$]), (404,1,[$]), (407,1,[$]), (410,1,[$]), (414,1,[$]), (420,1,[$]), (423,1,[$]), (426,1,[$]), (429,1,[$]), (432,1,[$]), (435,1,[$]), (438,1,[$]), (445,1,[$]), (99,2,[$],up=1), (369,2,[311 305 103 $],up=1), (309,2,[305 103 $],up=1), (104,2,[$],up=1), (363,2,[$],up=3), (242,2,[$],up=2), (220,2,[[116 $, 283 $, 348 $]],up=2), (291,2,[$],up=3), (63,2,[$],up=3), (380,2,[$],up=2), (171,2,[$],up=1), (183,2,[$],up=1), (195,2,[$],up=1), (206,2,[$],up=1), (208,2,[$],up=1), (210,2,[$],up=1), (279,2,[235 $],up=1), (312,2,[125 $],up=2), (250,2,[$],up=2), (270,2,[$],up=3), (349,2,[149 $],up=3), (226,2,[$],up=3), (162,2,[$],up=3), (169,2,[$],up=3), (181,2,[$],up=3), (193,2,[$],up=3), (204,2,[$],up=3), (284,2,[[254 $, 258 $]],up=1), (255,2,[$],up=1), (314,2,[$],up=1), (319,2,[$],up=1), (327,2,[$],up=1), (358,2,[338 $],up=1), (351,2,[$],up=1), (336,2,[$],up=2), (398,2,[$],up=7), (401,2,[$],up=7), (404,2,[$],up=7), (407,2,[$],up=7), (410,2,[$],up=7), (414,2,[$],up=7), (420,2,[$],up=7), (423,2,[$],up=7), (426,2,[$],up=7), (429,2,[$],up=7), (432,2,[$],up=7), (435,2,[$],up=7), (438,2,[$],up=7), (445,2,[$],up=7), (387,2,[$],up=2), (416,2,[$],up=3), (443,2,[$],up=4), (472,2,[$],up=8), (489,2,[$],up=9), (486,2,[$],up=10)],dipsIntoOuterContext
execATN decision 63, DFA state 0:[(398,1,[$]), (401,1,[$]), (404,1,[$]), (407,1,[$]), (410,1,[$]), (414,1,[$]), (420,1,[$]), (423,1,[$]), (426,1,[$]), (429,1,[$]), (432,1,[$]), (435,1,[$]), (438,1,[$]), (445,1,[$]), (99,2,[$],up=1), (369,2,[311 305 103 $],up=1), (309,2,[305 103 $],up=1), (104,2,[$],up=1), (363,2,[$],up=3), (242,2,[$],up=2), (220,2,[[116 $, 283 $, 348 $]],up=2), (291,2,[$],up=3), (63,2,[$],up=3), (380,2,[$],up=2), (171,2,[$],up=1), (183,2,[$],up=1), (195,2,[$],up=1), (206,2,[$],up=1), (208,2,[$],up=1), (210,2,[$],up=1), (279,2,[235 $],up=1), (312,2,[125 $],up=2), (250,2,[$],up=2), (270,2,[$],up=3), (349,2,[149 $],up=3), (226,2,[$],up=3), (162,2,[$],up=3), (169,2,[$],up=3), (181,2,[$],up=3), (193,2,[$],up=3), (204,2,[$],up=3), (284,2,[[254 $, 258 $]],up=1), (255,2,[$],up=1), (314,2,[$],up=1), (319,2,[$],up=1), (327,2,[$],up=1), (358,2,[338 $],up=1), (351,2,[$],up=1), (336,2,[$],up=2), (398,2,[$],up=7), (401,2,[$],up=7), (404,2,[$],up=7), (407,2,[$],up=7), (410,2,[$],up=7), (414,2,[$],up=7), (420,2,[$],up=7), (423,2,[$],up=7), (426,2,[$],up=7), (429,2,[$],up=7), (432,2,[$],up=7), (435,2,[$],up=7), (438,2,[$],up=7), (445,2,[$],up=7), (387,2,[$],up=2), (416,2,[$],up=3), (443,2,[$],up=4), (472,2,[$],up=8), (489,2,[$],up=9), (486,2,[$],up=10)],dipsIntoOuterContext, LA(1)=='..'<64> line 1:12

vs

addDFAState new 0:[(398,1,[$]), (401,1,[$]), (404,1,[$]), (407,1,[$]), (410,1,[$]), (414,1,[$]), (420,1,[$]), (423,1,[$]), (426,1,[$]), (429,1,[$]), (432,1,[$]), (435,1,[$]), (438,1,[$]), (445,1,[$]), (99,2,[$],up=1073741825), (369,2,[311 305 103 $],up=1073741825), (309,2,[305 103 $],up=1073741825), (104,2,[$],up=1073741825), (363,2,[$],up=1073741827), (242,2,[$],up=1073741826), (220,2,[[116 $, 283 $, 348 $]],up=1073741826), (291,2,[$],up=1073741827), (63,2,[$],up=1073741827), (380,2,[$],up=1073741826), (171,2,[$],up=1073741825), (183,2,[$],up=1073741825), (195,2,[$],up=1073741825), (206,2,[$],up=1073741825), (208,2,[$],up=1073741825), (210,2,[$],up=1073741825), (279,2,[235 $],up=1073741825), (312,2,[125 $],up=1073741826), (250,2,[$],up=1073741826), (270,2,[$],up=1073741827), (349,2,[149 $],up=1073741827), (226,2,[$],up=1073741827), (162,2,[$],up=1073741827), (169,2,[$],up=1073741827), (181,2,[$],up=1073741827), (193,2,[$],up=1073741827), (204,2,[$],up=1073741827), (284,2,[[254 $, 258 $]],up=1073741825), (255,2,[$],up=1073741825), (314,2,[$],up=1073741825), (319,2,[$],up=1073741825), (327,2,[$],up=1073741825), (358,2,[338 $],up=1073741825), (351,2,[$],up=1073741825), (336,2,[$],up=1073741826), (398,2,[$],up=1073741831), (401,2,[$],up=1073741831), (404,2,[$],up=1073741831), (407,2,[$],up=1073741831), (410,2,[$],up=1073741831), (414,2,[$],up=1073741831), (420,2,[$],up=1073741831), (423,2,[$],up=1073741831), (426,2,[$],up=1073741831), (429,2,[$],up=1073741831), (432,2,[$],up=1073741831), (435,2,[$],up=1073741831), (438,2,[$],up=1073741831), (445,2,[$],up=1073741831), (387,2,[$],up=1073741826), (416,2,[$],up=1073741827), (443,2,[$],up=1073741828), (472,2,[$],up=1073741832), (489,2,[$],up=1073741833), (486,2,[$],up=1073741834)],dipsIntoOuterContext
execATN decision 63, DFA state 0:[(398,1,[$]), (401,1,[$]), (404,1,[$]), (407,1,[$]), (410,1,[$]), (414,1,[$]), (420,1,[$]), (423,1,[$]), (426,1,[$]), (429,1,[$]), (432,1,[$]), (435,1,[$]), (438,1,[$]), (445,1,[$]), (99,2,[$],up=1073741825), (369,2,[311 305 103 $],up=1073741825), (309,2,[305 103 $],up=1073741825), (104,2,[$],up=1073741825), (363,2,[$],up=1073741827), (242,2,[$],up=1073741826), (220,2,[[116 $, 283 $, 348 $]],up=1073741826), (291,2,[$],up=1073741827), (63,2,[$],up=1073741827), (380,2,[$],up=1073741826), (171,2,[$],up=1073741825), (183,2,[$],up=1073741825), (195,2,[$],up=1073741825), (206,2,[$],up=1073741825), (208,2,[$],up=1073741825), (210,2,[$],up=1073741825), (279,2,[235 $],up=1073741825), (312,2,[125 $],up=1073741826), (250,2,[$],up=1073741826), (270,2,[$],up=1073741827), (349,2,[149 $],up=1073741827), (226,2,[$],up=1073741827), (162,2,[$],up=1073741827), (169,2,[$],up=1073741827), (181,2,[$],up=1073741827), (193,2,[$],up=1073741827), (204,2,[$],up=1073741827), (284,2,[[254 $, 258 $]],up=1073741825), (255,2,[$],up=1073741825), (314,2,[$],up=1073741825), (319,2,[$],up=1073741825), (327,2,[$],up=1073741825), (358,2,[338 $],up=1073741825), (351,2,[$],up=1073741825), (336,2,[$],up=1073741826), (398,2,[$],up=1073741831), (401,2,[$],up=1073741831), (404,2,[$],up=1073741831), (407,2,[$],up=1073741831), (410,2,[$],up=1073741831), (414,2,[$],up=1073741831), (420,2,[$],up=1073741831), (423,2,[$],up=1073741831), (426,2,[$],up=1073741831), (429,2,[$],up=1073741831), (432,2,[$],up=1073741831), (435,2,[$],up=1073741831), (438,2,[$],up=1073741831), (445,2,[$],up=1073741831), (387,2,[$],up=1073741826), (416,2,[$],up=1073741827), (443,2,[$],up=1073741828), (472,2,[$],up=1073741832), (489,2,[$],up=1073741833), (486,2,[$],up=1073741834)],dipsIntoOuterContext, LA(1)=='..'<64> line 1:12

up=1073741825 vs up=1??

Looks like I need to fix equivalent()--build failed.

Dec 21 '22 01:12 kaby76

1073741825 in hex = 0x40000001. Where do we see that in the code? here.

The problem is that in C#, "int" is 32 bits. https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/integral-numeric-types#characteristics-of-the-integral-types

In PHP, it's 64-bits (or depends on your OS). https://www.w3schools.com/php/php_numbers.asp#:~:text=PHP%20Integers&text=An%20integer%20data%20type%20is,the%20limit%20of%20an%20integer.

The constant is wrong, but changing that doesn't fix the diffs in the numbers seen in the trace. Something more here....

Dec 21 '22 08:12 kaby76

There is actually an error in the code w.r.t. the use of reachesIntoOuterContext and the associated accessor that uses a bitmap SUPPRESS_PRECEDENCE_FILTER.

Over in Java, there are several locations that very carefully choose the accessor or the variable, which over in PHP are incorrect.

Java ATNConfigSet.add() Must be accessor with bitmap.
Java ParserATNSimulator.getAltThatFinishedDecisionEntryRule() Must be accessor with bitmap.
ATNConfig.toString().

In PHP, this reference is wrong:

PHP ATNConfigSet.add() This should go through the accessor function that uses the bitmap, but it instead uses the raw field, without any bitmap adjustments.

It turns out that several other targets get it wrong, including CSharp.
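The difference between the raw field and the accessor can be sketched in Python like this. It is a sketch under assumptions: the flag value 0x40000000 is inferred from the hex value 0x40000001 noted earlier in the thread, and the class and method names are hypothetical snake_case stand-ins.

```python
SUPPRESS_PRECEDENCE_FILTER = 0x40000000   # flag bit, assumed from 0x40000001 above


class Config:
    def __init__(self):
        # Raw field: outer-context depth in the low bits plus the flag bit.
        self.reaches_into_outer_context = 0

    def outer_context_depth(self):
        # Accessor: mask the flag bit away before reporting the depth.
        return self.reaches_into_outer_context & ~SUPPRESS_PRECEDENCE_FILTER

    def is_precedence_filter_suppressed(self):
        return (self.reaches_into_outer_context & SUPPRESS_PRECEDENCE_FILTER) != 0


c = Config()
c.reaches_into_outer_context = 1073741825   # the "up=" value seen in the trace

print(hex(c.reaches_into_outer_context))
print(c.outer_context_depth(), c.is_precedence_filter_suppressed())
```

Read through the accessor, the value from the trace is a depth of 1 with the flag set, which lines up with the up=1 printed on the other side of the diff. Code that reads the raw field sees 1073741825 instead.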

Dec 21 '22 16:12 kaby76