langium Inconsistent validation behavior on cross-reference alternatives

For alternatives of cross-references, the validator accepts some grammars even though they won't work at runtime. Other grammars get rejected even though I assume they would work at runtime.

Our scenario:

The reference target can be ProcedureTypeA or ProcedureTypeB
References can be simple names or fully qualified names

Attempt 1:

ProcedureCall:
    procedure=([ProcedureTypeA:SIMPLE_NAME] | [ProcedureTypeA:FULLY_QUALIFIED_NAME] | [ProcedureTypeB:SIMPLE_NAME] | [ProcedureTypeB:FULLY_QUALIFIED_NAME]);

This is rejected by the validator: "Mixing a cross-reference with other types is not supported. Consider splitting property "procedure" into two or more different properties." I think it's okay that the validator rejects it with that error message.

Attempt 2:

ProcedureCall:
    (procedure=[ProcedureTypeA:SIMPLE_NAME] | procedure=[ProcedureTypeA:FULLY_QUALIFIED_NAME] | procedure=[ProcedureTypeB:SIMPLE_NAME] | procedure=[ProcedureTypeB:FULLY_QUALIFIED_NAME]);

This is accepted by the validator, but it doesn't work at runtime, and I think Langium doesn't support it. The generated getReferenceType() in ast.ts contains the same case label twice, so the second branch is unreachable:

        switch (referenceId) {
            case 'ProcedureCall:procedure': {
                return ProcedureTypeA;
            }
            // Unreachable: same case label as above.
            case 'ProcedureCall:procedure': {
                return ProcedureTypeB;
            }
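
To illustrate the kind of check I would expect from the validator, here is a minimal sketch in TypeScript. It is not Langium's implementation: `CrossRefAssignment` and `findConflictingReferenceIds` are hypothetical names, and the `Container:property` key simply mirrors the reference id visible in the generated switch above.

```typescript
// Hypothetical model of one cross-reference assignment inside a grammar rule.
interface CrossRefAssignment {
    container: string;   // rule name, e.g. 'ProcedureCall'
    property: string;    // assigned property, e.g. 'procedure'
    targetType: string;  // referenced type, e.g. 'ProcedureTypeA'
}

// Returns every reference id that is assigned more than one distinct target type.
// Such an id cannot be mapped to a single type by a lookup keyed on the id alone.
function findConflictingReferenceIds(assignments: CrossRefAssignment[]): string[] {
    const typesById = new Map<string, Set<string>>();
    for (const a of assignments) {
        const id = `${a.container}:${a.property}`;
        const types = typesById.get(id) || new Set<string>();
        types.add(a.targetType);
        typesById.set(id, types);
    }
    return Array.from(typesById.entries())
        .filter(([, types]) => types.size > 1)
        .map(([id]) => id);
}

// Attempt 2: one property, two different target types.
const attempt2: CrossRefAssignment[] = [
    { container: 'ProcedureCall', property: 'procedure', targetType: 'ProcedureTypeA' },
    { container: 'ProcedureCall', property: 'procedure', targetType: 'ProcedureTypeA' },
    { container: 'ProcedureCall', property: 'procedure', targetType: 'ProcedureTypeB' },
    { container: 'ProcedureCall', property: 'procedure', targetType: 'ProcedureTypeB' },
];

// Attempt 3: one property, one (union) target type.
const attempt3: CrossRefAssignment[] = [
    { container: 'ProcedureCall', property: 'procedure', targetType: 'ProcedureCallTarget' },
    { container: 'ProcedureCall', property: 'procedure', targetType: 'ProcedureCallTarget' },
];
```

With this model, attempt 2 is reported as a conflict on 'ProcedureCall:procedure', while attempt 3 produces no conflict because both alternatives share one target type.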

Attempt 3:

type ProcedureCallTarget = ProcedureTypeA | ProcedureTypeB;

ProcedureCall:
    (procedure=[ProcedureCallTarget:SIMPLE_NAME] | procedure=[ProcedureCallTarget:FULLY_QUALIFIED_NAME]);

This is accepted by the validator and seems to work at runtime, too.

Attempt 4: Trying to write the above in a somewhat more compact form:

type ProcedureCallTarget = ProcedureTypeA | ProcedureTypeB;

ProcedureCall:
    (procedure=([ProcedureCallTarget:SIMPLE_NAME] | [ProcedureCallTarget:FULLY_QUALIFIED_NAME]));

This gets rejected by the validator: "Mixing a cross-reference with other types is not supported. Consider splitting property "procedure" into two or more different properties."

I don't quite understand why it gets rejected. Semantically, it seems to be the same as attempt 3. The construct that Langium doesn't seem to support is mixing reference types (e.g. ProcedureTypeA | ProcedureTypeB). But different terminal types (e.g. SIMPLE_NAME | FULLY_QUALIFIED_NAME) don't seem to pose a problem for the Langium runtime.

Langium version: 3.4.0
Package name: https://registry.npmjs.org/langium/-/langium-3.4.0.tgz

The current behavior: Validator accepts attempt 2; Validator rejects attempt 4.

The expected behavior: Validator should reject attempt 2; Validator should accept attempt 4.

Mar 25 '25 13:03 dgDSA

Did you try this?

PCName: SIMPLE_NAME|FULLY_QUALIFIED_NAME;

ProcedureCall: (procedure=[ProcedureCallTarget:PCName]);

Mar 25 '25 14:03 cdietrich

Hey @dgDSA,

this is (mostly) working as designed. See also the reference unions docs. The issue is that an assignment such as procedure=([ProcedureTypeA:SIMPLE_NAME] | [ProcedureTypeB:SIMPLE_NAME]) is ambiguous from a parser perspective: both alternatives consume the same token, so it's impossible to identify whether the parser is meant to consume a reference for ProcedureTypeA or one for ProcedureTypeB. Your "Attempt 3" is exactly how it is meant to be done.
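
The ambiguity can be modelled in a few lines of TypeScript. This is only a sketch and not how Langium's parser works internally: the `Alternative` interface and `candidateTargets` are hypothetical, and the two regular expressions are assumed terminal definitions, since the real SIMPLE_NAME and FULLY_QUALIFIED_NAME terminals are not shown in this thread.

```typescript
// One alternative of a cross-reference assignment:
// the target type plus the terminal that accepts the reference text.
interface Alternative {
    targetType: string;
    terminal: RegExp;
}

// Assumed terminal definitions (hypothetical; the thread does not show the real ones).
const SIMPLE_NAME = /^[A-Za-z_]\w*$/;
const FULLY_QUALIFIED_NAME = /^[A-Za-z_]\w*(\.[A-Za-z_]\w*)+$/;

// Distinct target types of all alternatives whose terminal accepts the given text.
// More than one entry means the text alone cannot decide the reference type.
function candidateTargets(alternatives: Alternative[], text: string): string[] {
    const targets = alternatives
        .filter(a => a.terminal.test(text))
        .map(a => a.targetType);
    return Array.from(new Set(targets));
}

// Same terminal, different target types: ambiguous.
const mixedTypes: Alternative[] = [
    { targetType: 'ProcedureTypeA', terminal: SIMPLE_NAME },
    { targetType: 'ProcedureTypeB', terminal: SIMPLE_NAME },
];

// Different terminals, one union target type (as in attempt 3): unambiguous.
const unionType: Alternative[] = [
    { targetType: 'ProcedureCallTarget', terminal: SIMPLE_NAME },
    { targetType: 'ProcedureCallTarget', terminal: FULLY_QUALIFIED_NAME },
];
```

For a name such as 'run', `mixedTypes` yields two candidate types, whereas `unionType` yields exactly one for both 'run' and 'pkg.run'.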

However, your attempt 4 should also work as expected. The validation is likely overly eager to report an error in that instance (likely because we never anticipated it to be used like that).

Mar 25 '25 14:03 msujew

@msujew Thanks for the link to the reference docs. Yes, your explanation of the ambiguity makes sense.

@cdietrich Thanks for the great suggestion, that makes it a lot more elegant.

As a reference for others, this is now the elegant grammar that works for us:

ReferenceByName returns string:
    SIMPLE_NAME | FULLY_QUALIFIED_NAME;

type ProcedureCallTarget = ProcedureTypeA | ProcedureTypeB;

ProcedureCall:
    (procedure=[ProcedureCallTarget:ReferenceByName]);
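
On the consuming side, the resolved target then has to be narrowed to one of the two types. The following TypeScript sketch shows the idea with simplified stand-in types; the interfaces below are assumptions for illustration and are not the code Langium generates for this grammar, and `describeTarget` is a hypothetical helper.

```typescript
// Simplified stand-ins for the AST node types (assumption, not generated code).
interface ProcedureTypeA { $type: 'ProcedureTypeA'; name: string; }
interface ProcedureTypeB { $type: 'ProcedureTypeB'; name: string; }
type ProcedureCallTarget = ProcedureTypeA | ProcedureTypeB;

// Stand-in for a cross-reference: the written text plus the resolved node,
// which is undefined when resolution failed.
interface Ref<T> { $refText: string; ref?: T; }

interface ProcedureCall {
    $type: 'ProcedureCall';
    procedure: Ref<ProcedureCallTarget>;
}

// Narrows the union via the $type discriminator.
function describeTarget(call: ProcedureCall): string {
    const target = call.procedure.ref;
    if (target === undefined) {
        return `unresolved: ${call.procedure.$refText}`;
    }
    switch (target.$type) {
        case 'ProcedureTypeA':
            return `A:${target.name}`;
        case 'ProcedureTypeB':
            return `B:${target.name}`;
    }
}
```

Whether the reference was written as a simple or a fully qualified name makes no difference here; only the type of the resolved node matters.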

Mar 25 '25 14:03 dgDSA