otp icon indicating copy to clipboard operation
otp copied to clipboard

Misleading warnings from Dialyzer

Open TD5 opened this issue 8 months ago • 2 comments

Describe the bug

Consider the type:

-type m() :: #{
    % An entry with quite a general key first in the map's association list
    atom() => binary(),
    % Singleton atom key entry later
    maybe_bin => binary() | undefined
}

From the docs:

The general form of map types is #{AssociationList}. The key types in AssociationList are allowed to overlap, and if they do, the leftmost association takes precedence.

This can result in Dialyzer warnings such as:

    #{ maybe_bin := MaybeBin } = M, % M is of type m(), defined above
    case MaybeBin of
        % Warning:
        %   The call
        %       example:func(#{'maybe_bin'=>'undefined'})
        %   disagrees with the spec
        %       (#{atom()=>binary(),'maybe_bin'=>binary() | 'undefined'}) -> 'a' | 'b'
        %   in the 1st argument
        undefined -> a;
        Bin when is_binary(Bin) -> b
    end.

Dialyzer strictly respects the docs here, but the result can be quite obscure issues, especially when the error message seems inconsistent, since map association ordering is quite niche knowledge. It's easy to look at the error message and conclude that Dialyzer is wrong since an association is given which appears in the spec, and hence it "should" be accepted (with the user not realising that the earlier association "wins").

Notably, I've seen this issue cause a lot of confusion in large projects, since I have observed that some codebases:

  • Use big map types
  • Often define maps of the form several singleton (often atom) keys, as well as one or more, broader "catch-all" entries with non-singleton key types:
-type big_config_map() :: #{
    a => string(),
    b => binary() | undefined,
    c => atom() | string(),
    % ...
    z => tuple(),
    atom() => binary(),
}

In these cases, it's easy to (unknowingly!) end up with the "catch-all" entries earlier in a map definition than an overlapping, but more specific entry, effectively shadowing it. This is especially true when the map keys are themselves type aliases, thereby obscuring which keys are at risk of shadowing others.

To Reproduce

-module(example).
-export([main/1]).

% N.B. the overlap in the map key types, but a non-empty symmetric difference in the values

-spec func_general_key_first(#{
    % More general key first
    atom() => binary(),
    maybe_bin => binary() | undefined
}) -> a | b.
func_general_key_first(M) ->
    #{ maybe_bin := MaybeBin } = M,
    case MaybeBin of
        % Warning:
        %   The call
        %       example:func(#{'maybe_bin'=>'undefined'})
        %   disagrees with the spec
        %       (#{atom()=>binary(),'maybe_bin'=>binary() | 'undefined'}) -> 'a' | 'b'
        %   in the 1st argument
        undefined -> a;
        Bin when is_binary(Bin) -> b
    end.

-spec func_specific_key_first(#{
    % More specific key first
    maybe_bin => binary() | undefined,
    atom() => binary()
}) -> a | b.
func_specific_key_first(M) ->
    #{ maybe_bin := MaybeBin } = M,
    case MaybeBin of
        % No warning in this case!
        undefined -> a;
        Bin when is_binary(Bin) -> b
    end.

-spec main(integer()) -> term().
main(N) ->
    case N of
        0 ->
            a = func_general_key_first(#{maybe_bin => undefined}),
            a = func_specific_key_first(#{maybe_bin => undefined});
        _ ->
            b = func_general_key_first(#{maybe_bin => <<"some bin"/utf8>>}),
            b = func_specific_key_first(#{maybe_bin => <<"some bin"/utf8>>})
    end.

Expected behavior

Possible mitigations / solutions:

  • An improved error message which make it clear that map ordering is important (else the error message seems contradictory)
  • Report a warning if a later association is shadowed by an earlier one
  • An ad-hoc rewriting to handle the "obvious" cases: When constructing map types from spec forms, order all associations with singleton keys before those with non-singleton keys (and otherwise preserve ordering), to relieve the user of the burden of reordering association lists where they follow the common pattern of N distinct singleton atoms keys, followed by a "catch-all" entry of arbitrary keys. (e.g. https://github.com/erlang/otp/commit/29a879604d3ad16869184fbeab6330938f6e2e2c). I suggest this despite it being a hack, since it resolves all instances I have seen of this point of confusion appearing in the the wild and shouldn't introduce new warnings, hence it should be entirely backwards-compatible.
  • A redefinition of the meaning of map type association lists, but in a more principled way to more closely align with my perception of users' expectations, where "more specific" keys take priority over "less specific" ones, and overlapping keys result in the supremum of their values being accepted, rather than just the value of the first association being accepted, and conflicts in singleton keys are an explicit warning (e.g. -type bad() :: #{a => b, a => c}. would generate a warning about a appearing twice). I am not sure how feasible this actually is.

Affected versions All supported OTP versions.

Additional context Relevant docs: https://www.erlang.org/doc/system/typespec.html#types-and-their-syntax

TD5 avatar May 07 '25 10:05 TD5

Maybe my eyes are a bit too fresh, but I don't think Dialyzer's rule or warning makes sense in this case. The rule seems arbitrary, as it has nothing to do with how key-value associations actually work in maps. From a success typing pov, the value type of maybe_bin can totally be binary()|undefined. A more intuitive and backward-compatible rule would be to take the union for the value types when keys overlap. I would propose the following change to Dialyzer's behavior and documentation:

The key types in AssociationList are allowed to overlap, and if they do, their types and their corresponding value types may be combined during analysis, resulting in less precise types.

So I'm leaning towards the third solution you proposed. I'll dig into Dialyzer's code later to see if that's feasible. My guess is that the documentation describes what Dialyzer already does, not what an intended behavior by any specific designer of Dialyzer.

lucioleKi avatar May 07 '25 14:05 lucioleKi

Your new definition seems much better. The relevant code for handling shadowing of map association list entries is here: https://github.com/erlang/otp/blob/master/lib/dialyzer/src/erl_types.erl#L4525. I suppose we just need to take the t_sup of the values where the t_inf of the keys is not none.

TD5 avatar May 07 '25 14:05 TD5