The rules for autovivification are not consistent and lack a conceptual explanatory foundation

Open demerphq opened this issue 3 years ago • 0 comments

Description

Autovivification is one of Perl's strengths; it is also one of Perl's weaknesses. Some would argue that autovivification is intrinsically evil. Personally I disagree: I think it is one of the reasons that Perl code is often very elegant compared to the equivalent in languages which do not have it. However, one problem with it is that it is very difficult to understand when it occurs. The understanding I have formed over the years is that it is essentially coupled with the concept of LVALUE and RVALUE. In general, if you use a variable in LVALUE context perl will at some level autovivify in DWIM fashion for you; if you use it in an RVALUE context it will not be autovivified, and undef will often throw an exception. I have also come to understand that in a chained dereference operation all of the variable lookups but the rightmost are essentially LVALUES, as they will be written to. This model also explains why arguments to subroutines, or the subjects of aliased loop structures, are autovivified: they are all places that are implicitly LVALUE uses of the variables.
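As a rough illustration of the LVALUE/RVALUE model described above, here is a small sketch. The variable names ($w, $r, $c) are made up for the example, and the comments reflect the behavior I would expect from a recent perl under strict:

```perl
use strict;
use warnings;

my ($w, $r, $c);    # all three start out undefined

# LVALUE use: assigning through an undefined scalar vivifies a hash ref.
$w->{k} = 1;

# RVALUE use: a plain read of @$r dies under strict refs and leaves $r alone.
my $ok = eval { my $n = @$r; 1 };

# Chained dereference, read only: every lookup but the rightmost is vivified,
# so $c becomes { a => {} } while the key "b" is never created.
my $v = $c->{a}{b};

print ref($w), "\n";                              # HASH
print $ok ? "lived" : "died", "\n";               # died
print defined $r ? "defined" : "undef", "\n";     # undef
print join(",", keys %$c), "\n";                  # a
print exists $c->{a}{b} ? "yes" : "no", "\n";     # no
```

The last two lines are the part that tends to surprise people: nothing was ever assigned, yet $c is no longer undef.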

But this is an imperfect model. At some time in the past we added logic which changes how elements of composite data structures are autovivified when they are passed as arguments to a subroutine: we only autovivify the element if there is an actual write, creating a form of "lazy" autovivification. In a for loop, however, we autovivify regardless:

$ perl -MData::Dumper -E'my $foo; sub f {}; f($foo->{k}); say 0+keys %$foo;'
0
$ perl -MData::Dumper -E'my $foo; sub f {}; for my $x ($foo->{k}) {}; say 0+keys %$foo;'
1
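To make the "only if there is an actual write" part concrete, the following sketch compares a sub that merely reads its argument, a sub that assigns through $_[0], and a foreach loop. The sub names read_only and write_back and the three hashes are hypothetical, chosen just for this example:

```perl
use strict;
use warnings;

sub read_only  { my ($v) = @_; return }    # copies the argument, never writes to @_
sub write_back { $_[0] = 1; return }       # assigns through the alias in @_

my (%lazy, %written, %looped);

read_only($lazy{k});          # element is deferred and never created
write_back($written{k});      # the write through $_[0] creates the element
for my $x ($looped{k}) { }    # foreach creates the element eagerly

printf "read_only: %d, write_back: %d, foreach: %d\n",
    scalar(keys %lazy), scalar(keys %written), scalar(keys %looped);
# read_only: 0, write_back: 1, foreach: 1
```

So the sub-argument case and the loop-topic case are both aliasing constructs, but only one of them defers creation of the element.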

We also have weird cases where the LVALUE/autovivification seems to be propagated further than it should be:

perl -MData::Dumper -E'my $foo; sub f {}; f(@$foo); say $foo;'
ARRAY(0x560e7b5cf5e8)

In the above case it is not clear why $foo should be autovivified, especially as it would not be here:

perl -MData::Dumper -E'my $foo; my $n= @$foo; say $foo;'

and this throws an exception under strict:

perl -MData::Dumper -E'use strict; my $foo; my $n= @$foo; say $foo;'
Can't use an undefined value as an ARRAY reference at -e line 1.

The arguments to values and keys also autovivify for no seemingly good reason (at the Perl semantics level; at the C level I get it):

perl -MData::Dumper -E'use strict; my $foo; my $n= keys %$foo; say $foo;'
HASH(0x55939e9545e8)
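The same thing can be checked for keys and values side by side. This is only a sketch; the variable names $by_keys and $by_values are invented for the example:

```perl
use strict;
use warnings;

my ($by_keys, $by_values);    # both start out undefined

my $count  = keys %$by_keys;       # scalar-context keys on an undef ref
my @values = values %$by_values;   # list-context values on an undef ref

# Neither statement dies under strict, and both scalars now hold hash refs.
print ref($by_keys), " ", ref($by_values), "\n";    # HASH HASH
```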

Aside from the lazy autovivification special case for foo($foo->{bar}), a common theme here seems to be that in certain list contexts we propagate the LVALUE context from the elements of the list to the thing that generates the list, which seems to be improper.

I personally think the lazy autovivification feature was a mistake, and we should either remove it, or make sure that it works that way consistently in all cases, such as loop topic vars.

Now, it may be that it is simply too late for fixing some of these things. I suspect a lot of scripts would break if we changed

for (@$foo) { }

so that we threw an exception if $foo was undef (which I believe we should, as the @$foo operation should be in RVALUE context, returning a list whose elements would be in LVALUE context), but perhaps we need not die while at the same time NOT autovivifying $foo.
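For comparison, the "neither die nor autovivify" behavior can already be had by hand today with a common idiom: fall back to an empty anonymous array when the scalar is undef. This is a workaround sketch, not a proposal for how the core should implement it, and $iterations is just a counter added for the example:

```perl
use strict;
use warnings;

my $foo;                 # deliberately left undefined
my $iterations = 0;

# Substitute an empty anonymous array when $foo is undef, so the loop has
# something to dereference without ever writing to $foo itself.
for my $elem (@{ $foo // [] }) {
    $iterations++;
}

print defined $foo ? "vivified" : "still undef", "\n";    # still undef
print "iterations: $iterations\n";                        # iterations: 0
```

The cost is that the elements are only aliased to $foo's array when $foo was already defined, which is arguably what an RVALUE @$foo with LVALUE elements would mean anyway.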

I wanted to open this ticket because I believe that we should review autovivification holistically. It seems to me that we have a collection of disparate and inconsistent behavior and no clear rationale underpinning it. The LVALUE justification is the best I know of, and it doesn't explain all the details. The situation was more coherent before we introduced lazy autovivification. If we don't have a coherent rationale then how can we decide what behavior is correct?

IMO having a clear rationale and set of rules by which developers can understand autovivification would really help reduce the hate on it. I bet that if we devised an autovivification exam that there would be few people contributing to perl regularly who would know all the rules and inconsistencies.

Even if we can't change some of these things, IMO it would be helpful if we could agree how they /should/ work. For instance, if someone stepped up based on this ticket and made for ($foo->{bar}) work similarly to f($foo->{bar}), would we consider that a bug fix? Or would we consider it a regression? Should they be consistent? What happens if someone wants to make autovivification lazy in more contexts: would we consider that a bug fix? What if we ripped out lazy autovivification? Would that be a regression? If we have no conceptual foundation for how and when autovivification happens, then we can't really answer any of these questions.

I believe that autovivification is so difficult to explain and describe, and so full of inconsistencies, that we can only consider some of what we do buggy, and that we should admit these are bugs so that we can start addressing them.

We have had many tickets over the years from people being surprised by various forms of autovivification; maybe we should link those tickets to this one.

Steps to Reproduce

Try any of the above one-liners.

Expected behavior

Consistency.

Perl configuration

Not relevant.

demerphq, Jul 29 '22 08:07