BBC: Blead Breaks File::Tabular
This is a bug report for perl from "Carlos Guevara" [email protected], generated with the help of perlbug 1.43 running under perl 5.41.3.
BBC: Blead Breaks File::Tabular
Please see http://fast-matrix.cpantesters.org/?dist=File::Tabular
Flags
- category=core
- severity=low
Perl configuration
Site configuration information for perl 5.41.3:
Configured by cpan at Wed Aug 7 22:42:44 EDT 2024.
Summary of my perl5 (revision 5 version 41 subversion 3) configuration:
Commit id: e2b4c6357925d2269403aff2d563ed245a1213ab
Platform:
osname=linux
osvers=5.14.0-427.28.1.el9_4.x86_64
archname=x86_64-linux
uname='linux cjg-rhel9 5.14.0-427.28.1.el9_4.x86_64 #1 smp preempt_dynamic fri jul 19 14:40:47 edt 2024 x86_64 x86_64 x86_64 gnulinux '
config_args='-des -Dprefix=/home/cpan/bin/perl -Dscriptdir=/home/cpan/bin/perl/bin -Dusedevel -Duse64bitall'
hint=recommended
useposix=true
d_sigaction=define
useithreads=undef
usemultiplicity=undef
use64bitint=define
use64bitall=define
uselongdouble=undef
usemymalloc=n
default_inc_excludes_dot=define
Compiler:
cc='cc'
ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
optimize='-O2'
cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
ccversion=''
gccversion='11.4.1 20231218 (Red Hat 11.4.1-3)'
gccosandvers=''
intsize=4
longsize=8
ptrsize=8
doublesize=8
byteorder=12345678
doublekind=3
d_longlong=define
longlongsize=8
d_longdbl=define
longdblsize=16
longdblkind=3
ivtype='long'
ivsize=8
nvtype='double'
nvsize=8
Off_t='off_t'
lseeksize=8
alignbytes=8
prototype=define
Linker and Libraries:
ld='cc'
ldflags =' -fstack-protector-strong -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib /usr/lib64 /usr/local/lib64
libs=-lpthread -ldb -ldl -lm -lcrypt -lutil -lc
perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
libc=/lib/../lib64/libc.so.6
so=so
useshrplib=false
libperl=libperl.a
gnulibc_version='2.34'
Dynamic Linking:
dlsrc=dl_dlopen.xs
dlext=so
d_dlsymun=undef
ccdlflags='-Wl,-E'
cccdlflags='-fPIC'
lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'
---
@INC for perl 5.41.3:
/home/cpan/bin/perl/lib/site_perl/5.41.3/x86_64-linux
/home/cpan/bin/perl/lib/site_perl/5.41.3
/home/cpan/bin/perl/lib/5.41.3/x86_64-linux
/home/cpan/bin/perl/lib/5.41.3
---
Environment for perl 5.41.3:
HOME=/home/cpan
LANG=en_US.UTF-8
LANGUAGE (unset)
LC_ALL=C
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/cpan/bin/perl/bin:/home/cpan/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
PERL_BADLANG (unset)
SHELL=/bin/bash
Probably 471c834070f1d22c4f2dc1c9997a5adfd959f3e5
Probably 471c834
Confirmed through bisection.
$ gitshowf 471c834070f1d22c4f2dc1c9997a5adfd959f3e5
commit 471c834070f1d22c4f2dc1c9997a5adfd959f3e5
Author: Lukas Mai <[email protected]>
AuthorDate: Sun Aug 4 19:40:32 2024 +0200
Commit: Lukas Mai <[email protected]>
CommitDate: Tue Aug 6 00:59:48 2024 +0200
open: only treat literal undef as special
...
@mauke, can you take a look? Thanks.
I've opened a bug ticket with two alternative suggestions for fixing the code: https://rt.cpan.org/Ticket/Display.html?id=154767
IMHO the test code in File::Tabular relies on a bug in perl (#22385). It calls open($_[0], $_[1], $_[2]) and expects open to create a temporary file, even though the filename argument is manifestly a variable ($_[2]) and not a literal undef. But because $_[2] (in sub _open) happens to be an alias for $_[1] (in sub new), which in turn is an alias for undef in the constructor call in the test file, this used to trick perl into creating a temp file anyway.
This problem will impact users of autodie as well.
I'm not sure I really like the change to attempt to fix #22385. Requiring a literal undef for this kind of behavior doesn't feel like how things in perl should work. Having open use a temporary file when given any undef seems more sane to me. A warning could be issued if a mode other than +> was used.
This problem will impact users of autodie as well.
Could you provide an example of the problem autodie users will encounter?
I'm not sure I really like the change to attempt to fix #22385. Requiring a literal undef for this kind of behavior doesn't feel like how things in perl should work. Having open use a temporary file when given any undef seems more sane to me.
It would be more consistent, but it would also fail in annoying (and silent) ways. For example, let's say you extract the name of a file to be processed from your configuration and try to open it:
my $filename = $config->{input_file};
open my $fh, "<", $filename
or die "Can't open $filename: $!";
If open were to treat any undef as a temp file indicator and $config->{input_file} is not set (for any reason), this would silently open an empty file (instead of failing loudly). For my taste, "any undef means use a temp file instead" is just a bit too much magic given how easy it is to inadvertently end up with undef in a variable or expression.
The current behavior of released perls (before my changes in #22385) catches the case above reliably, but is otherwise hard to characterize. The code simply checks if (sv == &PL_sv_undef), which is implementation details of the interpreter leaking through. For example, if we take the example above and remove the $filename variable, ā¦
open my $fh, "<", $config->{input_file}
or die "Can't open $config->{input_file}: $!";
⦠the behavior now depends on whether the input_file key exists in %$config or not. If it doesn't exist, perl will open a temporary file. If it exists but is undef, perl will fail (as before). The behavior of any expression in the third argument of open depends on whether it returns any undef value in general or the canonical undef value in particular. (I don't think this difference is even documented anywhere; we only have "returns undef on error" or similar.)
So after rejecting the status quo (inconsistent, hard to describe without resorting to implementation details) and a simple defined() check at runtime (consistent, but even more error prone and potentially incompatible with some existing code), I decided to turn back all the way to static compile-time checks. There is precedent for this sort of thing in Perl:
- Built-in functions in general get to have whatever syntax they want. Some of it can be mimicked with prototypes, other parts cannot.
systemandexecswitch to different behavior if given a bare block as the first ("indirect object") argument.<...>decides whether to beglob()orreadline()based purely on syntax and in a way no other operator does. For example,<foo>and<$foo>meanreadline(foo)andreadline($foo), respectively, but<${foo}>and<$ foo>both meanglob($foo).eof(no argument) andeof()(empty parens) do different things.if (1 .. 10)does something different frommy ($x, $y) = (1, 10); if ($x .. $y).
And so now open is syntactic, too: "If the third argument is a literal undef, you get a temporary file" is a simple rule that matches what perldoc -f open describes. The check is tighter than before, so there are fewer chances (hopefully none) for you to get a temporary file when you didn't expect it (which is what #22385 is about). It is technically a change in the behavior of open, but only in cases that (depending on how you read perldoc -f open) had undefined behavior or had behavior that contradicted the documentation.
Opening a temp file with a < mode makes no sense, so for the example you gave, a warning could easily be added.
Various core functions do have special parsing, but we generally consider those things warts to avoid in modern designs. And some of them are clearly unique by their syntax alone. system and exec with a block are obviously not standard function calls. <...> is also obviously unique syntax. open looks like a normal function call, so having not behave like one is confusing.
There is also precedence for odd compile time formulations being removed. Prior to perl 5.18, splitting on whitespace using split ' ' required the argument to be a literal string. perl5180delta
This problem will impact users of autodie as well.
Could you provide an example of the problem
autodieusers will encounter?
Until this change, you could write:
use autodie;
open my $fh, '+>', undef;
This would open a file handle to a temp file. Since autodie overrides the open function, the magic added in #22465 won't apply. autodie is implemented in perl using normal functions, and its wrapper calls open($_[0], $_[1], @_[2 .. $#_]). The only way it could reimplement the new behavior of perl's open is by plugging in to the parser.
Hi there, I'm joining this discussion because I'm the author of File::Tabular and became aware of the issue through the ticket opened by MAUKE. At first I found very surprising that a raw undef is not the same thing as an undefined variable, and thought it would be better to generalize the behaviour. But after reading the whole discussion in https://github.com/Perl/perl5/issues/22385 I'm now convinced by MAUKE's argument that a call to open() with an undefined variable is not the same intention than a raw undef and therefore should complain and fail.
Regarding File::Tabular, it's really not an issue if the behavior changes in 5.40 : the tempfile feature of open() was just convenient for tests, it's not essential to the module. However, if I understand correctly, replacing line 5 in
sub _open { # CORE::open is prototyped, so it cannot take args from an array. This is a workaround
my $self = shift;
my $result = (ref $_[1] eq 'GLOB') ? $_[0] = $_[1] :
@_ > 3 ? open($_[0], $_[1], $_[2], @_[3..$#_]) :
@_ > 2 ? open($_[0], $_[1], $_[2]) :
open($_[0], $_[1]);
by
@_ > 2 ? open($_[0], $_[1], $_[2] // undef) :
would solve the issue, correct ? Maybe a similar approach could be done for autodie and Fatal.pm.
Besides, this magic opening of tempfiles through an undef arg was probably not a very good idea of perl 5.8 : in client code it would be more readable to make explicit calls to File::Temp. So maybe a deprecation policy for this feature could be envisioned, instead of trying to preserve all idiosyncrasies.
@_ > 2 ? open($_[0], $_[1], $_[2] // undef) :
No, this will not work. The third parameter is an expression, not a literal undef, so it won't trigger the temp file behavior. And even if it did, it would change how the parameters are handled vs the core open, which isn't how autodie is meant to work.
Besides, this magic opening of tempfiles through an undef arg was probably not a very good idea of perl 5.8 : in client code it would be more readable to make explicit calls to File::Temp. So maybe a deprecation policy for this feature could be envisioned, instead of trying to preserve all idiosyncrasies.
I agree that this isn't a great interface for doing this.
File-Tabular is now passing on CPANtesters, @damil having released a new CPAN version on August 11. That would make this ticket closable, but there was a lot of discussion about larger issues between @mauke and @haarg. How should we proceed?
The main focus of this ticket, File::Tabular, is working again.
Any further discussion of the semantics of open probably should not happen in this ticket (I believe it is being discussed by the PSC).
Iāll put my thoughts down here for now since Iām not aware of this being tracked somewhere else yet.
@haarg wrote:
Opening a temp file with a
<mode makes no sense, so for the example you gave, a warning could easily be added.
Arguably itās a bug that open $fh, '<', undef even succeeds. With an actual filename, both < and +< fail if you try them on a path that doesnāt exist, and yet asking for a tempfile causes one to be created in spite of the read mode. Logically only > and +> ought to even succeed. And of those, > is of course useless.
OTOH I do see @maukeās point. Even if you error on the read modes and warn on bare write, and only accept +> quietly, itās still a problem that this can cause broken code to quietly successfully do a useless thing.
The thing is I donāt think itās possible to make that sane. It seems to me that whatās broken is the design rather than the implementation.
So my instinct would be to
- add API to File::Temp for anonymous temp files (which, to my surprise, it doesnāt currently have (it does have
UNLINK, but thatās something else entirely)) so we have an equivalently easy alternative for people to move to - revert 471c834070f1d22c4f2dc1c9997a5adfd959f3e5
- make
<and+<fail like they would with a defined value, and make>warn, as suggested, to help sanitize existing code - add a
no tempopenpragma that removes the anonymous temporary file behaviour fromopen(with docs pointing to the new File::Temp API), slated for inclusion in a feature bundle at some point
Now this approach (same as 471c834070f1d22c4f2dc1c9997a5adfd959f3e5 itself!) does remove the ability of users of an existing interface which internally calls open to tell that interface that they want it to use an anonymous file. If we really do want to retain that possibility, the design could be made sane ā by using a different sentinel value than undef. It would have to be something clearly distinct from a regular scalar, like a reference to a special value (similar to what the JSON modules used for booleans when that was necessary) or a reference blessed into a special package or a scalar with special magic attached to it by some function (e.g. weād have something like a builtin::tempfile constant that would return a recognizably magicād scalar that you could pass as the path argument to open). I donāt dislike this but I also donāt think itās worth the bother and the unusual API shape. āUse File::Temp::anonfile if you need thatā is much simpler to learn and remember, and that obviousness seems like a better trade than the ability to manipulate a calleeās open call in that specific way. For people who really do need to subvert a calleeās open call (in this way as well as many other unrelated ways), they can pull out the CORE::GLOBAL guns to do that.