perl5 icon indicating copy to clipboard operation
perl5 copied to clipboard

BBC: Blead Breaks File::Tabular

Open cjg-cguevara opened this issue 1 year ago • 10 comments

This is a bug report for perl from "Carlos Guevara" [email protected], generated with the help of perlbug 1.43 running under perl 5.41.3.


BBC: Blead Breaks File::Tabular

Please see http://fast-matrix.cpantesters.org/?dist=File::Tabular


Flags

  • category=core
  • severity=low

Perl configuration

Site configuration information for perl 5.41.3:

Configured by cpan at Wed Aug  7 22:42:44 EDT 2024.

Summary of my perl5 (revision 5 version 41 subversion 3) configuration:
  Commit id: e2b4c6357925d2269403aff2d563ed245a1213ab
  Platform:
    osname=linux
    osvers=5.14.0-427.28.1.el9_4.x86_64
    archname=x86_64-linux
    uname='linux cjg-rhel9 5.14.0-427.28.1.el9_4.x86_64 #1 smp preempt_dynamic fri jul 19 14:40:47 edt 2024 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Dprefix=/home/cpan/bin/perl -Dscriptdir=/home/cpan/bin/perl/bin -Dusedevel -Duse64bitall'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
  Compiler:
    cc='cc'
    ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_FORTIFY_SOURCE=2'
    optimize='-O2'
    cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='11.4.1 20231218 (Red Hat 11.4.1-3)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /usr/lib64 /usr/local/lib64
    libs=-lpthread -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -ldl -lm -lcrypt -lutil -lc
    libc=/lib/../lib64/libc.so.6
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.34'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'


---
@INC for perl 5.41.3:
    /home/cpan/bin/perl/lib/site_perl/5.41.3/x86_64-linux
    /home/cpan/bin/perl/lib/site_perl/5.41.3
    /home/cpan/bin/perl/lib/5.41.3/x86_64-linux
    /home/cpan/bin/perl/lib/5.41.3

---
Environment for perl 5.41.3:
    HOME=/home/cpan
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_ALL=C
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/cpan/bin/perl/bin:/home/cpan/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/bash

cjg-cguevara avatar Aug 08 '24 03:08 cjg-cguevara

Probably 471c834070f1d22c4f2dc1c9997a5adfd959f3e5

tonycoz avatar Aug 08 '24 03:08 tonycoz

Probably 471c834

Confirmed through bisection.

$ gitshowf 471c834070f1d22c4f2dc1c9997a5adfd959f3e5
commit 471c834070f1d22c4f2dc1c9997a5adfd959f3e5
Author:     Lukas Mai <[email protected]>
AuthorDate: Sun Aug 4 19:40:32 2024 +0200
Commit:     Lukas Mai <[email protected]>
CommitDate: Tue Aug 6 00:59:48 2024 +0200

    open: only treat literal undef as special
...

@mauke, can you take a look? Thanks.

jkeenan avatar Aug 08 '24 11:08 jkeenan

I've opened a bug ticket with two alternative suggestions for fixing the code: https://rt.cpan.org/Ticket/Display.html?id=154767

IMHO the test code in File::Tabular relies on a bug in perl (#22385). It calls open($_[0], $_[1], $_[2]) and expects open to create a temporary file, even though the filename argument is manifestly a variable ($_[2]) and not a literal undef. But because $_[2] (in sub _open) happens to be an alias for $_[1] (in sub new), which in turn is an alias for undef in the constructor call in the test file, this used to trick perl into creating a temp file anyway.

mauke avatar Aug 08 '24 19:08 mauke

This problem will impact users of autodie as well.

I'm not sure I really like the change to attempt to fix #22385. Requiring a literal undef for this kind of behavior doesn't feel like how things in perl should work. Having open use a temporary file when given any undef seems more sane to me. A warning could be issued if a mode other than +> was used.

haarg avatar Aug 09 '24 06:08 haarg

This problem will impact users of autodie as well.

Could you provide an example of the problem autodie users will encounter?

jkeenan avatar Aug 09 '24 12:08 jkeenan

I'm not sure I really like the change to attempt to fix #22385. Requiring a literal undef for this kind of behavior doesn't feel like how things in perl should work. Having open use a temporary file when given any undef seems more sane to me.

It would be more consistent, but it would also fail in annoying (and silent) ways. For example, let's say you extract the name of a file to be processed from your configuration and try to open it:

my $filename = $config->{input_file};
open my $fh, "<", $filename
    or die "Can't open $filename: $!";

If open were to treat any undef as a temp file indicator and $config->{input_file} is not set (for any reason), this would silently open an empty file (instead of failing loudly). For my taste, "any undef means use a temp file instead" is just a bit too much magic given how easy it is to inadvertently end up with undef in a variable or expression.

The current behavior of released perls (before my changes in #22385) catches the case above reliably, but is otherwise hard to characterize. The code simply checks if (sv == &PL_sv_undef), which is implementation details of the interpreter leaking through. For example, if we take the example above and remove the $filename variable, …

open my $fh, "<", $config->{input_file}
    or die "Can't open $config->{input_file}: $!";

… the behavior now depends on whether the input_file key exists in %$config or not. If it doesn't exist, perl will open a temporary file. If it exists but is undef, perl will fail (as before). The behavior of any expression in the third argument of open depends on whether it returns any undef value in general or the canonical undef value in particular. (I don't think this difference is even documented anywhere; we only have "returns undef on error" or similar.)

So after rejecting the status quo (inconsistent, hard to describe without resorting to implementation details) and a simple defined() check at runtime (consistent, but even more error prone and potentially incompatible with some existing code), I decided to turn back all the way to static compile-time checks. There is precedent for this sort of thing in Perl:

  • Built-in functions in general get to have whatever syntax they want. Some of it can be mimicked with prototypes, other parts cannot.
  • system and exec switch to different behavior if given a bare block as the first ("indirect object") argument.
  • <...> decides whether to be glob() or readline() based purely on syntax and in a way no other operator does. For example, <foo> and <$foo> mean readline(foo) and readline($foo), respectively, but <${foo}> and <$ foo> both mean glob($foo).
  • eof (no argument) and eof() (empty parens) do different things.
  • if (1 .. 10) does something different from my ($x, $y) = (1, 10); if ($x .. $y).

And so now open is syntactic, too: "If the third argument is a literal undef, you get a temporary file" is a simple rule that matches what perldoc -f open describes. The check is tighter than before, so there are fewer chances (hopefully none) for you to get a temporary file when you didn't expect it (which is what #22385 is about). It is technically a change in the behavior of open, but only in cases that (depending on how you read perldoc -f open) had undefined behavior or had behavior that contradicted the documentation.

mauke avatar Aug 10 '24 01:08 mauke

Opening a temp file with a < mode makes no sense, so for the example you gave, a warning could easily be added.

Various core functions do have special parsing, but we generally consider those things warts to avoid in modern designs. And some of them are clearly unique by their syntax alone. system and exec with a block are obviously not standard function calls. <...> is also obviously unique syntax. open looks like a normal function call, so having not behave like one is confusing.

There is also precedence for odd compile time formulations being removed. Prior to perl 5.18, splitting on whitespace using split ' ' required the argument to be a literal string. perl5180delta

haarg avatar Aug 10 '24 02:08 haarg

This problem will impact users of autodie as well.

Could you provide an example of the problem autodie users will encounter?

Until this change, you could write:

use autodie;
open my $fh, '+>', undef;

This would open a file handle to a temp file. Since autodie overrides the open function, the magic added in #22465 won't apply. autodie is implemented in perl using normal functions, and its wrapper calls open($_[0], $_[1], @_[2 .. $#_]). The only way it could reimplement the new behavior of perl's open is by plugging in to the parser.

haarg avatar Aug 10 '24 02:08 haarg

Hi there, I'm joining this discussion because I'm the author of File::Tabular and became aware of the issue through the ticket opened by MAUKE. At first I found very surprising that a raw undef is not the same thing as an undefined variable, and thought it would be better to generalize the behaviour. But after reading the whole discussion in https://github.com/Perl/perl5/issues/22385 I'm now convinced by MAUKE's argument that a call to open() with an undefined variable is not the same intention than a raw undef and therefore should complain and fail.

Regarding File::Tabular, it's really not an issue if the behavior changes in 5.40 : the tempfile feature of open() was just convenient for tests, it's not essential to the module. However, if I understand correctly, replacing line 5 in

sub _open { # CORE::open is prototyped, so it cannot take args from an array. This is a workaround
  my $self   = shift;
  my $result = (ref $_[1] eq 'GLOB') ? $_[0] = $_[1]                         :
               @_ > 3                ? open($_[0], $_[1], $_[2], @_[3..$#_]) :
               @_ > 2                ? open($_[0], $_[1], $_[2])             :
                                       open($_[0], $_[1]);

by

               @_ > 2                ? open($_[0], $_[1], $_[2] // undef)             :

would solve the issue, correct ? Maybe a similar approach could be done for autodie and Fatal.pm.

Besides, this magic opening of tempfiles through an undef arg was probably not a very good idea of perl 5.8 : in client code it would be more readable to make explicit calls to File::Temp. So maybe a deprecation policy for this feature could be envisioned, instead of trying to preserve all idiosyncrasies.

damil avatar Aug 10 '24 21:08 damil

               @_ > 2                ? open($_[0], $_[1], $_[2] // undef)             :

No, this will not work. The third parameter is an expression, not a literal undef, so it won't trigger the temp file behavior. And even if it did, it would change how the parameters are handled vs the core open, which isn't how autodie is meant to work.

Besides, this magic opening of tempfiles through an undef arg was probably not a very good idea of perl 5.8 : in client code it would be more readable to make explicit calls to File::Temp. So maybe a deprecation policy for this feature could be envisioned, instead of trying to preserve all idiosyncrasies.

I agree that this isn't a great interface for doing this.

haarg avatar Aug 11 '24 04:08 haarg

File-Tabular is now passing on CPANtesters, @damil having released a new CPAN version on August 11. That would make this ticket closable, but there was a lot of discussion about larger issues between @mauke and @haarg. How should we proceed?

jkeenan avatar Sep 17 '24 12:09 jkeenan

The main focus of this ticket, File::Tabular, is working again.

Any further discussion of the semantics of open probably should not happen in this ticket (I believe it is being discussed by the PSC).

mauke avatar Sep 27 '24 19:09 mauke

I’ll put my thoughts down here for now since I’m not aware of this being tracked somewhere else yet.

@haarg wrote:

Opening a temp file with a < mode makes no sense, so for the example you gave, a warning could easily be added.

Arguably it’s a bug that open $fh, '<', undef even succeeds. With an actual filename, both < and +< fail if you try them on a path that doesn’t exist, and yet asking for a tempfile causes one to be created in spite of the read mode. Logically only > and +> ought to even succeed. And of those, > is of course useless.

OTOH I do see @mauke’s point. Even if you error on the read modes and warn on bare write, and only accept +> quietly, it’s still a problem that this can cause broken code to quietly successfully do a useless thing.

The thing is I don’t think it’s possible to make that sane. It seems to me that what’s broken is the design rather than the implementation.

So my instinct would be to

  1. add API to File::Temp for anonymous temp files (which, to my surprise, it doesn’t currently have (it does have UNLINK, but that’s something else entirely)) so we have an equivalently easy alternative for people to move to
  2. revert 471c834070f1d22c4f2dc1c9997a5adfd959f3e5
  3. make < and +< fail like they would with a defined value, and make > warn, as suggested, to help sanitize existing code
  4. add a no tempopen pragma that removes the anonymous temporary file behaviour from open (with docs pointing to the new File::Temp API), slated for inclusion in a feature bundle at some point

Now this approach (same as 471c834070f1d22c4f2dc1c9997a5adfd959f3e5 itself!) does remove the ability of users of an existing interface which internally calls open to tell that interface that they want it to use an anonymous file. If we really do want to retain that possibility, the design could be made sane – by using a different sentinel value than undef. It would have to be something clearly distinct from a regular scalar, like a reference to a special value (similar to what the JSON modules used for booleans when that was necessary) or a reference blessed into a special package or a scalar with special magic attached to it by some function (e.g. we’d have something like a builtin::tempfile constant that would return a recognizably magic’d scalar that you could pass as the path argument to open). I don’t dislike this but I also don’t think it’s worth the bother and the unusual API shape. ā€œUse File::Temp::anonfile if you need thatā€ is much simpler to learn and remember, and that obviousness seems like a better trade than the ability to manipulate a callee’s open call in that specific way. For people who really do need to subvert a callee’s open call (in this way as well as many other unrelated ways), they can pull out the CORE::GLOBAL guns to do that.

ap avatar Sep 28 '24 19:09 ap