perl -n mode eats trailing spaces in filenames

Open hlein opened this issue 2 years ago • 6 comments

Description

If perl -n processes a filename argument that has a trailing space, the space is eaten before the file is opened. Ironically, the openat() with the mangled name, which gets ENOENT, is followed by a newfstatat() with the correct name, which returns 0 (success).

Steps to Reproduce

tmp $ echo 'space middle' >'space middle' ; echo 'spaceend ' >'spaceend ' ; \
  find . -maxdepth 1 -name space\* -print0 | xargs -0 perl -ne 'print'
space middle
Can't open ./spaceend : No such file or directory at -e line 1, <> line 1.

Under strace, we can observe:

[pid 16528] openat(AT_FDCWD, "./space middle", O_RDONLY|O_CLOEXEC) = 3
### middle space preserved fine
[pid 16528] openat(AT_FDCWD, "./spaceend", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 16528] newfstatat(AT_FDCWD, "./spaceend ", {st_mode=S_IFREG|0664, st_size=4, ...}, 0) = 0
### open with the space stripped, followed by stat w/correct name

Attempting to doctor @ARGV in BEGIN by, say, \-escaping trailing spaces does not work: we get an open() with a literal \ but no trailing space, and then a newfstatat() with both the \ and the space:

tmp $ echo 'space middle' >'space middle' ; echo 'spaceend ' >'spaceend ' ; \
  find . -maxdepth 1 -name space\* -print0 | \
  xargs -0 strace -f perl -ne 'BEGIN { @ARGV = ( map { s/( )$/\\ /g; $_ } @ARGV ) } print'
...
openat(AT_FDCWD, "./spaceend\\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "./spaceend\\ ", 0x7ffcf93b1010, 0) = -1 ENOENT (No such file or directory)

Expected behavior

Filenames provided as arguments should be preserved; test-case output should be:

space middle
spaceend 

Perl configuration

Site configuration information for perl 5.38.2:

Configured by Gentoo at Sun Mar 24 15:08:42 MDT 2024.

Summary of my perl5 (revision 5 version 38 subversion 2) configuration:
   
  Platform:
    osname=linux
    osvers=6.6.9-gentoo
    archname=x86_64-linux
    uname='linux localhost 6.6.x'
    [snip]

---
@INC for perl 5.38.2:
    /etc/perl
    /usr/local/lib64/perl5/5.38/x86_64-linux
    /usr/local/lib64/perl5/5.38
    /usr/lib64/perl5/vendor_perl/5.38/x86_64-linux
    /usr/lib64/perl5/vendor_perl/5.38
    [snip]

---
Environment for perl 5.38.2:
    [snip]
    LANG=en_US.utf8
    SHELL=/bin/bash

hlein avatar Mar 31 '24 01:03 hlein

On Sat, Mar 30, 2024 at 06:22:11PM -0700, hlein wrote:

If perl -n processes an argument of a filename containing a trailing space, the space will be eaten before the file is opened.

This is documented behaviour.

-n and -p are documented (in perlrun) to do

while (<>) { ... }

'while (<>)' is documented (in perlop, "I/O Operators") to do

    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...		# code for each line
        }
    }

and 2-arg open is documented (in perlfunc, "Whitespace and special characters in the filename argument") to strip leading and trailing whitespace.
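The difference between the two forms of open can be checked in isolation. The following is a minimal sketch, assuming a scratch directory from File::Temp and reusing the `spaceend ` name from the report; the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory so nothing outside it is touched.
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/spaceend ";    # note the trailing space

# Create the file with 3-arg open, which takes the name literally.
open(my $out, '>', $name) or die "create failed: $!";
print {$out} "spaceend \n";
close($out) or die "close failed: $!";

# 2-arg open: trailing whitespace is stripped from the name first,
# so this looks for "$dir/spaceend", which does not exist.
my ($two_arg_ok, $two_arg_err) = (0, '');
if (open(my $in2, $name)) {
    $two_arg_ok = 1;
    close($in2);
}
else {
    $two_arg_err = "$!";
}

# 3-arg open: mode and name are separate arguments, so the name is
# passed through unchanged.
my $three_arg_ok = 0;
if (open(my $in3, '<', $name)) {
    $three_arg_ok = 1;
    close($in3);
}

print "2-arg open: ", ($two_arg_ok   ? "opened" : "failed ($two_arg_err)"), "\n";
print "3-arg open: ", ($three_arg_ok ? "opened" : "failed"), "\n";
```

With only the trailing-space file present, the 2-arg form should fail and the 3-arg form should succeed.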

It's not ideal behaviour, but it's been documented that way for 30+ years.

I wonder whether we should add a command-line switch to make <> act like <<>> ?
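For reference, <<>> (available since Perl 5.22) treats every entry in @ARGV as a literal filename. A minimal sketch of using it explicitly instead of relying on -n, assuming the two filenames from the original report are created in a scratch directory:

```perl
use v5.22;    # the <<>> operator was added in Perl 5.22
use strict;
use warnings;
use File::Temp qw(tempdir);

# Recreate the two files from the report inside a scratch directory.
my $dir = tempdir(CLEANUP => 1);
for my $name ('space middle', 'spaceend ') {
    open(my $out, '>', "$dir/$name") or die "create failed: $!";
    print {$out} "$name\n";
    close($out) or die "close failed: $!";
}

# Same kind of argument list that xargs would hand to perl.
@ARGV = ("$dir/space middle", "$dir/spaceend ");

# <<>> opens each name literally, so the trailing space survives.
my @lines;
while (defined(my $line = <<>>)) {
    push @lines, $line;
}
print @lines;
```

On the command line, the equivalent would be writing the loop out by hand, for example `perl -e 'while (<<>>) { print }'`, rather than using -n.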

-- I don't want to achieve immortality through my work... I want to achieve it through not dying. -- Woody Allen

iabyn avatar Mar 31 '24 11:03 iabyn

On Sat, Mar 30, 2024 at 06:22:11PM -0700, hlein wrote:

If perl -n processes an argument of a filename containing a trailing space, the space will be eaten before the file is opened.

This is documented behaviour.

       while ($ARGV = shift) {
           open(ARGV, $ARGV);

and 2-arg open is documented (in perlfunc, "Whitespace and special characters in the filename argument") to strip leading and trailing whitespace.

Aha! Yes, you are right. I purged 2-argument open() from my own muscle memory years ago, and was not thinking about that being the method -n uses, or about its implications for implied whitespace stripping.

I wonder whether we should add a command-line switch to make <> act like <<>> ?

That or any kind of pragma that could alter the type of open performed by -n that could be called in BEGIN? (I'd be afraid of unintended consequences elsewhere in scripts, unless you meant only for -n's processing.) Maybe it's possible to hook the 2-arg open performed by -n and iff file not found and file ends in space and such a file exists (the subsequent newfstatat finds it, after all), have it retry a 3-argument open? Well, that's ugly.

Hm, it's slightly worse though. In a directory containing both 'foo' and 'foo ' (with a trailing space), doing find ... -print0 | xargs -0 perl -ne ... will end up silently processing 'foo' twice and 'foo ' not at all?
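That scenario can be reproduced in a few lines. This is a sketch only, assuming a scratch directory and made-up file contents ("plain" and "trailing space") so the two files can be told apart:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Two files whose names differ only by a trailing space, with
# distinguishable contents.
my %content = ('foo' => "plain\n", 'foo ' => "trailing space\n");
for my $name (sort keys %content) {
    open(my $out, '>', "$dir/$name") or die "create failed: $!";
    print {$out} $content{$name};
    close($out) or die "close failed: $!";
}

# Both names are passed, as find -print0 | xargs -0 would do.
@ARGV = ("$dir/foo", "$dir/foo ");

# Plain <> uses 2-arg open, so the second name loses its trailing
# space and resolves to the first file again.
my @seen;
while (defined(my $line = <>)) {
    push @seen, $line;
}
print @seen;    # prints "plain" twice; "trailing space" never appears
```

No error or warning is raised here, because the stripped name happens to match an existing file.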

Filenames that end in spaces are silly. The only reason I encountered this was writing some tools to iterate through arbitrary code trees / repositories and do some calculations... but some projects have test-case files that end in spaces on purpose, which tripped me up.

hlein avatar Mar 31 '24 17:03 hlein

it might be nice to have -N and -P that do 3 arg opens without the trimming...

guest20 avatar Apr 02 '24 10:04 guest20

Given that the behavior questioned by the OP is documented behavior, I believe this ticket is closable. Further discussion about possible changes in Perl should take place on the mailing list. Self-assigning for the purpose of closing this ticket in 7 days unless there is a serious objection.

jkeenan avatar Aug 09 '24 20:08 jkeenan

Given that the behavior questioned by the OP is documented behavior, I believe this ticket is closable. Further discussion about possible changes in Perl should take place on the mailing list. Self-assigning for the purpose of closing this ticket in 7 days unless there is a serious objection.

Yeah, I'm convinced it's documented behavior.

Buuut I'm also leaning to, it's an unsafe behavior. See above, this means that one can effectively hide files from processing by -n by creating files with the same name of an existing file plus a space (or possibly - cause a bystander file to be tampered with). I haven't looked closely into say, spamassassin, or perhaps a perl tmpwatch variant to make sure they never use -n over files an attacker can name; if they did, one could likely use this to smuggle malicious attachments through, etc. If it's unsafe in some foreseeable circumstances, it's probably an unwise habit to be in. /me sighs in almost 30 years of muscle memory...

So, I would love it if there were interest in some of the ideas mentioned above - variations on -n, or a pragma that switches to an unambiguous 3-argument open, or what have you. But other than a couple of drive-by commenters, there doesn't seem to be much interest.

Ah, reading is hard, I just realized you said:

Further discussion about possible changes in Perl should take place on the mailing list.

Which list would be most appropriate? perl5-porters?

Thanks!

hlein avatar Aug 09 '24 21:08 hlein

On 8/9/24 17:31, hlein wrote: [snip]

Ah, reading is hard, I just realized you said:

Further discussion about possible changes in Perl should take place on the mailing list.

Which list would be most appropriate? perl5-porters?

Yes.

jkeenan avatar Aug 10 '24 00:08 jkeenan