perl -n mode eats trailing spaces in filenames
Description
If perl -n is given a filename argument that ends in a space, the space is eaten before the file is opened. Ironically, the openat() with the mangled name, which gets ENOENT, is followed by a newfstatat() with the correct name, which returns 0 (success).
Steps to Reproduce
tmp $ echo 'space middle' >'space middle' ; echo 'spaceend ' >'spaceend ' ; \
find . -maxdepth 1 -name space\* -print0 | xargs -0 perl -ne 'print'
space middle
Can't open ./spaceend : No such file or directory at -e line 1, <> line 1.
Under strace, we can observe:
[pid 16528] openat(AT_FDCWD, "./space middle", O_RDONLY|O_CLOEXEC) = 3
### middle space preserved fine
[pid 16528] openat(AT_FDCWD, "./spaceend", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 16528] newfstatat(AT_FDCWD, "./spaceend ", {st_mode=S_IFREG|0664, st_size=4, ...}, 0) = 0
### open with the space stripped, followed by stat w/correct name
Attempting to doctor @ARGV in BEGIN by, say, \-escaping trailing spaces does not work: we get an open() with a literal \ but no trailing space, and then a newfstatat() with both the \ and the space:
tmp $ echo 'space middle' >'space middle' ; echo 'spaceend ' >'spaceend ' ; \
find . -maxdepth 1 -name space\* -print0 | \
xargs -0 strace -f perl -ne 'BEGIN { @ARGV = ( map { s/( )$/\\ /g; $_ } @ARGV ) } print'
...
openat(AT_FDCWD, "./spaceend\\", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "./spaceend\\ ", 0x7ffcf93b1010, 0) = -1 ENOENT (No such file or directory)
Expected behavior
Filenames provided as arguments should be preserved; test-case output should be:
space middle
spaceend
Perl configuration
Site configuration information for perl 5.38.2:
Configured by Gentoo at Sun Mar 24 15:08:42 MDT 2024.
Summary of my perl5 (revision 5 version 38 subversion 2) configuration:
Platform:
osname=linux
osvers=6.6.9-gentoo
archname=x86_64-linux
uname='linux localhost 6.6.x'
[snip]
---
@INC for perl 5.38.2:
/etc/perl
/usr/local/lib64/perl5/5.38/x86_64-linux
/usr/local/lib64/perl5/5.38
/usr/lib64/perl5/vendor_perl/5.38/x86_64-linux
/usr/lib64/perl5/vendor_perl/5.38
[snip]
---
Environment for perl 5.38.2:
[snip]
LANG=en_US.utf8
SHELL=/bin/bash
On Sat, Mar 30, 2024 at 06:22:11PM -0700, hlein wrote:
If perl -n processes an argument of a filename containing a trailing space, the space will be eaten before the file is opened.
This is documented behaviour.
-n and -p are documented (in perlrun) to do
while (<>) { ... }
'while (<>)' is documented (in perlop, "I/O Operators") to do
while ($ARGV = shift) {
open(ARGV, $ARGV);
while (<ARGV>) {
... # code for each line
}
}
and 2-arg open is documented (in perlfunc, "Whitespace and special characters in the filename argument") to strip leading and trailing whitespace.
It's not ideal behaviour, but it's been documented that way for 30+ years.
I wonder whether we should add a command-line switch to make <> act like <<>> ?
On Sat, Mar 30, 2024 at 06:22:11PM -0700, hlein wrote:
If perl -n processes an argument of a filename containing a trailing space, the space will be eaten before the file is opened.
This is documented behaviour.
while ($ARGV = shift) {
open(ARGV, $ARGV);
and 2-arg open is documented (in perlfunc, "Whitespace and special characters in the filename argument") to strip leading and trailing whitespace.
Aha! Yes, you are right. I purged 2-argument open() from my own muscle memory years ago, and was not thinking about it being the method -n uses, or about the whitespace stripping that implies.
I wonder whether we should add a command-line switch to make <> act like <<>> ?
That or any kind of pragma that could alter the type of open performed by -n that could be called in BEGIN? (I'd be afraid of unintended consequences elsewhere in scripts, unless you meant only for -n's processing.) Maybe it's possible to hook the 2-arg open performed by -n and iff file not found and file ends in space and such a file exists (the subsequent newfstatat finds it, after all), have it retry a 3-argument open? Well, that's ugly.
Hm, it's slightly worse though. In a directory containing both 'foo' and 'foo ' (with a trailing space), doing find ... -print0 | xargs -0 perl -ne ... will end up silently processing 'foo' twice and 'foo ' not at all?
Filenames that end in spaces are silly. The only reason I encountered this was that I was writing some tools to iterate through arbitrary code trees / repositories and do some calculations... but some projects have test-case files that end in spaces on purpose, which tripped me up.
it might be nice to have -N and -P that do 3 arg opens without the trimming...
Given that the behavior questioned by the OP is documented behavior, I believe this ticket is closable. Further discussion about possible changes in Perl should take place on the mailing list. Self-assigning for the purpose of closing this ticket in 7 days unless there is a serious objection.
Yeah, I'm convinced it's documented behavior.
Buuut I'm also leaning toward it being an unsafe behavior. As noted earlier, this means that one can effectively hide a file from processing by -n by giving it the same name as an existing file plus a trailing space (or possibly cause a bystander file to be tampered with). I haven't looked closely into, say, spamassassin, or perhaps a perl tmpwatch variant, to make sure they never use -n over files an attacker can name; if they did, one could likely use this to smuggle malicious attachments through, etc. If it's unsafe in some foreseeable circumstances, it's probably an unwise habit to be in. /me sighs in almost 30 years of muscle memory...
So, I would love it if there were interest in some of the ideas mentioned above: variations on -n, or a pragma that switches to an unambiguous 3-argument open, or what have you. But other than a couple of drive-by commenters, there doesn't seem to be much interest.
Ah, reading is hard, I just realized you said:
Further discussion about possible changes in Perl should take place on the mailing list.
Which list would be most appropriate? perl5-porters?
Thanks!
On 8/9/24 17:31, hlein wrote: [snip]
Ah, reading is hard, I just realized you said:
Further discussion about possible changes in Perl should take place on the mailing list.
Which list would be most appropriate? perl5-porters?
Yes.