coreutils icon indicating copy to clipboard operation
coreutils copied to clipboard

``rm``: r4_gnu_test

Open AnirbanHalder654322 opened this issue 1 year ago • 40 comments

Added regex detection to various combination of ".." and "." followed by slashes .

AnirbanHalder654322 avatar Aug 07 '24 09:08 AnirbanHalder654322

GNU testsuite comparison:

Congrats! The gnu test tests/rm/r-4 is no longer failing!

github-actions[bot] avatar Aug 07 '24 10:08 github-actions[bot]

GNU testsuite comparison:

Congrats! The gnu test tests/rm/r-4 is no longer failing!

github-actions[bot] avatar Aug 07 '24 12:08 github-actions[bot]

GNU testsuite comparison:

Congrats! The gnu test tests/rm/r-4 is no longer failing!

github-actions[bot] avatar Aug 07 '24 12:08 github-actions[bot]

Why not make it work on Windows as well?

just-an-engineer avatar Aug 07 '24 16:08 just-an-engineer

#6620

just-an-engineer avatar Aug 07 '24 16:08 just-an-engineer

needs some updates

Sorry for the lack of communication, had some assignment deadlines to fulfill. I will push out updates today.

AnirbanHalder654322 avatar Aug 14 '24 06:08 AnirbanHalder654322

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

github-actions[bot] avatar Aug 14 '24 06:08 github-actions[bot]

GNU testsuite comparison:

Congrats! The gnu test tests/rm/r-4 is no longer failing!

github-actions[bot] avatar Aug 14 '24 09:08 github-actions[bot]

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/rm/r-4 is no longer failing!

github-actions[bot] avatar Aug 14 '24 09:08 github-actions[bot]

@sylvestre This should be ready for a review now

AnirbanHalder654322 avatar Aug 14 '24 12:08 AnirbanHalder654322

GNU testsuite comparison:

Congrats! The gnu test tests/rm/r-4 is no longer failing!

github-actions[bot] avatar Aug 14 '24 13:08 github-actions[bot]

Is the utf-8 crate from Cargo.toml used? I could just be missing it. If not, can you remove it?

Also, for your rm.rs, lines 584 & 585, I imagine we only want to check for /. or /.. exactly. I might be wrong here, but I can do touch hi., making a file that ends in either . or .., in which case your code would currently classify it as the parent or current dir. Also also, on lines 586 and 587, wouldn't you want it to be format!("{}.", ...) and similar to detect that?

~~I do like how you remove trailing slashes, but would you be able to propagate the new cleaned_path var on line 334 in rm.rs to the rest of that function (handle_dir)? I assume that would be nice to use for the rest since you've already processed it a bit. Also, would something like str.replace('//', '/') (and the related slash for Windows) work instead? (I'm just throwing out stuff, so if I'm wrong, then feel free to say so). I assume that will collapse double slashes both at the start, end, and middle. I assume the performance would be similar, but it might be more concise. But it looks like you just take a slice of the original string, so I assume yours doesn't work on multiple slashes in the middle of a path.~~

Lastly, for your tests, I think you could save yourself 30ish lines and just use the conditional variable assignments for *nix/Windows (/ vs \\) and just use a bunch of format all over the place. Although maybe it may look too cluttered, so perhaps what you have here is better than that method. Idk, @sylvestre, any thoughts on that? One worry would be drift if someone updates a test but forgets to do the other, but they're smallish and close to each other.

EDIT: I realize your function is for trailing slashes, not necessarily double. You could always turn it into a more general cleaning function with the replace that I mentioned, and remove trailing, but I just misread it, my apologies. But instead of a loop at line 633, wouldn't you be able to use a built-in method for the path to get the first index and use that directly in the path slice?

just-an-engineer avatar Aug 14 '24 13:08 just-an-engineer

Is the utf-8 crate from Cargo.toml used? I could just be missing it. If not, can you remove it?

I believe my PR does remove it.

Also, for your rm.rs, lines 584 & 585, I imagine we only want to check for /. or /.. exactly. I might be wrong here, but I can do touch hi., making a file that ends in either . or .., in which case your code would currently classify it as the parent or current dir. Also also, on lines 586 and 587, wouldn't you want it to be format!("{}.", ...) and similar to detect that?

The helper functions get called only when the stat() call yields a directory. So your file example won't really trigger the code.

There is no point in doing format!("{}.") as ends_with('.') would cover the point of the condition anyway. And we would not replace the former with the latter since format!("{}.") wouldn't be true when you do rm -rf . or rm -rf ....

Lastly, for your tests, I think you could save yourself 30ish lines and just use the conditional variable assignments for *nix/Windows (/ vs \\) and just use a bunch of format all over the place. Although maybe it may look too cluttered, so perhaps what you have here is better than that method. Idk, @sylvestre, any thoughts on that? One worry would be drift if someone updates a test but forgets to do the other, but they're smallish and close to each other.

We usually just add new tests instead of changing old tests unless it asserts wrong behavior and therefore fail, the test would be caught in the CI if people forget to change anyway.

EDIT: I realize your function is for trailing slashes, not necessarily double. You could always turn it into a more general cleaning function with the replace that I mentioned, and remove trailing, but I just misread it, my apologies. But instead of a loop at line 633, wouldn't you be able to use a built-in method for the path to get the first index and use that directly in the path slice?

We only care about cleaning the end of the string from slashes to print to stderr. There is no point cleaning the middle, doing a stat() call on something like ..//../ will refer to the directory at ../../ anyway. We are only cleaning to be compatible with gnu.

Actually I was under the idea that converting windows utf-16 OsStr to rust str type might fail that's why the I was very hesitant to use strings.

AnirbanHalder654322 avatar Aug 14 '24 18:08 AnirbanHalder654322

I believe my PR does remove it.

My apologies, I just saw it an my mind went to add. But yeah, I see you're actually removing it

The helper functions get called only when the stat() call yields a directory. So your file example won't really trigger the code.

Yeeaah, I remember that now. However, on Linux, I can also make a directory that's named test. or test... Would you be able to add some tests to make sure a valid directory that ends with . or .. don't get improperly ignored? Because if it's the case that a real directory that ends with . or .. makes it more necessary to change how you check it. Because I agree if we didn't care about that, the ends_with function from your reply:

There is no point in doing format!("{}.") as ends_with('.') would cover the point of the condition anyway. And we would not replace the former with the latter since format!("{}.") wouldn't be true when you do rm -rf . or rm -rf ....

Would work just as well. But I think we need to go through the extra hassle of ensuring that's the 'entire' name of the program. There might be other ways to do that, like see if it ends with . and is of length 1 (outside of the rest of the path), but it may be simpler to do the format and keep most of the structure of what you have

We usually just add new tests instead of changing old tests unless it asserts wrong behavior and therefore fail, the test would be caught in the CI if people forget to change anyway.

Yeah, that's a good point. It may not cover the case of someone making a test stricter and forgetting the other one, but I think it's valid enough. And like I said, the test looks clean enough, doing a bunch of formats may make it look not so pretty.

Actually I was under the idea that converting windows utf-16 OsStr to rust str type might fail that's why the I was very hesitant to use strings.

I have no clue about that, so I'll just say that sounds good if so and leave it to you and Sylvestre.

just-an-engineer avatar Aug 14 '24 18:08 just-an-engineer

The helper functions get called only when the stat() call yields a directory. So your file example won't really trigger the code.

Yeeaah, I remember that now. However, on Linux, I can also make a directory that's named test. or test... Would you be able to add some tests to make sure a valid directory that ends with . or .. don't get improperly ignored? Because if it's the case that a real directory that ends with . or .. makes it more necessary to change how you check it. Because I agree if we didn't care about that, the ends_with function from your reply:

Good catch, i will push a fix for that.

AnirbanHalder654322 avatar Aug 14 '24 19:08 AnirbanHalder654322

It seems like there is some niche behavior with GNU which does seem like a bug.

rm -rf ...
rm: refusing to remove '.' or '..' directory: skipping '../..'
rm -rf ....
rm: refusing to remove '.' or '..' directory: skipping '../../..' 

It seems like gnu converts the path into ../.. , seems they are processing it as overlapping instances of .. instead of disjoint instances of .. . Basically turning the path into the parent of parent directory in the case of rm -rf ... , parsing it as .. and .. which the 2nd . is overlapping. And same for .... , it essentially referring to k = (n!) / (2!*( n-2)!) , kth parent directory above the cwd in directory tree.

Linux allows both files and directories to be named ... or .... or combinations of ...... barring .. and . which refers to current dir and parent dir respectively. It feels a bit wrong to arbitrarily err on valid file names, which can be trivially created.

// Start up calls 

getrandom("\xd5\x1d\xd5\x6f\xc8\xd3\xcd\xe7", 8, GRND_NONBLOCK) = 8
brk(NULL)                               = 0x5d157286c000
brk(0x5d157288d000)                     = 0x5d157288d000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=15751120, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 15751120, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7abe77800000
close(3)                                = 0
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0755, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "../..", {st_mode=S_IFDIR|0775, st_size=4096, ...}, AT_SYMLINK_NOFOLLOW) = 0 
// Seems like the path is converted in userspace and then a syscall is made to ``../..`` for ``rm -rf ...`` 

// User locale related syscalls 

 
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=613, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 613, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7abe78a82000
close(3)                                = 0
write(2, "rm: ", 4rm: )                     = 4
write(2, "refusing to remove '.' or '..' d"..., 58refusing to remove '.' or '..' directory: skipping '../..') = 58
write(2, "\n", 1
)                       = 1
lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++

@sylvestre Need some advice on how to handle this . Do we just do the same and read the path as overlapping instances of .. ?

AnirbanHalder654322 avatar Aug 16 '24 19:08 AnirbanHalder654322

What setup are you on? I just made files and directories with a bunch of '.'s, and couldn't replicate it. For GNU, I mean. I'm on Mint 20.1, x86

just-an-engineer avatar Aug 16 '24 20:08 just-an-engineer

What setup are you on? I just made files and directories with a bunch of '.'s, and couldn't replicate it. For GNU, I mean. I'm on Mint 20.1, x86

Have you checked the hidden files to see if they exist ? Doing std::fs::File::create should work. Try doing ls -a after creating the file.

AnirbanHalder654322 avatar Aug 16 '24 20:08 AnirbanHalder654322

This is what I ran

me@computer:/tmp/test$ ls -a
.  ..
me@computer:/tmp/test$ mkdir ...
me@computer:/tmp/test$ touch ....
me@computer:/tmp/test$ touch .../hi
me@computer:/tmp/test$ ls -a
.  ..  ...  ....
me@computer:/tmp/test$ ls -a ...
.  ..  hi
me@computer:/tmp/test$ rm -rf ...
me@computer:/tmp/test$ rm -rf ....
me@computer:/tmp/test$ ls -a
.  ..

How did you get

rm -rf ...
rm: refusing to remove '.' or '..' directory: skipping '../..'

?

EDIT: added ls after mkdir and touch to show the created directory and files existed. No change to outcomes

just-an-engineer avatar Aug 16 '24 20:08 just-an-engineer

This is what I ran

me@computer:/tmp/test$ ls -a
.  ..
me@computer:/tmp/test$ mkdir ...
me@computer:/tmp/test$ touch ....
me@computer:/tmp/test$ touch .../hi
me@computer:/tmp/test$ ls -a
.  ..  ...  ....
me@computer:/tmp/test$ ls -a ...
.  ..  hi
me@computer:/tmp/test$ rm -rf ...
me@computer:/tmp/test$ rm -rf ....
me@computer:/tmp/test$ ls -a
.  ..

How did you get

rm -rf ...
rm: refusing to remove '.' or '..' directory: skipping '../..'

?

Interesting, I am on pop os 22.04 , and coreutils 9.5 . I simply did rm -rf ... inside a test directory. Touch also errors on ... for me and so does mkdir .... I reckon it would be the same on ubuntu machines as well.

AnirbanHalder654322 avatar Aug 16 '24 20:08 AnirbanHalder654322

GNU testsuite comparison:

Congrats! The gnu test tests/rm/r-4 is no longer failing!

github-actions[bot] avatar Aug 22 '24 12:08 github-actions[bot]

GNU testsuite comparison:

Congrats! The gnu test tests/rm/r-4 is no longer failing!

github-actions[bot] avatar Aug 22 '24 12:08 github-actions[bot]

@sylvestre It seems like the .... wrongful parsing is limited to my machine only. I have chosen to not include checking for it in this PR. I believe this is ready now to be reviewed .

AnirbanHalder654322 avatar Aug 22 '24 13:08 AnirbanHalder654322

I'm trying to update my coreutils right now, I'll give you an answer in an hour or two. My two-cents would be to keep in the correct behavior. I don't think we should avoid deleting "MALICIOUS_FILE.." just because it has a '..' at the end. Sure, it wouldn't be the end of the world, but I don't think we should limit the users. I think we should report it as a bug to GNU, and wait for them to correct it on their side (or help push a PR), but keep the correct behavior on uutil's side in the meantime.

just-an-engineer avatar Aug 22 '24 13:08 just-an-engineer

Yeah, so here's me running it with the latest release they have on their mirror (I figured to do it with that instead of the current master branch because I imagine the latest packages will use that, not the repo branch).

me@computer:~/gnu/coreutils$ mkdir /tmp/tests
me@computer:~/gnu/coreutils$ mkdir /tmp/tests/...
me@computer:~/gnu/coreutils$ touch /tmp/tests/....
me@computer:~/gnu/coreutils$ touch /tmp/tests/.../hi
me@computer:~/gnu/coreutils$ ls -a /tmp/tests/
.  ..  ...  ....
me@computer:~/gnu/coreutils$ ls -a /tmp/tests/...
.  ..  hi
me@computer:~/gnu/coreutils$ ./src/rm -rf /tmp/tests/...
me@computer:~/gnu/coreutils$ ./src/rm -rf /tmp/tests/....
me@computer:~/gnu/coreutils$ ls -a /tmp/tests/
.  ..
me@computer:~/gnu/coreutils$ ./src/rm --version
rm (GNU coreutils) 9.4.114-11f3b3
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, Richard M. Stallman,
and Jim Meyering.

Unless it's been just introduced in 9.5, which it could have been. I wasn't able to get the latest working, anyways. But in the meantime, I'd ask other people what behavior they're seeing and update this thread, but outside of that, I think we should go with the behavior that I'm seeing, since it seems to be the behavior on all versions I've seen (at least on my machine, which is always the catch). Plus, I think it's just more correct, and even if the other behavior was actual GNU behavior, I'd still think we should go the route of my previous comment

just-an-engineer avatar Aug 22 '24 14:08 just-an-engineer

Yeah, so here's me running it with the latest release they have on their mirror (I figured to do it with that instead of the current master branch because I imagine the latest packages will use that, not the repo branch).

me@computer:~/gnu/coreutils$ mkdir /tmp/tests
me@computer:~/gnu/coreutils$ mkdir /tmp/tests/...
me@computer:~/gnu/coreutils$ touch /tmp/tests/....
me@computer:~/gnu/coreutils$ touch /tmp/tests/.../hi
me@computer:~/gnu/coreutils$ ls -a /tmp/tests/
.  ..  ...  ....
me@computer:~/gnu/coreutils$ ls -a /tmp/tests/...
.  ..  hi
me@computer:~/gnu/coreutils$ ./src/rm -rf /tmp/tests/...
me@computer:~/gnu/coreutils$ ./src/rm -rf /tmp/tests/....
me@computer:~/gnu/coreutils$ ls -a /tmp/tests/
.  ..
me@computer:~/gnu/coreutils$ ./src/rm --version
rm (GNU coreutils) 9.4.114-11f3b3
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, Richard M. Stallman,
and Jim Meyering.

Unless it's been just introduced in 9.5, which it could have been. I wasn't able to get the latest working, anyways. But in the meantime, I'd ask other people what behavior they're seeing and update this thread, but outside of that, I think we should go with the behavior that I'm seeing, since it seems to be the behavior on all versions I've seen (at least on my machine, which is always the catch). Plus, I think it's just more correct, and even if the other behavior was actual GNU behavior, I'd still think we should go the route of my previous comment

Does it work the same when basically do

mkdir tmp/tests
cd tmp/tests
rm -rf ...

or 

mkdir tmp/tests
cd tmp/tests
touch ...
rm -rf ...

Basically if i cd into tmp/tests directory , touch doesn't create a file named ... and rm -rf ... fails with rm: refusing to remove '.' or '..' directory: skipping '../..' . Even doing ls -a ... does the same thing as ls -a /../.. , goes to show the files in root directory of tmp

I am running 9.5 coreutils so i don't know if its specific to that. But i have seen different results on minor version changes on other utils programs.

AnirbanHalder654322 avatar Aug 22 '24 15:08 AnirbanHalder654322

Yeah, I just ran it, making /tmp/tests/ my current directory, and referenced the rm binary by path, and got the same behavior

just-an-engineer avatar Aug 22 '24 15:08 just-an-engineer

Yeah, I just ran it, making /tmp/tests/ my current directory, and referenced the rm binary by path, and got the same behavior

Well, i also installed coreutils 9.4 to check, it gives the same error. I will chalk it up to something on my machine which i can't figure out. A person on the discord has also confirmed few days ago they weren't facing any issues. So its very likely something on my machine which i cannot figure out.

AnirbanHalder654322 avatar Aug 22 '24 15:08 AnirbanHalder654322

GNU testsuite comparison:

Congrats! The gnu test tests/rm/r-4 is no longer failing!

github-actions[bot] avatar Aug 22 '24 16:08 github-actions[bot]

Hey, is this still something you're working on? If so, check out my PR (https://github.com/uutils/coreutils/pull/6623), and try using std::path::MAIN_SEPARATOR that Ben showed me. If not, I'll start updating my PR

just-an-engineer avatar Sep 07 '24 19:09 just-an-engineer