libc icon indicating copy to clipboard operation
libc copied to clipboard

Wrong structs dirent, dirent64

Open safinaskar opened this issue 3 years ago • 22 comments

Your structs dirent, dirent64 are wrong at x86_64-unknown-linux-gnu. Their field d_name contains 256 bytes. But names can be bigger than 256 bytes in NTFS (in Linux). I was able to create NTFS partition in my Linux and then create file with name "ыыыы..." (250 times "ы", this is Cyrillic letter). Such name contains more than 256 bytes (if you have difficulties with reproducing, I can provide more details).

So, fix structs. I am not sure how exactly fix should be implemented. Maybe [c_char; 512], but in this case we need to double check what is true upper limit. Maybe we can simply put [c_char; 1] or [c_char; 0]. Of course, it would be perfect to make dirent unsized, but with machine word sized reference. Unfortunately, as well as I know, such unsized structs are not possible in stable Rust

safinaskar avatar Feb 04 '22 19:02 safinaskar

I m sorry to say but in fact dirent* struct are in fact correct, I invite you to look at the man pages/headers for confirmation. Note that libc is not a "smart wrapper" neither but reflects what the system libc are. What seems more appropriate in your case is the getdents api usage.

devnexen avatar Feb 05 '22 06:02 devnexen

@devnexen consider this code:

// This code for x86_64-unknown-linux-gnu
use std::ffi::CStr;
fn main() {
    unsafe {
        let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8);
        assert_ne!(dir, std::ptr::null_mut());
        loop {
            let ent: *mut libc::dirent = libc::readdir(dir);
            if ent == std::ptr::null_mut() {
                break;
            }
            let name = CStr::from_ptr((*ent).d_name.as_ptr());
            println!("{}", name.to_bytes().len());
            println!("{:?}", name);
        }
    }
}

This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?

safinaskar avatar Feb 05 '22 11:02 safinaskar

getdents

getdents is not wrapped by libc crate, nor by glibc. So I will have to manually wrap it, this is error-prone

safinaskar avatar Feb 05 '22 11:02 safinaskar

you can call it via syscall (the syscalls ids are present in libc) and map the linux_dirent* I think.

devnexen avatar Feb 05 '22 12:02 devnexen

I still think that we should research whether accessing dirent past its formal size is undefined behavior (possibly by reading https://rust-lang.github.io/unsafe-code-guidelines ). If yes, then we should fix or remove readdir function from libc, because it is impossible to use it correctly

safinaskar avatar Feb 05 '22 15:02 safinaskar

or you can copy the result into a sized buffer. I disagree removing readdir honestly.

devnexen avatar Feb 05 '22 15:02 devnexen

or you can copy the result into a sized buffer

You mean read result from dirent to sized buffer? Okey, but is this reading undefined behavior?

safinaskar avatar Feb 05 '22 15:02 safinaskar

Note that unsafe really means what it means, inside this block, rust safety and guarantees do not apply, it is the responsibility of code user to handle carefully possible side effects, use after free, stack overflow ... just like with language like C.

If anyone wants to chime in the discussion feel free :-)

devnexen avatar Feb 05 '22 16:02 devnexen

@devnexen consider this code:

// This code for x86_64-unknown-linux-gnu
use std::ffi::CStr;
fn main() {
    unsafe {
        let dir = libc::opendir(b"/mnt/3\x00" as *const u8 as *const i8);
        assert_ne!(dir, std::ptr::null_mut());
        loop {
            let ent: *mut libc::dirent = libc::readdir(dir);
            if ent == std::ptr::null_mut() {
                break;
            }
            let name = CStr::from_ptr((*ent).d_name.as_ptr());
            println!("{}", name.to_bytes().len());
            println!("{:?}", name);
        }
    }
}

This code correctly handles file names bigger than 256 bytes. Does it invoke undefined behavior? I. e. is reading past formal struct size undefined behavior?

I don't have all the context to answer this question definitively, but assuming that readdir actually returned a contiguous allocation with a null-terminated d_name field of longer than 256 bytes, your code as written is suspicious and possibly UB (whether it is actually UB is rust-lang/unsafe-code-guidelines#134) due to creating a reference at (*ent).d_name.as_ptr(), but you can rewrite it to be definitely not UB by using addr_of!((*ent).d_name) instead.

digama0 avatar Jun 04 '23 16:06 digama0

@digama0 , thanks a lot for answer. This is very big gotcha. This means lib::dirent should be fixed or at very least properly documented

safinaskar avatar Jun 05 '23 11:06 safinaskar

I just found that actual std implementation of ReadDir is even more careful. The authors reject addr_of!((*ent).d_name) as wrong and instead perform offset computations manually. See code starting from this line: https://github.com/rust-lang/rust/blob/42f28f9eb41adb7a197697e5e2d6535d00fd0f4a/library/std/src/sys/unix/fs.rs#L676

safinaskar avatar Jun 05 '23 13:06 safinaskar

@RalfJung Could you comment on the correctness of the comment there? My take is that the reason addr_of!((*ptr).d_name) is said to be invalid is because the allocation may not actually be 256 bytes (plus the rest of the struct), and (at least for now) addr_of!((*ptr).d_name) asserts that the entire d_name: [c_char; 256] field is within an allocation. So an alternative way to fix the code would be to have a version of dirent with a d_name: [c_char; 0] VLA and using that to do the offset computation.

digama0 avatar Jun 05 '23 14:06 digama0

Yeah that comment sounds accurate for the current very conservative semantics we have for *ptr (where the pointer has to be fully aligned and dereferenceable). That comment is also a good argument for why we'd probably want a more relaxed semantics...

RalfJung avatar Jun 05 '23 15:06 RalfJung

It sounds like we can improve things with a breaking change. If anyone has any proposals, feel free to put up a PR for 1.0.

tgross35 avatar Aug 29 '24 05:08 tgross35

@tgross35 , obviously, perfect solution would be waiting for "thin cstr" and making thin cstr (not reference to thin cstr, but unsized thin cstr itself) last member of dirent

safinaskar avatar Aug 29 '24 07:08 safinaskar

Yeah, I do wish we had that available. Since we don't though, changing to [c_char; 0] is probably our best bet.

tgross35 avatar Aug 29 '24 07:08 tgross35

This issue is about the name being longer than 256 bytes. I'd like to add that the name could even be shorter than 256 bytes! I have observed this in practice and this is corroborated by this comment in the std library:

                // The dirent64 struct is a weird imaginary thing that isn't ever supposed
                // to be worked with by value. Its trailing d_name field is declared
                // variously as [c_char; 256] or [c_char; 1] on different systems but
                // either way that size is meaningless; only the offset of d_name is
                // meaningful. The dirent64 pointers that libc returns from readdir64 are
                // allowed to point to allocations smaller _or_ LARGER than implied by the
                // definition of the struct.

I noticed this, because the following code was crashing:

unsafe { *self.entry }.d_type

where self.entry is libc::dirent64 or libc::dirent. At least when compiling in debugging mode, the unsafe block copies the whole structure, and then accesses d_type. The memcpy can result in a seg fault if the direct data structure is smaller, at the end of a page, and the following page is not mapped:

Thread 7 "store::certd::t" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 10589]
memcpy () at src/string/x86_64/memcpy.s:18
warning: 18     src/string/x86_64/memcpy.s: No such file or directory
(gdb) up
#1  0x000055f054f18d34 in openpgp_cert_d::unixdir::DirEntry::file_type (self=0x7fa822fdb780)
    at src/unixdir.rs:144
144             FileType(unsafe { *self.entry }.d_type)

Making d_name a [c_char; 0] (or even a [c_char; 1] since the name is NUL terminated) would prevent this issue, I think.

nwalfield avatar Sep 11 '24 12:09 nwalfield

Yeah by wrapping *self.entry in curly braces you are forcing that value to be loaded and moved. I would recommend you use unsafe { *self.entry.d_type } to avoid such an unnecessary load + move.

RalfJung avatar Sep 11 '24 12:09 RalfJung

Yeah by wrapping *self.entry in curly braces you are forcing that value to be loaded and moved. I would recommend you use unsafe { *self.entry.d_type } to avoid such an unnecessary load + move.

Thanks a lot for the suggestion!

nwalfield avatar Sep 11 '24 12:09 nwalfield

For the record, unsafe { *self.entry.d_type } doesn't work, but this does:

        unsafe { (&*self.entry).d_type }

(Here's my complete change.)

nwalfield avatar Sep 13 '24 10:09 nwalfield

unsafe { (*self.entry).d_type } should also work.

RalfJung avatar Sep 13 '24 10:09 RalfJung

Indeed it does. Thanks!

nwalfield avatar Sep 13 '24 11:09 nwalfield