Make it easier to debug path parsing failures
I have a non-trivial bit of code using rustix that is returning EINVAL. I thought it was an OS error, but it turns out that I was getting https://github.com/rust-lang/rust/issues/116523. However it was being masked by the fact that Arg impl returns EINVAL if there is a \0 in the path bytes. It would be nice if rustix returned an error code less plausible than EINVAL when a path failure like this happens.
Do you have any suggestions for which code to use? I can see how EINVAL is confusing like this, but I imagine any error code we could return could also be confusing. Even if we picked something as silly as ENOANO, if I were debugging that I think I might guess that's coming from the kernel just as well.
For what it''s worth, Errno::INVAL corresponds to std::io::ErrorKind::InvalidInput in Rust, which is what the standard library uses if you pass a filesystem function a string with embedded NULs. That said, the standard library puts it in an Error which also contains a string message explaining what happened.
So, another option would be to have functions that take path arguments use a different error type, which could be an enum containing either an Errno for an OS error or some form of "invalid path" error. That'd be a semver bump, but we could consider it next time we do a semver bump.
I think that this could be solved by taking advantage of the fact that there are many unused error codes.
- Error codes are represented on
linux_rawas au16but only ever take up the-4096..0range. Which leaves the rest of theu16space open. - Error codes are represented on
libcas ac_int. Buterrnocan only ever be positive, which leaves an additional1 << 31error codes to be used here.
Specifically, in both cases, there is a surefire way to tell if something is an actual system error code:
- For
linux_raw, the highest bit will always be set to1. So we can match on this bit to tell if it is a system error code or if it's something we've created. This leaves the remaining 15 bits of space to do what we want with. - For
libc, the highest bit will always be0. So we can match on this bit in a similar way, giving us 31 bits of space left over.
For the conversions in arg.rs, these errors can occur:
- Conversion errors caused by null bytes in
CString::new. os_str()was unable to be properly converted to a UTF-8 string.- The name is too long for the stack based buffer... although I'd argue that
ENAMETOOLONGcovers this well enough already.
Ignoring the third error, let's say we use a one-bit determinant to decide between the first two errors. This leaves us with 14 bits to hold the index at which path processing failed; either the zero in the CString or the invalid UTF-8 cluster. This can hold a maximum index of 1 << 14 = 16384, which is probably high enough for most cases, and we can just use 1 << 14 as a sentinel value for "this length can't be properly represented".
Thoughts on this? We potentially ran into this issue as a part of dbus2/zbus#517, which makes it the second time I've stepped on this particular footgun.