derive_more icon indicating copy to clipboard operation
derive_more copied to clipboard

Fix incorrect `fmt::Pointer` implementations

Open tyranron opened this issue 1 year ago • 9 comments

Synopsis

Debug and Display derives allow referring fields via short syntax (_0 for unnamed fields and name for named fields):

#[derive(Display)]
#[display("{_0:o}")]
struct OctalInt(i32);

The way this works is by introducing a local binding in the macro expansion:

let _0 = &self.0;

This, however, introduces double pointer indirection. For most of the fmt traits, this is totally OK. However, the fmt::Pointer is sensitive to that:

#[derive(Display)]
#[display("--> {_0:p}")]
struct Int(&'static i32);

// expands to
impl fmt::Display for Int {
    fn fmt(&self, f: fmt::Formatter<'_>) -> fmt::Result {
        let _0 = &self.0; // has `&&i32` type, not `&i32`
        write!(f, "--> {_0:p}") // so, prints address of the `_0` local binding, 
                                // not the one of the `self.0` field as we would expect
    }
}

Solution

Instead of introducing local bindings, try to replace this shortcuts (like _0) in idents and expressions with a field accessing syntax (like self.0). Or event just substitute _0 with (*_0), because for enum variants we cannot really access the fields via self..

Checklist

  • [x] ~~Documentation is updated~~ (not required)
  • [x] Tests are added/updated
  • [ ] CHANGELOG entry is added

tyranron avatar Dec 23 '23 02:12 tyranron

@tyranron are you still planinng on addressing this?

JelteF avatar Feb 20 '24 21:02 JelteF

@tyranron ping

JelteF avatar Mar 11 '24 09:03 JelteF

@JelteF yes, sorry for not being responsive. I was a little bit overwhelmed on my job last 2 months. I'll try to finish this in next 2 weeks or so.

tyranron avatar Mar 11 '24 14:03 tyranron

No worries, it happens. "Next 2 weeks or so" sounds amazing.

JelteF avatar Mar 11 '24 14:03 JelteF

Oh... crap. This is a bit of a rabbit hole.

I've figured out the solution for cases like #[display("{_0:p}")] or #[display("{:p}", _0)]. But cannot figure out the solution for complex expressions like #[display("{}", format!("{_0:p}"))], so at the moment #[display("{_0:p}")] and #[display("{}", format!("{_0:p}"))] won't produce the same result, and I cannot get how to solve it without going into deep parsing of expressions, which is non-trivial and will require full feature of syn.

We basically hit the difference between accessing a binding and a self.field Rust semantics. We want a binding to always behave like a self.field, but that's not possible in Rust without moving out. And replacing a binding with a *binding in arbitrary expressions seems non-trivial, if solvable in general case at all.

@JelteF I think that supporting #[display("{_0:p}")] and #[display("{:p}", _0)] cases is enough. For more complex situations like #[display("{}", format!("{_0:p}"))] it's better to describe field accessing semantics better in docs and warn explicitly about possible pointer arithmetic or formatting implications.

tyranron avatar Apr 05 '24 17:04 tyranron

@JelteF thinking about it more, I'm unsure whether we should fix #[display("{_0:p}")]/#[display("{:p}", _0)] at all, since we cannot fix it in all the cases. Because having different behavior with #[display("{}", format!("{_0:p}"))] is just too subtle.

Instead, I propose to leave desugaring as it is now, and:

  1. Describe in docs clearly that _0 and field_name has pointer semantics, not value semantics, giving a clear example of how {:p} formatting should be used correctly.
  2. For trivial detectable cases like #[display("{_0:p}")]/#[display("{:p}", _0)] produce a compilation error, guiding towards correct usage: #[display("{:p}", *_0)].

This way, throwing the compile-time error, we bring the library user's attention to this edge case, so he will much more aware about the implications for the #[display("{}", format!("{_0:p}"))] case, rather than silently introducing a different behavior (being the wrong one) without warning the library user about this in any way.

tyranron avatar Apr 08 '24 15:04 tyranron

tl;dr I like your previous proposal much better.

Honestly, I cannot think of a reasonable case where anyone would do something like this: #[display("{}", format!("{_0:p}"))]. So I think optimizing for that case seems a bit strange. Because that means not being able to use #[display("{_0:p}")], which seems a much more common situation.

I think it makes sense to document the weirdness around this, but I think we should support #[display("{_0:p}")] and #[display("{:p}", _0)].

JelteF avatar Apr 08 '24 16:04 JelteF

@JelteF by the way, despite me being too busy to get on this lately, it appears for the better... today, when I was dreaming, I figured out the solution I do like and that should work in all the cases:

// We declare in `derive_more` additional types machinery:
#[repr(transparent)]
pub struct PtrWrapper(T);

// We do the necessecarry dereference in the `fmt::Pointer` impl for our `PtrWrapper`
impl<T: fmt::Pointer> fmt::Pointer for PtrWrapper(&T) {
    fn fmt(&self, f: fmt::Writer<'_>) -> fmt::Result {
        (*self.0).fmt(f)
    }
}

// For any other `fmt` trait we just do the transparent implementation.
impl<T: fmt::Debug> fmt::Debug for PtrWrapper(T) {
    fn fmt(&self, f: fmt::Writer<'_>) -> fmt::Result {
        self.0.fmt(f)
    }
}

// And also `Deref`+`DerefMut`.

// And in the macro expansion we do this:
let _0 = ::derive_more::fmt::PtrWrapper(&self.0);
// instead of this:
let _0 = &self.0;
// so here, wherever the `_0` value is engaged, it will be dereferenced once used as `{:p}`
write!(f, "--> {_0:p}")
// and any other cases should work just fine due to transparent impls of `fmt` trait and `Deref`/`DerefMut`.

tyranron avatar May 10 '24 13:05 tyranron

@tyranron that sounds like a great solution. And I think you wouldn't even need to implement fmt::Debug (and the other non fmt::Pointer traits in fmt), as long as you implement Deref for this type.

JelteF avatar May 17 '24 10:05 JelteF

@JelteF by the way, despite me being too busy to get on this lately, it appears for the better... today, when I was dreaming, I figured out the solution I do like and that should work in all the cases:

Okay I tried this out just now, and it has an unsolvable problem afaict. By wrapping the reference in PtrWrapper it loses all trait implementations that are automatically implemented on &T (of which there are many). We can implement all the fmt traits ofcourse like you proposed. But that only works for those traits. If the user uses the variable in a way that doesn't cause an auto-deref then this won't help.

So I think the only reasonable solution is to special-case #[display("{_0:p}")] and #[display("{:p}", _0)] and document the strangeness for this thing.

JelteF avatar Jul 01 '24 10:07 JelteF

@JelteF

If the user uses the variable in a way that doesn't cause an auto-deref then this won't help.

So I think the only reasonable solution is to special-case #[display("{_0:p}")] and #[display("{:p}", _0)] and document the strangeness for this thing.

For me, this situation seems to be less confusing than special-casing, because auto-deref should work for the majority of cases, while the special-casing breaks whenever there is any expression. Also, wrapper type doesn't allow the code to be incorrect in a subtle way, like the special-casing does.

tyranron avatar Jul 01 '24 12:07 tyranron

For me, this situation seems to be less confusing than special-casing, because auto-deref should work for the majority of cases

I think a fairly common case that would be much more difficult is doing some arithmetic with one of the arguments:

use ::core;
use core::fmt::{Debug, Formatter, Result, Write, Pointer};
use core::prelude::v1::*;


#[repr(transparent)]
pub struct PtrWrapper<T> (pub T);

impl<T: Pointer> Pointer for PtrWrapper<&T> {
    fn fmt(&self, f: &mut Formatter<'_>) -> Result {
        (*self.0).fmt(f)
    }
}

impl<T> ::core::ops::Deref for PtrWrapper<&T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        self.0
    }
}

fn main() {
    let a : i32 = 123;
    let a = PtrWrapper(&a);

    println!("{:?}", a * 2)
}

Instead of that "just working" you now get the following error, which doesn't even give the hint that dereferencing might help:

error[E0369]: cannot multiply `PtrWrapper<&i32>` by `{integer}`
   --> src/main.rs:27:24
    |
27  |     println!("{:?}", a * 2)
    |                      - ^ - {integer}
    |                      |
    |                      PtrWrapper<&i32>
    |
note: an implementation of `Mul<{integer}>` might be missing for `PtrWrapper<&i32>`
   --> src/main.rs:7:1
    |
7   | pub struct PtrWrapper<T> (pub T);
    | ^^^^^^^^^^^^^^^^^^^^^^^^ must implement `Mul<{integer}>`
note: the trait `Mul` must be implemented
   --> /playground/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/arith.rs:317:1
    |
317 | pub trait Mul<Rhs = Self> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^

while the special-casing breaks whenever there is any expression.

It really only impacts the Pointer formatter, everything else is not impacted. I think using the Pointer formatter is rare enough that we shouldn't make the ergonomics for other formatters worse.

Also, wrapper type doesn't allow the code to be incorrect in a subtle way, like the special-casing does.

That is true, but again. This only impacts the Pointer formatter. And really I cannot think of any reason to use the Pointer formatter for anything else than one of the fields, or self. So if we special case those, then I don't think anyone will ever get subtly incorrect code. Also worst case, they print the wrong pointer, which is unlikely to be a life-or-death mistake.

JelteF avatar Jul 01 '24 20:07 JelteF

Closing this one in favor if #381

JelteF avatar Jul 04 '24 15:07 JelteF

@JelteF sorry for bringing this up again, but I have more thoughts about it.

This only impacts the Pointer formatter.

We could try leveraging this fact.

Does the following algo makes sense?

  1. We detect all the placeholders using :p, which is trivial.
  2. We wrap all the expressions going into these placeholders with PtrWrapper:
    • for positional arguments (like ("{:p}", whathever)) this is trivial by expanding to ("{:p}", PtrWrapper(whathever))
    • for named arguments outside the formatting expression (like ("{whathever:p}")) this is a bit trickier, because we need to introduce an additional binding in the scope like let _p_whathever = PtrWrapper(whatever); and substitute the name in the expansion to it ("{_p_whathever:p}").

This way, we use the PtrWrapper in a fine-grained way only for the fmt::Pointer formatting, without involving it into operations with other traits at all. Any complex expression (for any reason the caller might want it for :p formatting) goes inside the PtrWrapper() anyway, so we do not mess up its semantic in any way.

tyranron avatar Jul 25 '24 15:07 tyranron

Okay I played around with that idea a bit more, but I cannot get it to do the thing you say it should do for anything but the most basic expression.

The following for instance fails to compile, because there's now a PtrWrappper around the *const i32 type.

use std::fmt;

#[repr(transparent)]
pub struct PtrWrapper<T> (pub T);

impl<T: fmt::Pointer> fmt::Pointer for PtrWrapper<&T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        (*self.0).fmt(f)
    }
}

impl<T> ::core::ops::Deref for PtrWrapper<&T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        self.0
    }
}

fn main() {
    // a is the field
    let a: &i32 = &123;
    let a = &a; // this is our local variable
    println!("{:p}, {:p}", a, unsafe { (*a as *const i32).add(1) });
    println!("{:p}, {:p}", PtrWrapper(a), PtrWrapper(unsafe { (*a as *const i32).add(1) }));
}

the error:

error[E0277]: the trait bound `PtrWrapper<*const i32>: Pointer` is not satisfied
   --> src/main.rs:25:43
    |
25  |     println!("{:p}, {:p}", PtrWrapper(a), PtrWrapper(unsafe { (*a as *const i32).add(1) }));
    |                     ----                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `Pointer` is not implemented for `PtrWrapper<*const i32>`
    |                     |
    |                     required by a bound introduced by this call
    |
    = help: the trait `Pointer` is implemented for `PtrWrapper<&T>`
note: required by a bound in `core::fmt::rt::Argument::<'a>::new_pointer`
   --> /playground/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/fmt/rt.rs:132:31
    |
132 |     pub fn new_pointer<'b, T: Pointer>(x: &'b T) -> Argument<'_> {
    |                               ^^^^^^^ required by this bound in `Argument::<'a>::new_pointer`
    = note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)

Unless you can get this working, I don't really see the point of adding this. The downsides it brings (stuff not compiling and being more complicated to explain) seem worse than the benefits (not having to put a * in front of the variable).

JelteF avatar Jul 25 '24 21:07 JelteF

for named arguments outside the formatting expression (like ("{whathever:p}")) this is a bit trickier, because we need to introduce an additional binding in the scope like let _p_whathever = PtrWrapper(whatever); and substitute the name in the expansion to it ("{_p_whathever:p}").

For named arguments there would be no need to use PtrWrapper afaict, we can simply use the approach I used in #381

JelteF avatar Jul 25 '24 21:07 JelteF

Finally, I even if you can get PtrWrapper to actually work. I think it's easier to explain to people that in the expressions the field names are references to the actual fields and that thus to get to the underlying pointer you have to dereference. If they only have to that for complex expressions (like my unsafe + cast to pointer), but not for simple expressions then I think it's just going to be confusing.

JelteF avatar Jul 26 '24 08:07 JelteF

@JelteF

If they only have to that for complex expressions (like my unsafe + cast to pointer), but not for simple expressions then I think it's just going to be confusing.

Yeah... still imperfect. In complex expressions, better people realize they're working with a reference rather than a direct value. Trying to hide that fact will lead to very subtle bugs if we cannot strip out that additional indirection in all the cases. And we cannot, because we cannot overload "reading the value" in Rust in any way.

tyranron avatar Jul 30 '24 10:07 tyranron