x86_64 Define `CheckedAdd` and `CheckedSub` traits

There are a few places where I've seen, or they could benefit from a, checked_add and checked_sub implementations throughout the library, such as on VirtAddr, PhysFrame, PhysAddr... etc. I would like to define a CheckedAdd and CheckedSub trait similar to the std::ops::Add trait that defines a checked_* method.

pub trait CheckedAdd<Rhs = Self> {
    type Output;

    fn checked_add(self, rhs: Rhs) -> Option<Self::Output>;
}

pub trait CheckedSub<Rhs = Self> {
    type Output;

    fn checked_sub(self, rhs: Rhs) -> Option<Self::Output>;
}

I would like to add this specifically to provide for checked_operations across multiple Rhs types. An example being a CheckedAdd<Rhs=PhysFrame> for PhysFrame and CheckedAdd<Rhs=PhysAddr> for PhysFrame.

Aug 13 '21 13:08 ncatelli

Alternatively, num::CheckedAdd looks to have been implemented with a slightly different interface, but can provide something similar if the additional dependency is okay.

I'd be happy to draft up an example PR for either if this is an agreeable change.

Aug 13 '21 13:08 ncatelli

Example branch: https://github.com/ncatelli/x86_64/tree/expermiment/293/add-checked-arithmetic-operations-traits

Aug 13 '21 16:08 ncatelli

I would definitely prefer using an existing trait definition rather than providing our own. I don't think depending on num-traits (without default features) would be a problem. However, the interface is wrong. For CheckedAdd we need to have the Rhs be of type u64 (in the num crate, Rhs == Self). It doesn't make sense to add two VirtAddrs together. Similarly for CheckedSub we need to have the return value be u64 (the interface in the num crate returns Self). So looks like we will have to provide our own traits.

I do think that we should have the same/similar traits implemented for all our "address" types:

VirtAddr: an address that must be cannonical
PhysAddr: an address that must be < 2^()
Page: a PageSize::SIZE aligned VirtAddr
PhysFrame: a PageSize::SIZE aligned PhysAddr

Here's a list of all the traits we either do implement or should implement:

[x] Clone, Copy, PartialEq, Eq, PartialOrd
- Implemented by automatic Derive
[x] Debug, Binary, LowerHex, UpperHex, Octal, Pointer
- Just display traits. Used for pretty printing like: println!("{:p}", virt_addr);
[x] Ord: Requires a total order
- We currently implement this today via Derive
- Should we be implementing this for VirtAddr? Does it make sense to compare and address in the upper-half and lower-half? Should the range lower_half_addr..upper_half_addr be empty or ill formed? Should "upper-half" addresses be viewed as negative virtual addresses (they are sign extended)?
[ ] CheckedAdd<Rhs = u64, Output = Self> ~~(also with usize if on 64-bit)~~
[ ] CheckedSub<Rhs = u64, Output = Self> ~~(also with usize if on 64-bit)~~
[ ] CheckedSub<Rhs = Self, Output = u64>
[x] Add<Rhs = u64, Output = Self>, Sub<Rhs = u64, Output = Self>, Sub<Rhs = Self, Output = u64>
- Can be implemented using the checked versions
- Breaking Change: Remove the usize Add/Sub impls
[ ] Step (makes ranges iterable)
- Implemented using the CheckedAdd and CheckedSub implementations
- See #212 and #292

There is also the question of conversions. This was brought up in #268. Right now we don't implement any, here is what we could do:

[ ] VirtAddr: TryFrom<u64>, PhysAddr: TryFrom<u64> (and also for usize)
[ ] VirtAddr: From<u32>, PhysAddr: From<u32>
[ ] VirtAddr: TryFrom<*const T>, VirtAddr: TryFrom<*mut T> (raw pointers might be non-cannonical)
[ ] *const T: From<VirtAddr>, *mut T: From<VirtAddr>
[ ] VirtAddr: From<&T>, VirtAddr: From<&mut T> (only on x86_64, do we know if references are cannonical?)
[ ] u64: From<VirtAddr>, u64: From<PhysAddr>
[ ] ~~Page: TryFrom<VirtAddr>, PhysFram: TryFrom<PhysAddr> (addresses might not be aligned)~~
[ ] ~~VirtAddr: From<Page>, PhysAddr: From<PhysFrame>~~
This would allow us to deprecate some constructors and our as_* methods.
Existing issue: #268

@phil-opp @ncatelli @budde25 which of these traits do you think we should implement?

Aug 19 '21 10:08 josephlr

I've drafted the above PR, #298, with an initial implementation of checked ops with an example applied to VirtAddr. If the trait implementation is in an agreeable location and signature I'm happy to proceed with implementing for the other types listed.

Aug 19 '21 18:08 ncatelli

Should we be implementing this for VirtAddr? Does it make sense to compare and address in the upper-half and lower-half? Should the range lower_half_addr..upper_half_addr be empty or ill formed? Should "upper-half" addresses be viewed as negative virtual addresses (they are sign extended)?

I would treat them as normal unsigned integers (with a gap in the middle). So upper half address > lower half address and treat the max_lower_half_addr..min_upper_half_addr range as empty (there are no valid addresses in that range). This way, we should be forward compatible with Intel's 5-level paging, in case we add support for it at some point.

CheckedAdd<Rhs = u64, Output = Self> (also with usize if on 64-bit)

I think one problem with implementing it for both u64 and usize is that you have to type-annotate all integers. For example, virt_addr + 1 no longer works, only virt_addr + 1u64 or virt_addr + 1usize, which can be a bit annoying.

[ ] VirtAddr: TryFrom<u64>, PhysAddr: TryFrom<u64> (and also for usize)

[ ] VirtAddr: From<u32>, PhysAddr: From<u32>

[ ] VirtAddr: TryFrom<*const T>, VirtAddr: TryFrom<*mut T> (raw pointers might be non-cannonical)

[ ] VirtAddr: From<&T>, VirtAddr: From<&mut T> (only on x86_64, do we know if references are cannonical?)

[ ] u64: From<VirtAddr>, u64: From<PhysAddr>

These all seem reasonable. Yes, I think that we can assume that references are canonical on x86_64 in general. The question is how we want to treat the upcoming 5-level paging? We could panic on conversion in that case, but I'm not sure if panicking in From impls is encouraged.

[ ] Page: TryFrom<VirtAddr>, PhysFram: TryFrom<PhysAddr> (addresses might not be aligned)

[ ] VirtAddr: From<Page>, PhysAddr: From<PhysFrame>

Not sure if I like these ones because there are multiple ways to do these conversions. For example, for the Page->VirtAddr direction you could be interested in either the start or end address of the page. And the VirtAddr->Page direction loses some information (the page offset). In combination, this can make a VirtAddr->Page->VirtAddr conversion result in a different VirtAddr than before, which could lead to accidental errors.

Aug 21 '21 10:08 phil-opp

I would treat them as normal unsigned integers (with a gap in the middle). So upper half address > lower half address and treat the max_lower_half_addr..min_upper_half_addr range as empty (there are no valid addresses in that range). This way, we should be forward compatible with Intel's 5-level paging, in case we add support for it at some point.

How would we want to handle an arbitrary range that crossed the gap? Also have it be empty? Automatically jump the gap? Panic?

I think one problem with implementing it for both u64 and usize is that you have to type-annotate all integers. For example, virt_addr + 1 no longer works, only virt_addr + 1u64 or virt_addr + 1usize, which can be a bit annoying.

So I think this is also a problem for the Add implementations, as well as the potential CheckedAdd implementations. However, I agree. It would be nice to have the type inference work properly. I think we should:

Only implement CheckedAdd for u64
As a breaking change: remove the Add/Sub implementations for usize.

These all seem reasonable. Yes, I think that we can assume that references are canonical on x86_64 in general. The question is how we want to treat the upcoming 5-level paging? We could panic on conversion in that case, but I'm not sure if panicking in From impls is encouraged.

Personally, I think the panic is fine (for now). Before we (potentially) support 5-level paging, we can just have a disclaimer that is like: "if you use 5-level paging, some methods might panic, but safety will not be violated".

Not sure if I like these ones because there are multiple ways to do these conversions. For example, for the Page->VirtAddr direction you could be interested in either the start or end address of the page. And the VirtAddr->Page direction loses some information (the page offset). In combination, this can make a VirtAddr->Page->VirtAddr conversion result in a different VirtAddr than before, which could lead to accidental errors.

Agreed, that's a very good point. Especially because we already have methods for these conversions, we don't need to implement From for them. I'll remove them from the list. I also added some conversions to the list for pointer->VirtAddr conversion.

Aug 22 '21 09:08 josephlr

How would we want to handle an arbitrary range that crossed the gap? Also have it be empty? Automatically jump the gap? Panic?

Good question. Given that the methods of the Step trait are defined as a successor/predecessor operations, I think they should return the next larger/smaller valid value. So I'm in favor of automatically jumping the gap for ranges. This would be similar to how the standard library does it for char (i.e. skip codes that are not valid characters). In addition to preventing unexected panics, this approach has the advantage that half-open ranges (e.g. offset..) and full ranges (..) remain usable, e.g. for methods like proposed in https://github.com/rust-osdev/x86_64/pull/266.

We still need to discuss how this should interact with the Add/Sub operations, though. Right now, we're kind of using Add as a successor operation too (i.e. jump the gap), but #299 proposes to change this to a normal arithmetical Add operation (i.e. just increase the address and panic if it's no longer canonical). For VirtAddr, I think that it could make sense to have both Add and step operations, but we need to clearly document the differences.

For Page and PhysFrame, on the other hand, I'm not sure if it makes sense to provide both Add and step operations in the long term (when both are usable on stable). We don't really use the concept of a page or frame number in this crate, so I think there's less demand for a "give me the page with number x + 4" operation. Instead, the common operation is e.g. "give me the next 10 pages", so maybe it makes sense to only provide the step operations, but not Add/Sub. This would also solve the "bytes or page number?" confusion that #288 tries to prevent. One drawback of this is that an Add implementation is much easier to use (there is no successor operator in Rust).

Whatever we decide, I think it is important that our implementations are consistent with each other: If our Add impl for VirtAddr does not jump the gap, the Add impl for Page should not do it either.

So I think this is also a problem for the Add implementations, as well as the potential CheckedAdd implementations. However, I agree. It would be nice to have the type inference work properly. I think we should:

Only implement CheckedAdd for u64

As a breaking change: remove the Add/Sub implementations for usize.

I didn't remember that we already have an Add impl for usize. Does it lead to problems in practice?

Aug 22 '21 12:08 phil-opp

I didn't remember that we already have an Add impl for usize. Does it lead to problems in practice?

It means you can't write code like the following:

let addr1 = PhysAddr::new(0x52);
let addr2 = addr1 + 3;

although you get get a reasonable compiler error:

error[E0283]: type annotations needed
   --> src/addr.rs:797:27
    |
797 |         let addr2 = addr1 + 3;
    |                           ^ cannot infer type for type `{integer}`
    |
note: multiple `impl`s satisfying `addr::PhysAddr: Add<{integer}>` found
   --> src/addr.rs:552:1
    |
552 | impl Add<u64> for PhysAddr {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^
...
568 | impl Add<usize> for PhysAddr {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I think having the usize impl helps more than it hurts, especially on 32-bit platforms dealing with x86_64 structures.

Mar 26 '22 05:03 josephlr