rfcs Range types for integers (or refinement types?)

Issue by BryanQuigley Saturday Dec 13, 2014 at 03:46 GMT

For earlier discussion, see https://github.com/rust-lang/rust/issues/19801

This issue was labelled with: A-an-interesting-project, E-hard in the Rust repository

It seems like a natural extension of how variables (immutable by default, mutable if specified) are defined to allow the programmer to dictate a specific range of allowed values for an integer. If I know a value is only valid between 0-1000 the sooner I declare that the better it is for catching bugs, off by one errors, and more...

I'm not sure what exact syntax would work, maybe:

let mut(0,1000) x = 0i;

x is only valid from 0-1000 inclusive.

(Apologies if this is already possible, I've been parsing the docs trying to learn Rust.)

Jan 21 '15 19:01 steveklabnik

Ada has this. It would be great in Rust, as we're going for the same niche of low-level + high-assurance.

I wonder though if we should go for a more general system, such as refinement types or pre/post-conditions. There have been a number of successes bolting these onto languages like Haskell, F#, and C#. AIUI, they manage this without heavy changes to the core language. The condition checker is a mostly-separate tool that gathers proof obligations from the source code and passes them to a SMT solver.

Basically I think this is an area where we should give researchers and other interested parties some time to experiment before we standardize something into Rust itself.

Jan 21 '15 22:01 kmcallister

Does Ada check the ranges at runtime or at compile time?

I don't think we need a special syntax for this.

Here is a toy implementation of an integer that checks the range at runtime:

use std::num::Int;

#[derive(Copy,Debug)]
struct RangedInt<T: Int> {
    value: T,
    min: Option<T>,
    max: Option<T>,
}

impl<T: Int> RangedInt<T> {
    fn checked_add(self, other: Self) -> Option<Self> {
        let sum = match self.value.checked_add(other.value) {
            None => return None,
            Some(i) => i,
        };
        match self.min {
            Some(i) if sum < i => return None,
            _ => (),
        };
        match self.max {
            Some(i) if sum > i => return None,
            _ => (),
        };
        Some(RangedInt { value: sum, min: self.min, max: self.max })
    }
}

fn main() {
    let a = RangedInt::<i64> { value: 5, min: Some(0), max: Some(10) };
    let b = RangedInt::<i64> { value: 9, min: Some(0), max: Some(10) };
    let ok = a.checked_add(a);
    let fail = a.checked_add(b);
    println!("ok:   {:?}", ok);
    println!("fail: {:?}", fail);
}

yielding

ok:   Some(RangedInt { value: 10, min: Some(0), max: Some(10) })
fail: None

Feb 22 '15 13:02 vks

@vks: I believe Ada's semantics are those of run-time checking, but the compiler can elide checks when it proves they can't fail.

I think this is a pragmatic approach to range checking. You can use the feature today, without understanding a complicated system for getting through the compiler. The code may be slow, but it will get faster over time as the compiler gets smarter, with zero effort on your part.

It sounds nice anyway.

Another approach is to insert dynamic range checks in debug builds only. This fits nicely with the new integer overflow semantics. Release builds would just prepare a report of which ranges were not verified statically. This lets you target unchecked ranges as a code metric to reduce over time.

At any rate, I agree that we don't need special syntax for range types. All we need is to allow macro invocations in type expressions. Then

fn foo(x: range!(0..10)) {

will work whether range! is a macro_rules! macro, a compiler plugin, or a built-in feature with a feature gate.

@freebroccolo has already done some of this work.

Feb 22 '15 20:02 kmcallister

@kmcallister I would much rather have generics over values. Something like RangedInt<i64, 0, 10>.

Feb 22 '15 21:02 vks

The issue with generic types or macros is that rust cannot automatically modify the range when it learns something new about a variable.

let mut(0,1000) x = 0i;
if (x < 100) { return; }
// range of x is now (100..1000)

let (0,1000) x = 0i;
let (0, 1000) y = ...;
let z = x / y;
// range of x is now (1..1000) as otherwise the division would have panicked.

Mar 06 '15 13:03 oli-obk

You can if you define the operators in terms of &mut self.

On Fri, Mar 6, 2015, 14:40 Oliver Schneider [email protected] wrote:

The issue with generic types or macros is that rust cannot automatically modify the range when it learns something new about a variable.

let mut(0,1000) x = 0i;if (x < 100) { return; }// range of x is now (100..1000)

let (0,1000) x = 0i;let (0, 1000) y = ...;let z = x / y;// range of x is now (1..1000) as otherwise the division would have panicked.

— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rfcs/issues/671#issuecomment-77559655.

Mar 06 '15 15:03 vks

Not if you implement it with [not yet available] generic values. Sidenote: This is actually similar to typestate.

Mar 06 '15 15:03 oli-obk

I think you can implement it as of now, with checks at runtime though.

On Fri, Mar 6, 2015, 16:22 Oliver Schneider [email protected] wrote:

Not if you implement it with [not yet available] generic values. Sidenote: This is actually similar to typestate.

— Reply to this email directly or view it on GitHub https://github.com/rust-lang/rfcs/issues/671#issuecomment-77575929.

Mar 06 '15 15:03 vks

Definitely check out Sage and the associated papers. Their system uses a hybrid of traditional type checking/inference, a theorem-proving black box, and dynamic checking as a fall-back. In 1,200 lines of (ML-syntax) test code, they have 5,087 subtyping judgements, of which 99.7% are discharged at compile time.

One of the Sage examples is a binary search tree whose type includes the range of valid keys, which is used to guarantee that the tree stays in search order. In Rust-like syntax:

fn Range(lo: isize, hi: isize) -> type {
    those x: isize where lo <= x && x < hi
}

enum BiTree(lo: isize, hi: isize) {
    Empty,
    Node {
        mid: Range(lo, hi),
        left: BiTree(lo, mid),
        right: BiTree(mid, hi),
    },
}

impl BiTree(lo: isize, hi: isize) {
    fn search(&self, v: Range(lo, hi)) -> bool {
        match *self {
            BiTree::Empty => false,
            BiTree::Node { ref mid, ref left, ref right } => {
                match v.cmp(mid) {
                    Ordering::Less => left.search(v),
                    Ordering::Equal => true,
                    Ordering::Greater => right.search(v),
                }
            }
        }
    }

    fn insert(&self, v: Range(lo, hi)) -> BiTree(lo, hi) {
        match *self {
            BiTree::Empty => BiTree::Node {
                mid: v,
                left: BiTree::Empty,
                right: BiTree::Empty,
            }
            BiTree::Node { ref mid, ref left, ref right } if v < *mid => {
                BiTree::Node {
                    mid: mid,
                    left: left.insert(v),
                    right: right,
                }
            }
            BiTree::Node { ref mid, ref left, ref right } => {
                BiTree::Node {
                    mid: mid,
                    left: left,
                    right: right.insert(v),
                }
            }
        }
    }
}

Note that in the dependently-typed setting, there is no need to separate type parameters and value parameters. I have taken various liberties with the syntax, particularly the treatment of implicit function arguments. The syntax

those x: isize where lo <= x && x < hi

represents a refinement type, equivalent to the more traditional

{x: isize | lo <= x && x < hi}

Refinements on fn arguments could use the existing where syntax and don't need the those keyword.

Making Rust dependently typed would be a huge step. But it's a huge step that provides integer generics, variadic generics, refinement types, compile-time function evaluation, and lots of other goodies, in a single unified framework. Potentially it could subsume a lot of the existing compile-time machinery, as well.

cc @bstrie

Apr 18 '15 19:04 kmcallister

The Const generics"-RFC will allow types that are generic over an integer parameter. This will probably be useful for defining ranged types in a library.

May 25 '18 06:05 nielsle

Instead of this syntax:

let mut(0,1000) x = 0i;

I like:

let mut x: u32[.. 1_000] = 0;

Or even:

let mut x = 0_u32[.. 1_000];

Jul 16 '19 16:07 leonardo-m

Once const-generics is completely implemented, const parameters can refer to type parameters and we are able to make bounds based off of const parameters, we will ideally be able to do this as a library like so,

struct Ranged<T: Ord, const LOW: T, const HIGH: T>(T) where LOW <= HIGH;

Right now on nightly we can do

struct RangedU32<const LOW: u32, const HIGH: u32>(u32);

But this will not allow the compiler to take advantage of the unused-space like NonZero* does. If it gets integrated into std then these Ranged* versions can be lang items while NonZero* becomes based off of these new Ranged* types.

Jul 17 '19 00:07 RustyYato

But this will not allow the compiler to take advantage of the unused-space like NonZero* does.

There's a list of various important features that library-defined intervals can't support. So I think they need to be built-ins.

Jul 18 '19 08:07 leonardo-m

Is there an RFC for something along the lines of what @KrishnaSannasi said? This could absolutely allow the compiler to optimize space, and could halve the size of certain structs. If not, I'll look into what's actually necessary for a proper RFC.

Oct 21 '19 05:10 jhpratt

@jhpratt I am not aware of any RFC into this, but I think it would probably be best to hold off on it until const generics is complete on nightly.

Oct 21 '19 16:10 RustyYato

@KrishnaSannasi why do you think pipelining (working in parallel) on such things is a bad idea?

Oct 21 '19 19:10 leonardo-m

@leonardo-m I think that bounded range types should start off as a crate so we can try out different apis, and we can't really do that right now (no expression handling in const generics).

Oct 21 '19 21:10 RustyYato

@KrishnaSannasi One of the key advantages of range bounded integers is the ability for smaller representation, which isn't possible without rustc integration.

Oct 21 '19 21:10 jhpratt

@jhpratt but the api we use to work with the bounded integer types don't depend on the compiler, so they can be worked on as a seperate crate. Also, with complete expressions in const generics, you could implement the space optimizations without compiler integration just by using Rust's powerful type system. Finally, bounded integers are useful even without that optimisation, so it would be fine to just have a crate.

Oct 21 '19 23:10 RustyYato

Yes, most of the API would be possible to implement as a crate, though treating it as a primitive would not be. Ideally, I'd be able to do let x: int<1, 12> = 7; and it would just work without having to call .into() (or even perform runtime checks, as the compiler could check this simple case).

How would space optimization be enabled? int<1, 12> only needs 4 bits to store the information, but the backing type would be at the minimum eight. You could get around Option<int<1, 12>> by using specialization, but that gets complicated very quickly. The orphan rule might also have something to say about that, but I'm not 100% sure on that.

It's certainly useful, and something I may look into implementing to an extent myself at some point. It's just that it would be far more useful to have it built-in.

Oct 21 '19 23:10 jhpratt

It's certainly useful, and something I may look into implementing to an extent myself at some point. It's just that it would be far more useful to have it built-in.

Yes, this is a valid reason to merge an external crate into std. But this is not a sufficient reason to start it in std.

I'd be able to do let x: int<1, 12> = 7; and it would just work without having to call .into() (or even perform runtime checks, as the compiler could check this simple case)

If there were runtime checks for let x: int<1, 12> = 7; in the generated assembly, I would be very concerned about all of Rust's zero-cost abstractions. This seems trivial to optimize out. Having the ability to not call into is some sugar is really nice, and even more reason to put them in std once we have a viable crate.

How would space optimization be enabled?

We could round up to the nearest byte (which would already be a large space saving) and use traits like those in typenum to pick which backing integer type to use.

You could get around Option<int<1, 12>> by using specialization, but that gets complicated very quickly.

You can't specialize layouts like this without compiler integration.

One thing about specializing layouts that I didn't see brought up. It would make multiplication and division much more costly. Assuming that you are only storing the offset from the lower bound, every multiplication and division would have an additional addition folded in to offset back to the correct value. Is that worth the space savings? Is there some clever way to not do this? Shifting back may not be possible, because the multiplication may overflow u128, in that case you could use other methods to multiply or divide, but it gets messy.

I bring this up because it doesn't look like anyone has discussed this here. Also, how do other languages implement this? Do any other languages implement this as a library?

Oct 22 '19 00:10 RustyYato

Ideally, I'd be able to do let x: int<1, 12> = 7; and it would just work without having to call .into()

Note that this is also true for other things like NonZeroU32 and BigInteger, so to me that argues for a "custom literals" feature, not for "range integers need to be in core".

Oct 24 '19 07:10 scottmcm

If rust would adopt refinement types for example over where constraints, would it guarantee for a decidable fragment?

Nov 03 '19 13:11 sighoya

Is anything like this in the language yet?

Apr 25 '20 05:04 khoek

It's probably best to experiment at the library level first, and that in turn requires const generics if you're not in the mood for fighting typenum.

Apr 25 '20 06:04 HadrienG2

so to me that argues for a "custom literals" feature, not for "range integers need to be in core".

We should make a list of all the things a library defined ranged value can't do compared to a well designed built-in feature. I think the number of sub-features that should to be build-in is limited, but they are important enough to make them worth using often.

May 21 '20 12:05 leonardo-m

We'll eventually want compiler plugins (forks?) for various "slow" static analysis task, including refinement types, but presumably only after chalk.

It'd also help prevent them from interacting with soundness. Refinements must not touch dispatch obviously. If refinements tried influencing dispatch, then they'd wind up inside some much longer purgatory than the mere 5 years required by specialization. ;)

An optional compile phase that rejects some programs when called, but cannot modify behavior sounds cleanest. And developers would prefer such phases being invoked only by CI.

May 21 '20 16:05 burdges

only by CI?

May 21 '20 21:05 Lokathor

Just for anyone that may be thinking of doing the same thing — I intend to begin work on a basic implementation of ranged integers on nightly. With min_const_generics looking likely to stabilize in February 2021, it's probably best to get out ahead of it.

Nov 25 '20 10:11 jhpratt

@jhpratt Great to hear it! I look forward to seeing how it goes.

The piece I keep looking forward to, though -- Integer<A,B> + Integer<C,D> => Integer<{A+C},{B+D}> -- isn't in min_, sadly.

Nov 25 '20 18:11 scottmcm