[RFC] dependent types, restricted types or generalization of the range types
Nim currently has the concept of the range type that is used actively to express entities like positive number and strictly positive number. Unfortunately, it is restricted only to ordinal and float types. I would like to generalize this idea to cover more use cases and express existing range types in more generic terms.
Example of use cases
This proposal aims to be quite generic and cover existing range types and many more.
Random examples:
-
Not empty sequence or sorted sequence as a type. Sorted seq sequence or non empty sequence is still a sequence hence distinct is not useful way to describe it. Functions working with any sequence should still work with sorted/non empty sequence.
-
Object with two date fields and one date must always be greater than another one. Object contains 2 sequences but their lengths should always be equal.
The implementation proposal
The idea to add type new type bound operation =check that can be attached to any type that can introduce arbitrary restriction on actual values. Signature:
proc `=check`(x: MyCheckedType)
=check proc has one argument and no return type, but proc can raise exception or defect if value is not within range. =check can be attached to the generic types too:
proc `=check`[T](x: MyCheckedType[T])
Check for constant values should be done at compile time. For runtime values =check calls are injected. Transformation phase of Nim will inject =check calls in the right places.
In similar fashion what it is currently done for range types.
Type coercion rules
-
Checked type is implicitly convertible to its underlying type.
-
Underlying type is implicitly convertible to checked type with
=checkcall injected. Const values trigger immediate=checkevaluation. -
Two checked types with the same underlying non distinct type are implicitly convertible to each other with
=checkcall injected. Const values trigger immediate=checkevaluation. -
Checked type argument match is a better match then underlying type.
Examples
With this proposal range types can now be implemented completely in the user case, extra compiler support no longer required:
type range*[S: static[Slice]] = S.T
func `=check`[S](x: range[S]) =
when nimvm or compileOption("boundChecks"):
if x < S.a or x > S.b:
sysFatal(RangeError, "...")
func low*[S](x: range[S]): range[S] = S.a
func high*[S](x: range[S]): range[S] = S.b
Natural type integer can be defined as:
type Natural = range[0..high(int)]
type PositiveFloat = range[0.0..high(float)]
This will work now too, while previously it didn't:
const myarray = [1, 2, 3]
type MyIndex = range[0..<len(myarray)] # ..< is not supported in existing nim devel
Above example with non empty sequence, can be implemented as:
type NonEmptySeq[T] = seq[T]
proc `=check`[T](x : NonEmptySeq[T]) =
when nimvm or compileOption("boundChecks"):
if x.len > 0:
sysFatal(ValueError, "...")
Summary of the changes in the compiler:
-
Ast nodes removed as no longer required: nkChckRangeF, nkChckRange64, nkChckRange
-
transf phase will inject
=checkcalls where necessary. For const expressions=checkcalls will be evaluated at compile time in semfold. -
tyRange type is removed along with a handful of code in the sigmatch, semtypes. Some of the code will be reused as
=checkinjections needs to be done in the same places as current nkChckRange checks. -
Some magics like mInc, mDec, mAddI, mSubI, mSucc, mPred will need
=checkinject on its result value. Transf will do this work too. -
Range type is now implemented in the user space as in the example above, still part of system.nim.
-
No new type kind is required to be introduced. tyAlias can be used to attach check operation to. Looks like tyAlias in the best candidate for this work, here is why. When type is introduced you don't know if it wil be checked or not:
type NonEmptySeq[T] = seq[T]It will be tyAlias after sempass of type section. Only after the
=checkoperation will be discovered you will know it is a checked type. You will have two options: leave tyAlias kind as is or change its kind to new tyChecked on=checkoccurrence. I don't know which one is better, let me know if you have strong opinion. I am leaning towards first option leave tyAlias as is since tyAlias also works well with type coercion rules specified above.
Further details
=check can be called explicitly by serialization libraries to trigger check of deserialized values to
match code expectation. Calling =check proc for types that don't have =check operation defined is simply noop.
For object types that consist of many fields, it is not uncommon to initialize object element by element:
var a: MyObject
a.field1 = ...
a.field3 = ...
a.field3 = ...
This code won't work if MyObject is a checked type. As =check will be called too early on line var a: MyObject. This problem intersects with RFC 49 (https://github.com/nim-lang/RFCs/issues/49). The idea is to use existing pragma noInit to delay or remove object is Initialized check to first object usage or result return instead:
var a {.noInit.}: NonEmptySeq[int] # no check on this line
I see 2 ways of how noInit can be implemented: =check and is initialized checks are ignored completely or the checks are delayed to the object usage.
Assigning fields of the object, doesn't count as object usage.
Caveats
This proposal makes active use of static[T] types. static[T] bugs gets more apparent.
I managed to hit some of the issues during preparation of this proposal.
This is an interesting proposal, but I don't think it will work out with regular type aliases. Nim has a lot of internal code that tries to eliminate the intermediate named types in a long chain of type aliases. The spec defines that an alias of a type should be considered the same type in every possible sense.
What could work better is to model the checked types as distinct types, while introducing additional general conveniences for creating distinct types with implicit conversions. At some point in the past, Nim had an undocumented support for the following nicer syntax for distinct types:
type
ImmutableString = distinct string without `[]=`, `add`
ID = distinct int with `>`
CheckedSeq = distinct seq with *
It's easy to add additional syntax sugar for listing invariants (the checks) and for controlling the implicitness of the conversions.
In general, this functionality is closely related to another idea of having "flow-dependent" types. As a simple example, consider code like this:
var res = getSomeOption()
if res.isSome:
echo res.get
A type system that exploits flow-dependent types will be able to notice that the res variable is proven to hold a value in the branch of the code where res.get is called and thus no further type checks have to be inserted. To achieve this property, you need a type system that is be able to reason about known constraints and predicates. Ada Spark for example is able to do this by relying on a SMT solver such as Z3.
There are many possible approaches to design such a feature - from automated ones where the compiler is deriving its knowledge from the written code to more formalized ones where the programmer must use some kind of language for combining and managing the "proofs" for the properties of interest being expressed. One has to be very careful to strike the right balance between ergonomics (simplicity), efficiency and expressive power.
Moving to RFC repo.
edit: but I can't, I don't have rights there :/
Hi Mratsim, I can't transfer it either. RFCs repo is not in the list of for transfer. Can I somehow give you the permission?
IMO a compiler option to globally disable runtime dependent typechecks might be prudent.
For @zah's point about the type aliases: perhaps checkable types should have a pragma {.checkable.} to them, preventing them from being alias-able, as one can still make aliases to distinct types.
Alternatively, permit type aliases like we already do, but specify that check procedures must be applied to a "real" type, and not an alias to one.
This RFC is stale because it has been open for 1095 days with no activity. Contribute a fix or comment on the issue, or it will be closed in 7 days.
I'd like to see this reopened, ranges currently have problems this RFC would address.
To keep discussion fresh and up to date it might be better to remake as a new RFC in general
Sounds good. This RFC was not actually about dependent types but a runtime-focused refinement types system, FWIW.
I suppose these would be somewhat of a dual to (old) concepts: concepts are compile-time predicates while these checked types are primarily runtime. I'm going to mull over if it would be worth it to unify these: perhaps as =check and =staticcheck hooks? Though a better approach may be based on concepts...
Rust has somewhat of an equivalent feature being played around with (https://github.com/rust-lang/rfcs/issues/671), though I have my doubts it'll make it into core being such a runtime burden. I strongly suspect that static verifiers eg. Prusti will be pushed for instead.
I think that there's a really interesting possibility to leverage DrNim for such a feature. Proper dependent types are almost certainly not useful for a language like Nim (I don't think they're terribly useful outside of anything but theorem proving: though they are very cool) but a good system of refinement types could be leveraged to remove bounds checks at the cost of (dramatically) increased compile times.