fslang-suggestions
Make Units of Measure available to more types
Currently it is only possible to tag numeric values with a UoM, but in many scenarios it would be desirable to be able to differentiate between different types of strings, booleans etc. A good example would be the FilePath type alias:
[<Measure>] type FilePath
let lsDir (path: string<FilePath>) = failwith "nope"
Currently, there are multiple ways to implement this. One is to wrap the string in question in a struct or record, as suggested here. However, this approach incurs the usual run-time overhead, just like defining a proper wrapper type with wrap/unwrap functions.
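For readers unfamiliar with the wrapper approach, here is a minimal sketch of what it tends to look like. The names `WrappedPath`, `unwrap` and `describe` are made up for illustration and are not taken from the linked suggestion.

```fsharp
// Hypothetical wrapper: a single-case struct union around a plain string.
[<Struct>]
type WrappedPath = WrappedPath of string

// Explicit unwrap step back to the raw string.
let unwrap (WrappedPath p) = p

// Only wrapped values are accepted; passing a bare string does not type-check.
let describe (path: WrappedPath) =
    sprintf "listing %s" (unwrap path)

let msg = describe (WrappedPath "/tmp")
// describe "/tmp"   // rejected by the compiler: string is not WrappedPath
```

Every call site has to wrap and unwrap explicitly, which is the ceremony the proposal is trying to avoid.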
Another way to achieve this is outlined in this snippet (or here). It does not incur any overhead (the type tag is erased at compile time) and appears to be the canonical solution accepted in the community. The main drawback is that it is relatively unsafe and unintuitive, especially from a beginner's perspective.
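As a rough sketch of the erased-tag trick, assuming it is built on the `MeasureAnnotatedAbbreviation` attribute from FSharp.Core: the `Tag.tag` and `Tag.untag` helpers below are hypothetical names, and the linked snippet may well do the cast differently.

```fsharp
[<Measure>] type FilePath

// Assumption: a measure-annotated abbreviation over string; the tag exists
// only at compile time and the runtime type is still System.String.
[<MeasureAnnotatedAbbreviation>]
type string<[<Measure>] 'm> = string

module Tag =
    // Unchecked cast: nothing validates that the string deserves the tag.
    let inline tag<[<Measure>] 'm> (s: string) : string<'m> =
        unbox<string<'m>> (box s)

    // Drop the tag again to get at the ordinary string API.
    let inline untag<[<Measure>] 'm> (s: string<'m>) : string =
        unbox<string> (box s)

let lsDir (path: string<FilePath>) = sprintf "ls %s" (Tag.untag path)

let out = lsDir (Tag.tag<FilePath> "/tmp")
```

The unchecked cast is where the "relatively unsafe" part comes from: any string can be tagged as anything.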
Pros and Cons
The advantages of making this adjustment to F# are:
- consistency: why have an int<cm>, but not bool<lightswitch> or string<filepath>?
- compile-time safety without associated run-time costs; who would not like more of that?
The disadvantages of making this adjustment to F# are:
- it's work
Affidavit (must be submitted)
Please tick this by placing a cross in the box:
- [x] This is not a question (e.g. like one you might ask on stackoverflow) and I have searched stackoverflow for discussions of this issue
- [x] I have searched both open and closed suggestions on this site and believe this is not a duplicate
- [x] This is not something which has obviously "already been decided" in previous versions of F#. If you're questioning a fundamental design decision that has obviously already been taken (e.g. "Make F# untyped") then please don't submit it.
Please tick all that apply:
- [x] This is not a breaking change to the F# language design
- [ ] I would be willing to help implement and/or test this
- [ ] I or my company would be willing to help crowdfund F# Software Foundation members to work on this
Things to consider
- Would we allow literals "abc"<FilePath>
- Would we allow literals @"abc"<FilePath>
- Would we allow literals """abc..."""<FilePath>
- Would "+" be supported, operating on two strings with the same annotation
- Would we allow null literals null<FilePath>
- Will this lead to requests to do the same thing for other basic types like DateTime or GUID or whatever? Where is this ultimately heading?
Discuss please
As long as we're talking about extending UoM, I've always wondered why unsigned integers aren't supported? And would it be much work to include support for them?
Would it be possible to generalize this and allow applying UoM to all types?
Why not System.IO.Stream<Measure>?
How would this work with types that are already generic?
List<int><Measure>? looks funny, and may be ambiguous with the less than operator?
Just thinking out loud ...
Wouldn't haskell's newtype be a more explicit way to handle all the above (although without some of the properties we have with measures which can combine when using calculations)?
I think the underlying concern is to get type safety and identical runtime representation (in the example, it is still a string, without any extra overhead) and newtype does that AFAIU.
A single-field struct essentially does the same thing. (Though there are some differences w.r.t. equality, hashing, comparison)
I think the rationale is really programmer conceptual efficiency - no need to name a new type, nor inject in/out of a new type, just an annotation which is mentally understood as cost-free.
This is one of the features I've wanted to reach for over and over again, and tried without being able to find a satisfactory solution. Phantom types, struct wrappers with helper operators, using generics as tags, tuple flags, it all quickly becomes cumbersome and more trouble than the extra compile time constraint is worth.
Whereas Units of measure do everything that I am looking for in a sensible and reasonable way, but we can't use them for that. If they could just be used just as tags and flags across the vast majority of types it'd be a huge win. I don't think they should be allowed on null literals, and possibly not on any type that satisfies the null constraint.
only floating point types, signed integral types, and decimal types support dimensioned quantities.
Perhaps types can enable the support of dimensioned measures in the same way they can support particular constructs by implementing generic one
// Sketch only: 'a would need (+), (*) and comparison constraints for this to type-check.
[<Struct>]
type Vector<'a, [<Measure>] 'u, [<Measure>] 'z> = {
    A : struct('a * int<'u>)
    B : struct('a * int<'u>)
    C : struct('a * int<'u>)
} with
    static member One = {
        A = struct(Unchecked.defaultof<'a>, 1<_>)
        B = struct(Unchecked.defaultof<'a>, 1<_>)
        C = struct(Unchecked.defaultof<'a>, 1<_>)
    }
    // Component-wise addition: (a, b) comes from v1, (c, d) from v2.
    static member (+) (v1: Vector<_,_,_>, v2: Vector<_,_,_>) =
        let struct(a1,b1),struct(c1,d1) = v1.A,v2.A
        let struct(a2,b2),struct(c2,d2) = v1.B,v2.B
        let struct(a3,b3),struct(c3,d3) = v1.C,v2.C
        { A = struct(a1+c1,b1+d1)
          B = struct(a2+c2,b2+d2)
          C = struct(a3+c3,b3+d3)
        }
    // Component-wise multiplication, pairing fields the same way.
    static member (*) (v1: Vector<_,_,_>, v2: Vector<_,_,_>) =
        let struct(a1,b1),struct(c1,d1) = v1.A,v2.A
        let struct(a2,b2),struct(c2,d2) = v1.B,v2.B
        let struct(a3,b3),struct(c3,d3) = v1.C,v2.C
        { A = struct(a1*c1,b1*d1)
          B = struct(a2*c2,b2*d2)
          C = struct(a3*c3,b3*d3)
        }
    // Range: step by One from v1 and stop once v2 is reached.
    static member (..) (v1: Vector<_,_,_>, v2: Vector<_,_,_>) =
        let rec expand x = seq {
            if x < v2 then
                yield x
                yield! expand (x + Vector<_,_,_>.One)
        }
        expand v1
In this case 'z would be the dimensioned measure, but that's probably not a great convention...
- Would we allow literals "abc"<FilePath>
- Would we allow literals @"abc"<FilePath>
- Would we allow literals """abc..."""<FilePath>
This would be the most natural/consistent way of constructing new tagged values. In my view, it should not differ from their numeric counterparts so the learning overhead for new users doesn't increase.
- Would "+" be supported, operating on two strings with the same annotation
I'd say for string this should definitely "just work". It would obviously be a different story if tagging were generally allowed for any type. Would it be possible for values of, say, type Guid<OrderId> to keep access to all their methods/properties?
- Would we allow null literals null<FilePath>
- Will this lead to requests to do the same thing for other basic types like DateTime or GUID or whatever? Where is this ultimately heading?
I think that it would ultimately be good to be able to use this feature on all types in the spirit of @smoothdeveloper 's suggestion that it could be seen as an (IMO improved) version of newtype in Haskell. If using the tagged values would look and feel exactly like their untagged siblings (i.e. without the need to explicitly wrap/unwrap them as is the case with newtype for members/properties) it would turn out to be a very expressive system.
My 2¢ :)
I like this idea a lot. The only drawback I can see to it is a slight increase in conceptual difficulty: units of measure, as a term, makes perfect sense when applied to numeric values, since that concept comes straight from science, where 10m / 2s = 5m/s and so on. But once we start extending the concept to types like string<FilePath> and string<UserName>, suddenly the term "units of measure" no longer fits conceptually. So perhaps instead of extending the [<Measure>] attribute to strings and other types, it might make more sense to choose a different attribute name than [<Measure>]. I don't have any really good suggestions, but maybe something like [<DataTag>] or [<TypeTag>] or [<TypeModifier>]... but none of those really appeal to me. Something with the word "tag" in the name, perhaps?
At any rate, I think we should discuss whether to stick with the name [<Measure>] or pick a new name. Advantage of sticking with [<Measure>], of course, is that it's already there in the F# specification, and there's no chance of overlap with attribute names that existing code has chosen to use. Are there any new attribute names ("SomethingTag", etc) that would outweigh that advantage by so much that it's worth switching? (While maintaining [<Measure>] as an attribute for numeric values, I mean -- I'm NOT suggesting getting rid of it.)
Also, the question of operators comes up. For units of measure, it makes perfect sense to divide or multiply the units according to standard scientific rules, so that 10.0<m> / 2.0<s> = 5.0<m/s>. But for data tags ("units of measure") on other data types, it may not make sense to do that. The question won't come up for strings since the * and / operators aren't meaningful on strings. But what if someone has, say, a matrix data type and wants to use "units of measure" to distinguish between different types of matrices:
[<Measure>] type Affine
[<Measure>] type LinearAlgebra
open My.Matrix.Math.Library
// My library defines a matrix-multiplication operator * and a Matrix type
let m1 = Matrix(...)<Affine>
let m2 = Matrix(...)<Affine>
let m3 = m1 * m2
Would it make sense for m3 to acquire the created-on-the-fly unit of measure Affine^2? Or in this case, would it be better for m3 to have the unit of measure Affine, and not keep the "unit squared" behavior carried over from scientific units of measure?
In order to allow the latter behavior, where multiplying two Affine matrices produces another Affine matrix instead of an Affine^2 thing, I think we'd need to specify a new attribute, separate from (but similar to) [<Measure>]. The [<Measure>] attribute will keep the existing unit-of-measure semantics taken from science, where dividing m by s produces an m/s measure, but the new attribute would NOT use the same semantics, and would instead let an Affine multiplied by an Affine produce another Affine value.
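For comparison, this is how measures already combine on numeric types today. It is a small sketch with made-up measure names `m` and `s`, showing the behaviour that an `Affine` tag would inherit if it reused [<Measure>] unchanged.

```fsharp
[<Measure>] type m
[<Measure>] type s

// Division combines the units: the result is float<m/s>.
let speed : float<m/s> = 10.0<m> / 2.0<s>

// Multiplying two values with the same unit squares it: float<m^2>, not float<m>.
let area : float<m^2> = 3.0<m> * 2.0<m>

// let wrong : float<m> = 3.0<m> * 2.0<m>   // rejected: m^2 does not match m
```

So with the existing semantics, `m1 * m2` on two `<Affine>` values would be typed as `Affine^2` unless the tag followed different rules.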
The desired semantics for this new feature seems very ill-defined to me. For the FilePath example, you'd want to ensure that the set of operations that will typecheck on such a string<FilePath> will maintain the invariant that the string actually represents a syntactically valid file path. This means stuff like concatenation of path strings should be restricted.
The only way I know of to achieve such guarantees is to define a new data type with new operations that maintain whatever invariants need to be maintained. Yes it requires the definition of a new type and new functions, but how else can you specify how you want your "tagged type" to behave? With units of measure, the semantics are obvious, but that's not true when you start to generalise things.
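A minimal sketch of what such a dedicated type might look like, assuming a deliberately crude validity check (non-blank, no characters from `Path.GetInvalidPathChars`). The names `Paths`, `ValidPath`, `tryCreate`, `value` and `combine` are hypothetical.

```fsharp
open System
open System.IO

module Paths =
    // The case constructor is private to this module, so the only way to
    // obtain a ValidPath from outside is through tryCreate or combine.
    type ValidPath = private ValidPath of string

    // Assumed validity rule, for illustration only.
    let tryCreate (s: string) : ValidPath option =
        if String.IsNullOrWhiteSpace s || s.IndexOfAny(Path.GetInvalidPathChars()) >= 0
        then None
        else Some (ValidPath s)

    let value (ValidPath s) = s

    // The only "concatenation" on offer joins two already-validated paths.
    let combine (ValidPath a) (ValidPath b) : ValidPath =
        ValidPath (Path.Combine(a, b))

let joined =
    match Paths.tryCreate "logs", Paths.tryCreate "app.txt" with
    | Some dir, Some file -> Some (Paths.value (Paths.combine dir file))
    | _ -> None
```

Plain string concatenation is simply not available on `ValidPath`, which is the kind of restriction a bare tag cannot express.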
It feels to me like rather than abuse UoM, a separate newtype feature like haskell would be more useful, essentially the compiler doing what Scott describes in Domain Modelling vs. Performance. Pattern matching feels like it might be more complex tho.
This would be nice.
There are a lot of strings that expect correct ordering or formatting (and numbers with leading zeros), e.g.:
let ``some details`` =
    "FI1410093000123458"<IBAN>,
    "FI0000000000012345"<InvoiceNumber>,
    "00447708812345"<Phone>,
    "00127708812345"<OrderId>
I don't know how this type erasure works technically, so I might say something utterly stupid, in which case sorry :)
Would it be possible to invert the problem by somehow flagging a single-case DU to be erased at runtime? For example type [<DataTag>] FilePath = FilePath of string, which would be considered a regular single-case DU from the F# point of view but a simple string at runtime.
When I started to develop with F#, I really wished this feature was available (mainly for string), but I have since noticed that it was mainly a lazy shortcut I wanted to take. As pointed out by @nmsmith, although the type annotation would prevent assigning an arbitrary string to, say, a FilePath, it doesn't offer all the type safety of a record/DU. Besides, I have noticed that single-case DUs tend to provide a better way to grow a type's complexity over time.