RFCs icon indicating copy to clipboard operation
RFCs copied to clipboard

Track the purity of expressions to avoid the value restriction

Open garrigue opened this issue 6 years ago • 31 comments

The value restriction, even in its relaxed form, makes us lose polymorphism even in cases where all functions are pure. This is especially galling for combinator libraries, where code is built through function applications.

This proposal tries to avoid it in those situations, by tracking the purity of expressions in a very light weight way. Typability is a subset of SML90, which ensures the soundness of this approach.

The proposal is in PDF form.

An experimental implementation is available as a branch purity-attribute. He are some examples.

garrigue avatar Nov 06 '19 15:11 garrigue

I haven't read this in detail yet, but I think it might be interesting to compare it to:

https://arxiv.org/abs/1907.07283

lpw25 avatar Nov 06 '19 16:11 lpw25

Actually, this is the other way round: this PR tracks the part or purity relevant to polymorphism, in order to get more polymorphism. The types are unchanged. Purity is a bit of a misnomer, since we are only interested in impurities that could make polymorphism unsound.

garrigue avatar Nov 06 '19 17:11 garrigue

Planned extensions after discussion with @diremy :

  • allow purity annotations for recursive polymorphism and polymorphic record fields
  • maybe also for polymorphic methods
  • there should be a way to see ground types as pure, implicit or explicit

garrigue avatar Nov 07 '19 14:11 garrigue

Added a few features to the experimental implementation

  • automatically add pure to ground types in interfaces, and ignore it when subtyping
  • [@@@pure] and [@@@impure] in signature, to allow switching mode for all definitions (e.g. all functions in List are pure)
  • non principal hack: consider the result of an application as pure if it is a ground type

This allows to compile the furl combinator library, with purity annotations to allow sufficient polymorphism in the camlidea.ml test for instance: #17

garrigue avatar Nov 08 '19 16:11 garrigue

automatically add pure to ground types in interfaces, and ignore it when subtyping

IIUC this suggests that you might want to find a different word than "pure" for these annotation, since you'll be putting it on functions that are not pure in any of the traditional senses.

lpw25 avatar Nov 08 '19 17:11 lpw25

Yes, I've been looking for a better name for a while, but couldn't find one yet. Something like [@@unrestricted_poly] would be more correct, but not very palatable. I accept proposals :) On the other hand, while what is inferred is not purity, the intuition used when one annotates an interface is essentially that one. I.e. while the goal is to be able to type truly pure cases, in practice it seems easier to be more lenient during inference.

garrigue avatar Nov 09 '19 10:11 garrigue

All I can think of so far is [@@value] since it is a bit like you are marking something as a value from the point of view of the value restriction (although only a bit like that).

lpw25 avatar Nov 09 '19 16:11 lpw25

Thanks for the suggestion. Another idea is [@@logical], to emphasize that quantifiers follow the laws of logic. But we now also need a negation to switch off signature wide qualifiers. Or just [@@@logical false].

garrigue avatar Nov 11 '19 21:11 garrigue

On 11/11/2019 22:39, Jacques Garrigue wrote:

Thanks for the suggestion. Another idea is |[@@logical]|, to emphasize that quantifiers follow the laws of logic. But we now also need a negation to switch off signature wide qualifiers. Or just |[@@@logical false]|.

Hum, [@@logical] seems a little pompous here and does not carry a good intuition.

I still prefer [@@pure] or [@@value]. I think [@@valueform] would be better than [@@value], but a bit verbose.

As Jacques said, his proposal is actually an abstraction (i.e. approximation) of the weak vs stong type variables of SML. Hence, the good intuition when an indentifier is [@pure], e.g.

    val id : 'a -> 'a [@@pure]

is that the implicit "all 'a." in front of "'a -> 'a" is a strong quantifier while when all unannotated OCaml quantifiers have to be treated as weak. So perhaps,

    val id : 'a -> 'a [@@strong]

would even be better: closer to the underlying intuition, and does not assert purity when it's only talking about a certain notion of purity.

Didier

diremy avatar Nov 12 '19 08:11 diremy

Current state (2019-11-12)

As mentioned above, the current branch allows one to play with the Furl library, avoiding clumsy thunks. It extends the proposal with:

  • [@@@pure] and [@@@impure] to annotate whole signatures (i.e. all value declarations after [@@@pure] are seen as pure, until the next [@@@impure]). Used it for List (everything is pure). This seems essential for light annotations of libraries.
  • values with ground types in signatures are pure (independently of the implementation)
  • applications with a ground result type are assumed pure (non principal)
  • polymorphic recursive definitions are checked twice: first assume purity, and recheck as impure if it fails

Discussion

  • The annotation approach seems to work well, allowing easy upgrading of signatures
  • The need for a special handling of ground types makes me think that the propagation approach is only a hack: having strong type variables in the type-checker would be both more powerful and principal, even if we choose to show them through these annotations
  • On the other hand this approach is simple enough, and upward-compatible
  • The problem with polymorphic recursive definitions stays with strong type variables, except if we require explicit purity annotations on the definition.

garrigue avatar Nov 12 '19 15:11 garrigue

Discussion
  • The need for a special handling of ground types makes me think that the propagation approach is only a hack: having strong type variables in the type-checker would be both more powerful and principal, even if we choose to show them through these annotations

I support the strong type variables approach, which is closer to what's happening underneath. It might still be lightweight---as long as there are just weak or strong and do not use different levels of weakness as in SML-nj.

  • The problem with polymorphic recursive definitions stays with strong type variables, except if we require explicit purity annotations on the definition.

With strong type variables, a polymorphic recursive definition should be annotated so the purity annotation would naturally come with the type annotation: since polymorphic variables must be listed, they should also come with their kind information (weak/strong) whatever the syntax.

Didier

diremy avatar Nov 13 '19 07:11 diremy

I didn't look at the proposal until today. A few comments.

I certainly agree with Didier that it is odd/unpleasant to classify polymorphic fields as impure. Taking a step back, the fine-grained mode would be to annotate variable-introduction sites (in types, universal quantifiers; does it make sense to annotate existential variables in GADTs?), so the coarse-grained approach would be to annotate not really a type, but a telescope / set of variable-introducing quantifiers. In the HM fragment, there is exactly one such telescope per type, but in presence of explicit polymorphism in recursive definitions or polymorphic fields we should allow to annotate all places that allow universal quantification (implicit or explicit). So I should be able to write { mkref : 'a . 'a -> 'a ref [@@pure] } (and then be unable to populate this type in the way I expected), or let rec id : 'a . 'a -> 'a [@@pure].

I'm not personally convinced that we need to restrict ourselves to the coarse-grained approach (could this be a mistake we would regret in the future, similarly to the decision to not name row variables?). It would be nice to be able to annotate specific variables, and we have an excellent default, sufficient in most cases, that annotating a type scheme means annotating all quantified variables.

As @lpw25 I strongly believe that being able to annotate print_endline or raise with @pure highlights that this is a terrible naming choice that must be changed. It's worth spending some effort to find a good name, because we cannot just introduce such a terrible naming mistake in the language.

  • strong is less terrible than pure, but it's not very satisfying either; it doesn't mean anything to most people so at least it doesn't suggest a completely different meaning.
  • we could also speak of safe and unsafe type variables and quantifications; this gives a bit more intuition than weak/strong but I don't think that users would assume that they know (incorrectly) what it means.
  • As far as references are concerned, the distinctive property of strong/safe type variables is that values at their types cannot "escape" to the ambient execution context through the program execution (by being stored in the mutable heap, or passed to a possibly-unsafe function), except as part of the result behavior matching the parts of the return type that mention this variable. This idea of "not escaping" may be useful also at the value level (a function argument or a temporary variable that does not escape the dynamic extent of a function), and I sort of idly wondered if a coherent naming would be possible.

There are known bad-interactions between non-delimited continuations and generalization (actually the best document I found to summarize the state of the art in the ML design world was Xavier Leroy's 1992 PhD thesis manuscript... in French). I suppose that such complications can be avoided by marking control-operator types with weak/unsafe/escaping/leaking variables similarly (this was the SML solution), but it's not clear enough to me how the "non escape" intuition can be extended to that setting. I would like to feel confident that however the restriction is implemented is not going to become inadequate in presence of a richer effect system (possibly relying on delimited continuations); we have results (see No value restriction is needed for algebraic effects and handlers, Kammar and Pretnar, 2016) showing that simple effect handlers are not generative enough to run into polymorphism issues, but still I would welcome a coherent intuition for the whole picture. Semantically what does the notion of strong/safe/local variables stands for?

gasche avatar Nov 20 '19 14:11 gasche

  • As far as references are concerned, the distinctive property of strong/safe type variables is that values at their types cannot "escape" to the ambient execution context through the program execution (by being stored in the mutable heap, or passed to a possibly-unsafe function), except as part of the result behavior matching the parts of the return type that mention this variable. This idea of "not escaping" may be useful also at the value level (a function argument or a temporary variable that does not escape the dynamic extent of a function), and I sort of idly wondered if a coherent naming would be possible.

I think your description is unnecessarily complicated. The intuition, which is quite closed to the argument used in the proof is simpler:

When an indentier/expression is assigned a type scheme, its weakly quantified type variables may be instantiated by a type that can be used as the type of a subexpression that will end up in the store when the expression will be applied and evaluated (which cannot happen for its strongly quantified type variables).

Otherwise, I don't think safe/unsafe is much better than pure/impure unless it is preceded by the idea of allocation. The intuition is not that of reading or writing into the store, just that of allocating a piece of store. (A polymorphic funtion that may write in the store is pefectly fine!)

Hence, @@noalloc/@@alloc would be much closer to the truth and not misleading, but this does not sound so nice... Perhaps a little nicer is @@allocfree sounds nicer than @@noalloc---and more positive for something that should be prefered to than @@alloc. An alternative is to replace "alloc" by "leak".

To sum up:

    pure            impure
    
    safe            unsafe
    
    noalloc         alloc
    allocfree       allocable
    
    leakfree        leakable
    unleakable      leakable

Didier

diremy avatar Nov 20 '19 15:11 diremy

I prefer "safe" over "pure" because while users understand that there are many distinct notions of "safety" (so they will wait to see which one you mean before assuming a meaning), most people work with a very restricted set of meanings for "pure" in the context of programming languages: an expression is "pure" if it does not perform any side-effect (then we argue about what side-effects mean).

Indeed I find "leak" more tempting (it is not misleading like "pure" is and it carries some intuition, unlike "safe"), but "noleak" or "unleakable" are a bit ugly; I think that "safe" or "stable" or "local" may work as the opposite of "leak".

Your description is simpler, but:

  • It does not completely suffice to explain why ref [] is safe, fun x -> ref [] is safe, but thunk (ref []) is not.
  • Noe of those descriptions gives an intuition that is generic enough to be extended to the issue with first-class continuations.

The Harper&Lillibridge example for continuations, as repeated in Xavier's thesis, is as follows:

let later : 'a . ('a -> 'a) * (('a -> 'a) -> unit) =
  call/cc (fun k ->
    (λx. x),
    (λf. throw k (f, λx. ())
  )
in
print_string(first(later)("Hello !"));
second(later)(λx. x + 1)

This example also relies on the fact that two incompatible instantiations of the declaration are made in the program and interact in bad ways, but I don't know how to formulate an intuition for it in terms of leaking into a "store". Maybe we could say that the variable leaks into the continuation environment, binding continuation names to (typed) continuations? (This environment must be global to the program to capture an undelimited continuation.)

gasche avatar Nov 20 '19 16:11 gasche

How about nonexpansive? That's the term used in the definition of SML, and it's not as overloaded as "safe", "pure", etc. (The intuition is that these are the expressions whose evaluation does not expand the store)

stedolan avatar Nov 20 '19 16:11 stedolan

Your description is simpler, but:

I should have said somethinge even simpler:

  • A type variable is weak when it is used in the type if a subexpression whose evaluation may allocate a piece of store containing a value of that type.

  • Besides, a weak type variable may only be generalized in front of a value form.

  • It does not completely suffice to explain why |ref []| is safe, |fun x -> ref []| is safe, but |thunk (ref [])| is not.

I think it does. Notice that weak/strong applies to type variables, not to expressions.

BEWARE! ref [] is unsafe (in your sense):

ref [] : '_a list ref

Hence, '_a which is weak cannot be generalized. Why so? because a value of type 'a may enter the store during the evaluation of ref [].

Did you mean (fun x -> []) () ?

  • (fun x -> []) () : 'a. 'a list

    Here, nothing can enter the store, so 'a is strong and can be generalized.

If you meant Jacques' extended value restriction, see below.

Continuing with your examples:

  • thunk (ref []) : '_a list ref

    same argument as for ref []

  • fun x -> ref [] : '_a . '_a -> '_a list ref

    '_a is weak when typing "ref []" as it may enter the store. it stays weak in the type of fun x -> ref [], indeed. However, as this is a value, it can be generalized (as weak variable).

    Hence it you instantiate it (e.g. '_b -> '_b) it has to be with weak type variables as well.

  • (fun x -> x), ref [] : 'a. ('a -> 'a) * '_b list ref

    'a is strong, '_b is weak, so only 'a can be generalized.


Jacques' extended value restriction?

    let _ = ref [] in []

One answer is that we may not need it anymore because useful cases such as pseudo constructors would become pure and generalizable.

This is still fine and can be explained with my criteria.

    let x = ref [] in !x

I argue that x has type "bot list" and !x has type bot list which is coerced into 'a list. Hence 'a is strong as it has not been put in the store.

  • Noe of those descriptions gives an intuition that is generic enough to be extended to the issue with first-class continuations.

Right, I haven't considered continuations.

But I asume that replacing "may allocate a piece of store" should be completed with "or may be simultaneously input to and output of a continuation". But it needs to be check---I don't have the details in mind.

Didier

The Harper&Lillibridge example for continuations, as repeated in Xavier's thesis, is as follows:

let later : 'a . ('a -> 'a) * (('a -> 'a) -> unit) = call/cc (fun k -> (λx. x), (λf. throw k (f, λx. ()) ) in print_string(first(later)("Hello !")); second(later)(λx. x + 1)

This example also relies on the fact that two incompatible instantiations of the declaration are made in the program and interact in bad ways, but I don't know how to formulate an intuition for it in terms of leaking into a "store". Maybe we could say that the variable leaks into the continuation environment, binding continuation names to (typed) continuations? (This environment must be global to the program to capture an undelimited continuation.)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ocaml/RFCs/pull/5?email_source=notifications&email_token=AAKLQ2UO6FVNRT7DWHMZOPDQUVOD7A5CNFSM4JJWRUHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEESQWQQ#issuecomment-556075842, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKLQ2WSWQFTTI4QEMNB2RLQUVOD7ANCNFSM4JJWRUHA.

diremy avatar Nov 20 '19 17:11 diremy

Yes, indeed, I thought about nonexpansive as well, but it's a bit longer with a negative connotation and (I think, but I am not sure) this originally refered to expressions that are referentially transparent, and in particular that cannot produce side effects.

Perhaps, I am wrong, but I've check and found occurrences where "being expansive" is defined as or associated with "being able to produce side effects".

This is not the correct intuition since assignment can produce side effects but can be safely typed with strong variables:

    ref  :  '_a. '_a -> '_a ref 
    (!)  :  'a.  'a ref -> 'a
    (:=) :  'a.  'a ref -> 'a -> unit

We may still use the word nonexpansive/expansive with our definition if you don't find it too long, and ...

It also is bothers me (just a little) that the positive thing is expressed by a negative statement such as non expansive (or no alloc, un leakable, etc.)

Didier

On 20/11/2019 17:43, Stephen Dolan wrote:

How about |nonexpansive|? That's the term used in the definition of SML, and it's not as overloaded as "safe", "pure", etc. (The intuition is that these are the expressions whose evaluation does not expand the store)

diremy avatar Nov 20 '19 17:11 diremy

It also is bothers me (just a little) that the positive thing is expressed by a negative statement such as non expansive (or no alloc, un leakable, etc.)

It is also my feeling. The point being that those variables behave normally (logically, as I was trying to express with my logical), contrary to the restricted variables we have currently. Since I had no other good idea, I looked at a thesaurus, and could find a few:

  • untainted (again with un-)
  • vestal (Roman culture)
  • proper (kind of like this one, as it is positive and the meaning is wide enough)

There are of course many more, but they have more connotations.

garrigue avatar Nov 21 '19 07:11 garrigue

A typical user of OCaml interacts with the language through the abstraction of the OCaml type checker, which do not leak such details as expansive or nonexpansive or pure, or anything like this. In fact, it is not even referencing value restriction. What the type checker is saying is that the type variable could not be generalized. Using the existing terminology we can express the notion of a generalizable using a, sort of, imperative generalize.

val some : 'a -> 'a option [@@generalize 'a]

As example shows, it will also leave the space for more fine-granular annotations, when not all type variables in a scheme require or could be generalized. I believe this naming will capture nicely the typical interaction between a language user and the typechecker, without introducing any new concepts.

ivg avatar Nov 25 '19 17:11 ivg

I like @ivg's proposal.

gasche avatar Nov 25 '19 20:11 gasche

Why not---that may be the best of the proposals so far.

"generalize 'a" itself is not quite the intuition, since 'a is generalized anyway as it is the type of a value, but the idea that such an 'a should hereditarily be generalized in all contexts, as long as it is not unified with a type and in a context that say otherwise, is the idea.

Do we also need a keyword for the negation, to say "do not generalize 'a", e.g. if the context says that the default is "generalize"? What should it be then? "do-not-generalize" or "generalize=false"? Do we have other examples of annotations that come in pairs of positive/negative flags? Then, we should have a regular pattern for the naming the negative flag.

Didier

On 25/11/2019 18:51, Ivan Gotovchits wrote:

A typical user of OCaml interacts with the language through the abstraction of the OCaml type checker, which do not leak such details as expansive or nonexpansive or pure, or anything like this. In fact, it is not even referencing value restriction. What the type checker is saying is that the type variable could not be generalized. Using the existing terminology we can express the notion of a generalizable using a, sort of, imperative |generalize|.

val some : 'a -> 'a option [@@generalize 'a]

As example shows, it will also leave the space for more fine-granular annotations. I believe this naming will capture nicely the typical interaction between a language user and the typechecker, without introducing any new concepts.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ocaml/RFCs/pull/5?email_source=notifications&email_token=AAKLQ2SIKAD4Z4SI7W5UTRDQVQGALA5CNFSM4JJWRUHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEFDHVOI#issuecomment-558267065, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKLQ2W2B7732SKMOOQF63LQVQGALANCNFSM4JJWRUHA.

diremy avatar Nov 26 '19 09:11 diremy

We could use [@@leaking 'a] as a negative form. (Note: another syntactic approach would be to attach attributes to occurrences of types within the declaration, 'a[@generalize] -> 'b[@leaking] -> 'b ref).

My understanding of the use-case is that leaking would be the default mode (the most common where we allow everything, morally we give no details on the intensional aspects of the (impure) function), and people would use [@@generalize] explicitly on parts of the API they want to expose as "pure constructors". Is that the right idea? In this case, having a good name for the negative counterpart is less important. Or will people actually want to use generalization by default, or commonly on higher-order functions (such as List.map), so that we need both forms to be syntactically convenient?

(Have we considered using 'a and ´ _a, as in SML?)

gasche avatar Nov 26 '19 10:11 gasche

Indeed, the problem with SML was that it was considering the problem from the wrong side. Being generalized is a plus, but I don't think there should be an opposite statement. For signature wide definitions, it seems sufficient to provide

[@@@generalize true]
[@@@generalize false]

as switches.

garrigue avatar Nov 26 '19 10:11 garrigue

(I don't really agree that we should not provide a negative statement. When there are two options, I think that we should always have a syntax for both, even if one is the implicit default. This avoids blind spots such as nonrec, and it allows people to explicitly express their intent when the (default) choice matters.)

Personally I find [@@@generalize true] rather unpleasant (it makes interface reading more context-dependent), but having [@@generalize] mean "generalize all variables in this declaration" would be sensible.

gasche avatar Nov 26 '19 10:11 gasche

So, if I'm not mistaken, we've now had 3 weeks of serious discussion about the name of the attribute.

However, one thing I cannot really see (neither in the PDF, nor on this page) is a discussion about drawbacks and alternatives (cf. https://github.com/ocaml/RFCs#making-an-rfc ). @gasche previously mentioned that there might be interaction with an effect system (which I believe we're still hoping to add to the language), but no one seems to have reacted to that. Personally, I'm also curious about that, and whether all of this is still going to be useful if we ever track effects?

trefis avatar Nov 26 '19 11:11 trefis

Let me try to express my opinion in the classical review style, let's see if it works :)

Review A

Overall Merit

Weak Accept.

I'm fine with this change and won't object to its inclusion in the language. However, I believe that this change touches the fundamental notions of the language and affects the user experience and requires more careful consideration. Otherwise, it will create yet another dark corner in the language, that may avert new and existing users.

Submission Summary

The proposal implements a must analysis of a program property to enable generalization of type variables in expressions that hold this property. The property is defined structurally by a set of rules and over-approximates the strength of an expression. Informally, an expression loses its strength when it accesses a mutable or polymorphic field of a record and the rules define how the impurity (taint) gained from this access is propagated. If an expression is not strong then we can say that during the evaluation of this expression it may create a storage for a value with a type that escapes the scope of this expression. Therefore, generalization of all variables is conservatively disallowed in the type scheme of this expression.

Submission strengths

The proposed analysis lets type more programs without compromising soundness. And is especially useful in cases when functional (smart) constructors are used instead of plain data constructors. It is a relatively lightweight change that doesn't significantly increase the support burden.

Submission weaknesses

The interaction with variance is not clear

It looks like that this proposal is parallel with the existing mechanism of relaxing value restriction based on variances of type variables. I have the impression that it extends the variance analysis from data constructors to functional constructors. Could they be unified, at least on the conceptual level?

The property is not familiar to casual programmers

First of all, the problem with finding a good name for the property indicates that the property is not well-known or maybe still waiting for a proper mathematical abstraction to be fit in. So far, it is defined purely structurally as a boolean lattice and a set of propagation rules that define the transfer function. Explaining this property to OCaml programmers would be painful as the definition requires the knowledge of the internals of the type checker and in general, will require us to extend the theory of the compiler which is exposed to the end-users.

The change complicates the typing rules exposed to the programmers

The proposal extends the typing rules that a casual programmer has to learn and thus increases the cognition burden and makes OCaml a little bit more complex language. Of course, the rules are already extended by the attributes semantics, but this extension is probably the first serious step that affects the inferred types of expressions, i.e., it allows attributes to affect the typing rules.

I think that this is an important change and since this is one of the first moves in this direction (and it looks like that this direction will be pursued no matter the objections) we have to think carefully and to frame it in some repeatable pattern. Maybe we should develop a common framework (at least from the user perspective) for the annotated type system (as described in Type and Effect Systems), so that annotations like this could be attached to typing schemes, e.g.,

val return : 'a -> 'a t [@@annotated no(storages)]

The change may clash with the upcoming effects

The annotations from the previous paragraph could be captured with the notions of effects or, at least, have a lot of intersections with effects. It could be a good idea to reach out to the designers of the upcoming effect type system to sync the opinions. As [1] shows, effects could be framed into the annotated type system, though I'm not sure which is more general -- annotations or effects. E.g., this also makes sense:

val return : 'a -> 'a t [@@effects no(stores 'a)]

Suggestions

If the annotated type system is a too far going goal, an alternative approach could be investigated with a simpler notion of purity (at least as exposed to the end-user). From the examples, I can conclude that the main use-case is to prove to the compiler that a functional constructor is behaving as a covariant data constructor, e.g., that some x is the same as Some x modulo non-observable effects. This notion could be exposed to the end-users using the [@@constructor] attribute, which could be used in signatures to expose that a given smart constructor behaves as a data constructor. The attribute wouldn't be a part of the type and will basically behave as the variance annotation.

ivg avatar Nov 26 '19 16:11 ivg

The original rules were unsound (they were actually unintentionally stronger than SML 90). I just fixed that, and the implementation too. The implementation now relies on a predicate is_pure_exp working on the typed tree (in a way similar to is_nonexpansive), so that it can exploit the types to regain strength when an impure identifier or a mutable field is used at a ground type.

garrigue avatar Jan 15 '20 14:01 garrigue

We discussed this briefly again, and a central question seems to be how the proposed feature would interact with effect typing. @lpw25 should write an RFC on effect typing (when he's done upstreaming his short-paths implementation), but we can already ask: how would this feature fit in a language with row-based effect inference, such as Koka ?

gasche avatar Jul 23 '20 14:07 gasche

for another perspective, @ivg : ' imperative generalize' sounds like a sudo type instruction to the typer. It connots modification of the typing mecanics which spooks soundness guarantees out of me.

To stick with existing typing annotation semantics it could be put in noun form general. Indeed the typer(solver) sees type equations. Users enter partial solutions optionally as constants. We just help it refine a domain.

I was surprised to read this error the first time, first day :

Error: The type of this expression, '_weak1 -> '_weak1,
       contains type variables that cannot be generalized

and then :

  • Why can't you generalize my trivial use case ?

This is very pedagogically explained in the ocaml documentation :+1:, many kudos there.

I naively thought to myself, the typer types type variables in the most general way. Then it gives me the least general solution, a monomorphic type. To cap it all, I can manually/mentally check there is a trivial polymorphic typing to it.

The reason is that we all implicitly take OCaml to be functional+espilon where epsilon is the perceived addition of mutation to the language. The decision problem is hard so the typer can't decide and conservatively weakens the 'as to '_as.

I think the problem is badly errored out/presented due to an unclear message. A better message may be « I monomorphized that '_a t but please verify whether it is in the espilon branch of the language or not, I can't tell... ». This error self-contains a complete union of reasons and hints to paranoid value restrictions.

I get an unsettling vibe with general because it reads a bit like general 'a = general general. That looks tautologic and counter intuitive as if the typer were seriously dumb.

I tried these : true, transparent, partial, abstract, tainted, actual, literal, manual, mono, poly, protected, guarded, abstract, concrete, exposed, real, lifted, free, pure, lamda, strict, to no avail. Many fail for assuming a tacit « natural » way of programming.

I believe the naming should capture the separation of the language with respect to paradigm. That is what is tracked. Therefore,

What do you think of functional ? It partitions the two aspect of OCaml and conveys disambiguation semantics to help the typer decide a branch. Also supersedes loaded pure,strict... The opposite could be has_imperative. Another is higher with conservative, mixed, full, dual or classical for defaults. I quite like it. It also fits the use case @garrigue introduced. Also relax/paranoid. Suits terminology and documentation.

To summarize, as a user, I am convinced functional or higher carries the intent I mean when facing the typer. And better resolves the canonical choice many make at OCaml programming.

milvi avatar Sep 10 '20 00:09 milvi

cc @goldfirere: I wonder if you would be interested in co-shepherding this RFC.

gasche avatar Feb 06 '23 09:02 gasche