Proposal: "Closed" type hierarchies

Open gafter opened this issue 7 years ago • 32 comments

@agocke commented on Mon Feb 15 2016

Background

This proposal is substantially similar to https://github.com/dotnet/roslyn/issues/188, but it focuses more on integrating with existing C# object structures and less on adding new "ADT"-style objects. The goal, however, is still very similar. By adding closed type hierarchies the compiler can be guaranteed that all the types its sees are the only subclasses of the target type. This can integrate with features like pattern matching to provide a warning when not all possible types are checked in a pattern match.

Problem

Providing an ADT-like pattern is helpful for the pattern matching case but it unfortunately doesn't integrate well with current object-oriented patterns written in C#. For example, the Roslyn TypeSymbol type could be useful to be matched on, but the design as proposed in dotnet/roslyn#188 would require rewriting or significantly modifying the existing type structure meet that goal. It would be better if, instead, the existing object hierarchy could be easily marked as closed, meaning that all inheriting classes must be in the same compilation unit as the definition.

Solution

The cleanest solution would simply be to add support for a new modifier, closed, that would require all inheriting classes to be in the same compilation unit as the declaration and to be themselves marked as either closed or sealed. Interestingly, this can be easily extended for interfaces as well, where closed would imply that the implementing type would have to exist in the same compilation unit. This would encompass the previously proposed InternalImplementationOnly attribute without adding any new concepts to the language.

The main issue with this simple addition is that it would only be enforced in the compiler, probably by the presence of a custom attribute in metadata. This would mean that any language which doesn't understand closed, or even any previous version of the C# language, would not be under any constraints in implementing these types. Unfortunately, this causes significant issues with any language using closed as it would generally be seen to be a guarantee of the possible subtypes or implementations, when in fact any non-participating language would violate those assumptions.

However, attempting to encode the objects in such a way as to use the .NET object system to restrict usage has an unfortunate set of problems that I'll detail here.

Issues with using .NET to encode `closed`

The first barrier is that .NET provides no method of restricting interface implementation, so applying closed to interfaces is impossible.

The second barrier is that .NET only provides a way of restricting inheritance of objects by limiting the visibility of a constructor. If an object is visible outside its declaring assembly, all the constructs must be marked internal or stricter in order to prevent inheritance outside the declaring compilation. If InternalsVisibleTo is in play, things become even more complicated -- the only way to prevent inheritance is to mark all constructors private and require that all classes inheriting from the closed class be inner classes. Assuming we don't want to reify that encoding into the language, this would require lowering definitions and usages to the appropriate class layout.

Requiring all classes to be sealed or only constructed inside the declaring compilation is a heavy restriction, but that's not the only limit. There's at least one other restriction in the C# type system: classes which do not have public default constructors cannot be used as parameters to a type parameter with the constraint T: new(). If we go down this path, there may be other complications in the same vein.

Now that I've listed the issues with using the type system, I'd like to discuss why not using the type system isn't as bad as it first seems.

Failure scenarios

In this instance, "failure" is defined as a language which supports and understands the closed modifier encountering a type which should have been prohibited by the existence of closed. There are only a few instances where this is the case:

Compilation A defines type A, which is marked as closed. Compilation B, which is written using a language which does not understand closed, creates subtype B, which inherits from type A. This type is then passed back into compilation A via a public API. In this instance, we can see the compilation B is interacting with compilation A as both a consumer and a provider through its public API. The violation of compilation A's public contract is the underlying cause of the failure, regardless of the support or lack thereof from the language. Many public APIs have contracts which cannot be expressed in the type system, like various argument out of range conditions, but this is not often seen as an unredeemable flaw in the language design. In addition, like out of range conditions, closed type violations could be checked at runtime at public API boundaries to provide early feedback and failure for misuse of the API.
Compilation A defines type A, which is marked as closed. Compilation B, which is written using a language which does not understand closed, creates subtype B, which inherits from type A. Compilation C, which supports closed, references compilation A and is referenced by compilation B. Type B is passed from compilation B to compilation C, where it proceeds to fail inside compilation C. Unfortunately, in this case the failure in compilation C is more of a violation of the contract of compilation A rather than compilation C, so efforts to shield against failures in compilation C is difficult and, in a sense, inappropriate.
Other failures are transitive failures of the above, for example a tertiary compilation passing a "bad" value into a compilation, which then passes it through to a failing compilation.

(1) is the case I believe is the least complicated and would be the most common. I also believe it is the case least worth worrying about. In addition to being similar to other misuses of a public API, the failure is caused directly by a compilation creating an invalid value based on a referenced API, then passing that invalid value back into the API. In this case 1) the constructor of the bad value should have been aware of the public contract of the referenced assembly because it is both consuming, augmenting, and utilizing it directly, and 2) the referenced API has the greatest ability to be "defensive" in its consumption.

The case in (2) is a bit more complex, both because the contract may be more implicit and the failing compilation may be less able to protect itself. However, that doesn't mean that the risk is completely unlimited. For one, compilations can limit their public API to not rely on types marked as closed. Also, the situation becomes more pathological as the compilation references and API contracts become more complicated, but they also become rarer. Overall, it seems that this failure mode is worth the significant benefit it provides.

Conclusion

There are no trivial wins here, but it appears that a simple custom-attribute-enforced closed type modifier would provide substantial benefit that may outweigh the downsides. The only thing that seems clear is that attempting to provide a .NET-object guarantee with arbitrary closed hierarchies can get very complicated very quickly and is probably not worth the complexity and limitations it brings.

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-09-26.md#discriminated-unions

@DavidArno commented on Tue Feb 16 2016

"Providing an ADT-like pattern is helpful for the pattern matching case but it unfortunately doesn't integrate well with current object-oriented patterns written in C#"

I'm not convinced that pattern matching will ever fit well with existing OO designs. It's a functional construct and thus surely it would be better to use records and discriminated unions with pattern matching? It's an opportunity to improve existing designs.

@alrz commented on Tue Feb 16 2016

@DavidArno Agreed, but it doesn't mean that you shouldn't be able to use pattern matching with existing language constructs. If that wasn't the case, you would need to do extensive refactoring just to be able to use it in your code. I think the proposal is trying to fill this gap.

@bondsbw commented on Tue Feb 16 2016

So is part of this proposal that all record types implicitly use the closed modifier and its semantics (including how the result is emitted in IL)?

Is it possible to extend this concept to virtual properties and methods?

@agocke commented on Tue Feb 16 2016

@bondsbw Record types could use this proposal or not. I'm not declaring anything either way right now, but I would expect some synergy here.

As for virtual members, theoretically this could be applied individually (presumably without the class being itself closed, because that would just be redundant) but that seems like a strange, possibly bad API design. I'd have to see a real use case that would benefit before applying the modifiers to members.

@agocke commented on Tue Feb 16 2016

/cc @gafter @MadsTorgersen

@bondsbw commented on Tue Feb 16 2016

@agocke Consider this example, where both classes are in the same compilation unit:

public class Worker { 
    public closed virtual bool IsBenefitsEligible {
        get { return false; }
    }
}

public class Employee : Worker { 
    public bool IsFullTime { get; set; }

    public sealed override bool IsBenefitsEligible {
        get { return IsFullTime; }
    }
}

This enforces the semantics that in order to be eligible for benefits, a worker must be a full-time employee. No externally derived classes are allowed to change that.

@agocke commented on Tue Feb 16 2016

@bondsbw Sure, the sealed was useful here, but how was closed useful? I would need a real-life example from a library why it would be useful to restrict overriding of specific virtual methods.

@bondsbw commented on Tue Feb 16 2016

@agocke closed prevents an external class derived from Worker from overriding IsBenefitsEligible, doesn't it?

Without closed, an external class derived from Worker could change whether the worker was considered eligible for benefits (which, for the sake of argument, is not allowed according to our requirements).

@agocke commented on Tue Feb 16 2016

@bondsbw Right, I'm arguing that coming up with that arbitrary requirement isn't good enough. I need to see an actual reason why someone would have such a requirement and it wouldn't be considered bad design.

In the Worker/Employee case I think it would be reasonable to mark Worker as closed, but I don't see why the IsBenefitsEligible member would deserve special casing.

@HaloFour commented on Tue Feb 16 2016

@agocke Regardless of the syntax how is it proposed that this would be enforced? It seems to be something that would require CLR support given that there's really nothing that the compiler could do to prevent a non-sealed publicly create-able class from also being used as a base class.

I like the idea in dotnet/roslyn#8726, although it is more of a CLR proposal than a compiler proposal. Basically it would allow a type to be sealed except for a given white-list of derivable types. Similar to InternalsVisibleToAttribute in that it allows for relaxing the rules but only in that specific circumstance.

@bondsbw commented on Tue Feb 16 2016

@agocke IRS rules state that there are significant tax differences based on whether a worker is considered an employee, or not. Thus, the requirement that two classes exist, workers who are employees and workers who are not. And many companies only allow employees that are considered full time to be eligible for a benefits package. Both of these requirements are quite common in the real world, and in our hypothetical company we are requiring that such classifications be used strictly.

Now, why can't Worker and Employee be closed? Because we still want to allow users of our library to derive those further, like any normal class:

Example 1) A subsidiary creates classes ExemptEmployee vs. NonExemptEmployee, which determine overtime eligibility according to the Fair Labor Standards Act.

Example 2) Your company contracts out transportation duties, and in this situation the classes IntrastateDriver vs. InterstateDriver align best with motor carrier laws.

Example 3) Your business unit is better served with employee classes Developer, Analyst, ProjectManager, QAEngineer, and so on.

@agocke commented on Tue Feb 16 2016

@bondsbw Seems reasonable. I also like the symmetry with the exiting sealed modifier.

@gafter commented on Thu Feb 18 2016

One of the advantages of the compiler knowing that a hierarchy is closed is that the compiler can produce a "kind" field that it can use to switch more efficiently among the various subtypes, as we do extensively in the Roslyn code base. The proposed approach here doesn't seem like it is a very neat way to accomplish that. How would the compiler importing an assembly that used this feature know how to use the kind field?

@alrz commented on Fri Feb 19 2016

@agocke TypeSymbol that you've mentioned is internal already, then couldn't it be considered as a complete pattern with #8455? However, it wouldn't have the efficiency of ADTs (#6739) like what @gafter described. And since closed actually needs CLR support what does it buy you beyond ADTs? For example, with this much flexibility that they could provide.

@agocke commented on Fri Feb 19 2016

@gafter You are correct, this would be no more efficient -- generating a type field would be difficult if not impossible if based solely on this feature. For that use case I think we would require proper ADT support.

@alrz TypeSymbol is internal, but Roslyn has IVT to a number of assemblies. The proposal also clearly explains how to do this without CLR support and the drawbacks of doing so.

Apr 21 '17 22:04 gafter

FYI, Scala has this exact feature with the same exact semantics proposed here. In Scala, the modifier is called sealed which for obvious reasons can't be used in C♯. One use-case is to enable exhaustiveness checks in pattern matching.

May 06 '17 12:05 JoergWMittag

a private protected constructor would make a closed hierarchy,

public abstract class ClosedType {
  private protected ClosedType() {}
}

Nov 26 '17 19:11 alrz

I don't think this is a full alternative to #14208, and #14208 shouldn't have been closed in favour of it - or this is the appropriate place to discuss alternatives to 'closed' types?

May 08 '18 08:05 mcintyre321

or this is the appropriate place to discuss alternatives to 'closed' types?

This is the place to discuss proposals for the C# language and that is a language proposal, so yes.

May 08 '18 09:05 Joe4evr

The Roslyn repo is not the place for language proposals. This repo is the right place. You can create a new issue if you think that is appropriate.

May 08 '18 13:05 gafter

What's a compilation unit? Is it one file, or more than one? Do we define the scope of compilation units?

Jul 15 '18 19:07 AustinBryan

@AustinBryan It's Roslyn terminology for the syntax container. In a C# project, each file is a compilation unit. C# scripting is different.

Jul 15 '18 20:07 jnm2

@AustinBryan: This feature seems to be heavily inspired by Scala's sealed access modifier, which allows inheritance but only within the same "compilation unit". The Scala Language Specification leaves the term deliberately vague, but in the only currently existing implementation of Scala, a compilation unit is a source file.

The reason for leaving this intentionally vague is to allow for non file-based implementations. E.g. Smalltalk-style "Worlds", where the code is just objects like any other objects, and you modify the code the same way you modify any other object, by calling methods on it and asking it to modify itself. Or database-based implementations, where the code is only a projection of the underlying semantic graph (e.g. Intentional Domain Workbench, JetBrains MPS). Or, as @jnm2 pointed out: a REPL, where there is no source file.

Based on my understanding of the proposal at hand, the intended meaning is the same as in Scala. Basically, a "source file" but without limiting itself to only file-based implementations.

Jul 15 '18 21:07 JoergWMittag

I would prefer assembly.

It is a common style to organize one-class-per-file, and limiting the scope of inheriting a closed type to the same file would break that style.

Jul 15 '18 23:07 bondsbw

@bondsbw: I don't believe there exist a concept such as "assembly" in C♯, or does it? I only know of the CLI VES concept of the same name, but that has nothing to do with C♯.

Can you explain what you mean by "assembly" in the context of C♯ (which this discussion is about)?

Jul 16 '18 09:07 JoergWMittag

@JoergWMittag That would be ‘compilation.’

Jul 16 '18 12:07 jnm2

assembly is even a keyword.

Jul 16 '18 12:07 bondsbw

@bondsbw: Can you point me to some information about that keyword? I couldn't find it in the C♯ 6.0 Draft Specification (the very latest officially released one) and neither in the very latest version of the repository for the specification. I examined every single occurrence of the word "assembly" in the entire specification and even the entire repository, and I could not find where this keyword is defined. Is this a new addition in C♯ 7 or later?

Jul 16 '18 14:07 JoergWMittag

@JoergWMittag

It's actually been a keyword since C# 1.0, it's used to target an attribute to the assembly:

[assembly: Description("This assembly does ...")]

Jul 16 '18 14:07 HaloFour

@HaloFour: That is the assembly Attribute Target Specifier (documented here), not the assembly keyword.

Jul 16 '18 14:07 JoergWMittag

@JoergWMittag

That is the only assembly keyword recognized by the C# compiler today. It's contextual, and doesn't apply in any other case. It's not a reserved word, like many of the original keywords, which adds some fun confusion to the conversation.

Jul 16 '18 14:07 HaloFour

@JoergWMittag The term "assembly" is mentioned several times in the Introduction section of the C# spec.

@HaloFour To add some confusion, the C# spec doesn't use the term "reserved word", it calls those just "keywords". Which means that contextual keywords are not keywords.

And Roslyn uses different terms: "reserved keyword", "contextual keyword" and "keyword".

Jul 16 '18 14:07 svick

https://github.com/dotnet/roslyn/blob/23ddb23da69d7a7af78403ae27039ccc1660c97e/src/Compilers/CSharp/Portable/Syntax/SyntaxKind.cs#L166

Jul 16 '18 19:07 bondsbw

Interesting... DU's, at least based on reading Efficient discriminated unions in C#7, which led me there, which led me here, starting to look a little... No, A LOT like Java enums, no?

Jun 18 '19 20:06 mwpowellhtx

@mwpowellhtx Java enums have a single instance of each option, which is a value. Discriminated unions on the other hand have an arbitrary number of instances of each option, which is a type. They are funamendally different enough that there is no risk of infringing on my previous patent for Java enums.

Jun 18 '19 22:06 gafter

@gafter Great point. I did not say they were identical, but pretty similar, or depending upon the accessors you apply to each instance.

Jun 18 '19 22:06 mwpowellhtx

If the intention is to "close" a class within in the boundaries of a compilation unit, such as a project, to be able to do things like exhaustive switch which avoids problems with unaccounted cases, why not build the logic into the class hierarchy itself as virtual methods? Let me elaborate a little. If there are data objects (record, POCO, DTO, w/e you want to call it), and different code verticals of logic can perform various operations on those objects, it makes sense to write code which does things like:

// assembly Animals
class Animal { double Weight {get;set;} }
class Dog:Animal { DogBreed Breed { get; set; }  }
class Cat:Animal { CatBreed Breed { get; set; }  }

// assembly AnimalFeeder, references Animals
if(animal is Dog dog) { var food = GetFood(GetDogFoodTypeByBreed(dog.Breed);  } 
else if (animal is Cat cat) { var food = GetFood(GetCatFoodTypeByBreed(cat.Breed);  }

// assembly AnimalDoctor, references Animals
if(animal is Dog dog) { var drug = GetDrug(GetDogDrugTypeByBreed(dog.Breed); } 
else if (animal is Cat cat) { var drug = GetDrug(GetCatDrugTypeByBreed(cat.Breed); }

However, the whole premise of closed types is that all the code lives in the same assembly; why not just go:

// assembly Animals
class Animal
{
  double Weight {get;set;}
  abstract Food GetFood();
  abstract Drug GetDrug();
}
class Dog:Animal
{
  DogBreed Breed { get; set; } 
  override Food GetFood() { ... }
  override Drug GetDrug() { ... }
}
class Cat:Animal
{
  CatBreed Breed { get; set; } 
  override Food GetFood() { ... }
  override Drug GetDrug() { ... }
}

Nov 24 '21 03:11 TahirAhmadov

@TahirAhmadov

If the intention is to "close" a class within in the boundaries of a compilation unit, such as a project, to be able to do things like exhaustive switch which avoids problems with unaccounted cases, why not build the logic into the class hierarchy itself as virtual methods?

Truly that is not the intention, or I would have stated it as such.

Nov 24 '21 05:11 mwpowellhtx

If the intention is to "close" a class within in the boundaries of a compilation unit, such as a project, to be able to do things like exhaustive switch which avoids problems with unaccounted cases, why not build the logic into the class hierarchy itself as virtual methods?

Well, because it's not always feasible. Imagine that you have a (closed) class hierarchy, which represents an Animal, and you want to

draw it on the graphical UI in one project
output it as text in another project
serialize to JSON in yet another project.

With virtual methods, you'll need to pack all the possible virtual methods on the Animal descendants, which would be far beyond single responsibility principle.

Nov 24 '21 10:11 vladd

@vladd This was not the question. The question was literally along the lines of Enumerations in CSharp. End of story. Out here.

Nov 24 '21 15:11 mwpowellhtx

If the intention is to "close" a class within in the boundaries of a compilation unit, such as a project, to be able to do things like exhaustive switch which avoids problems with unaccounted cases, why not build the logic into the class hierarchy itself as virtual methods?

Well, because it's not always feasible. Imagine that you have a (closed) class hierarchy, which represents an Animal, and you want to
1. draw it on the graphical UI in one project

2. output it as text in another project

3. serialize to JSON in yet another project.
With virtual methods, you'll need to pack all the possible virtual methods on the Animal descendants, which would be far beyond single responsibility principle.

Yes, you can have a closed type hierarchy in one project, and then potentially many other projects which reference the first project which have different logic which operates on the closed hierarchy. But that doesn't really guarantee the hierarchy being closed; at run time, another version of the first assembly can be substituted which has more derived types, and now your consumer project has an unaccounted case.

Nov 24 '21 16:11 TahirAhmadov

I think my original comment may have detracted from the conversation. My main point was: if you're looking for an enum with the type of things that you can do with OOP, then you already have OOP, just use OOP.

It is possible that the JITter could be made aware of closed ([Exhaustive]) types, and elide heap allocations (at least for readonly types). I would personally prefer native value DU types, but that would also require GC work.

Further syntactic sugar can exploit [Exhaustive], DUs as one example.

[Exhaustive] is an iteration towards multiple possibilities without having to choose one of those possibilities right now.

Dec 04 '21 21:12 jcdickinson

@jcdickinson

I would personally prefer native value DU types, but that would also require GC work.

Why would this require GC work?

walked back most of my comment as I thought I was replying on a different issue and much of it wasn't relevant here

Dec 04 '21 21:12 HaloFour

Why would this require GC work?

The Option example you provided wasn't, in terms of memory layout, strictly a union. While this isn't a problem for a two-typed DU, it would be a substantial problem if there were more types involved (which is really common practice in languages with DUs): it would be a lot of memory copied through stack frames. Of course ref, in and out are all workaround, but it renders the abstraction leaky, and innocent code could result in very serious performance problems. Worse: the feature would be rendered completely useless if you needed more than a few types.

What you really want is something like this:

   [FieldOffset(0)]
   int marker;
   [FieldOffset(4)]
   string possibility1;
   [FieldOffset(4)]
   float possiblity2;

Aliasing a ref type and a value type is currently UB, so needs runtime/GC work. Using heap types isn't ideal, but it can be optimized away later on (possibly as a side effect of a larger heuristic). Even if it is never optimized away, the GC has become really good at dealing with these very short-lived heap values.

Dec 05 '21 04:12 jcdickinson

csharplang csharplang copied to clipboard

Proposal: "Closed" type hierarchies

Background

Problem

Solution

Issues with using .NET to encode closed

Failure scenarios

Conclusion

Design meetings

csharplang
csharplang copied to clipboard

Issues with using .NET to encode `closed`