[Proposal]: file local types

Open stephentoub opened this issue 2 years ago • 69 comments

"file private" visibility

[x] Proposed
[x] Prototype: Not Started
[x] Implementation: Not Started
[x] Specification: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/file-local-types.md

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-03-14.md#file-private-visibility
https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-03-21.md#file-private-visibility
https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-03-30.md#file-private-accessibility
https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-04-13.md#file-scoped-types

Dec 09 '21 22:12 stephentoub

Is a new accessibility modifier necessary or could the compiler be "relaxed" so that private can be applied to top-level types?

Dec 09 '21 23:12 HaloFour

Is a new accessibility modifier necessary or could the compiler be "relaxed" so that private can be applied to top-level types?

What if you're generating part of a partial type?

Dec 09 '21 23:12 333fred

What if you're generating part of a partial type?

Ah, I see, so this could also apply to a nested class emitted within the generated portion of a partial class, where private would not hide the nested class. That makes sense.

To play devil's advocate, would it not be sufficient to have the compiler recognize an attribute to enforce this behavior?

Dec 09 '21 23:12 HaloFour

The idea of having file private types / methods is a mechanism I use a lot when coding in F#. There you can use a signature / implementation file pairing. Only types / methods that are present in the signature file are available to the rest of the code in the assembly. The rest are only visible to the code within the implementation file.

I've found this to be very advantageous when coding in F# and I make use of it often. It's a way of defining helper types that just don't have any visibility within the rest of the assembly. It's a clean separation and really allows you to avoid spaghetti code in larger apps and instead force communication across defined boundaries.

I mention F# here because it's the language I've used this feature the most in. But it's present in other languages like Go via exported / non-exported types in packages.

I've often longed for a similar feature in C#. When @stephentoub was chatting with me about the problems they are hitting isolating their generator code it occurred to me that this would also be a mechanism for generators to effectively isolate themselves. It gives them a greater degree of freedom in what they generate without the fear that consumers will depend on implementation details that generators never intended to expose.

Dec 09 '21 23:12 jaredpar

Split into files has traditionally (except file-scoped namespaces) been neglectable in C#. Wouldn't it be better to make the types not file-private, but partial class part private? Other usecases seem to be covered by private inner classes.

Dec 11 '21 10:12 vladd

@vladd I don't think that would work well. Consider code like the following (using the RegexGenerator that's in the current .Net 7 alpha build):

partial class Server
{
    [RegexGenerator(@"(?:[0-9]{1,3}\.){3}[0-9]{1,3}", RegexOptions.IgnoreCase)]
    public partial Regex CreateIpAddressRegex(); 
}

partial class User
{
    [RegexGenerator("(.+)@(.+)")]
    public partial Regex CreateEmailRegex();
}

If the generated code for these two methods wanted to share some helper code, there is no way to use partial class part private types or private inner types. But file private types make it very easy.

Dec 11 '21 11:12 svick

Is it possible to hide the generated code as nested private types inside a special class? I was going to say it's an easy workaround but thinking about it more, it actually looks like a solution, if not the solution. This "container" class can be made partial, hence allowing the generation of a significantly large amount of "private" code without creating enormous files. In abstract terms (not abstract keyword), it makes sense, too - the inner workings of your generated functionality are hidden as private members of a "public" class (whether it's public or merely visible to the rest of the project).

Dec 11 '21 16:12 TahirAhmadov

@vladd I don't think that would work well. Consider code like the following (using the RegexGenerator that's in the current .Net 7 alpha build):
partial class Server
{
    [RegexGenerator(@"(?:[0-9]{1,3}\.){3}[0-9]{1,3}", RegexOptions.IgnoreCase)]
    public partial Regex CreateIpAddressRegex(); 
}

partial class User
{
    [RegexGenerator("(.+)@(.+)")]
    public partial Regex CreateEmailRegex();
}
If the generated code for these two methods wanted to share some helper code, there is no way to use partial class part private types or private inner types. But file private types make it very easy.

Good point. How about an attribute like [PrivatesVisibleTo(typeof(User))]?

Dec 11 '21 16:12 TahirAhmadov

@TahirAhmadov That name, though

Dec 11 '21 16:12 jnm2

I would be amenable to this idea. My personal take is that the symbols not be available outside the file ever (no attributes to expose it), and that they likely be implemented with a name mangling strategy to avoid cross file collisions.

Dec 11 '21 18:12 CyrusNajmabadi

The cross file symbol collision is definitely a consideration to take into account. It is a bit of a sore spot in other languages. IIRC F# gives a bit of a cryptic error when you have collisions instead of resolving them.

I think we'd likely want to keep the simple names of the symbols the same. For example

namespace Example { 
    file private class Widget { ... }
}

Think the identifier should still be Widget here. in metadata To do otherwise would end up subverting a lot of expectations.

Instead think we should consider inserting a generated name between the namespace and the type name. In this case having Widget emitted as Example.<>_generated.Widget. That would allow us to avoid collisions while minimally subverting expectations around how types map to metadata names.

Dec 13 '21 18:12 jaredpar

@jaredpar . I think that makes a lot of sense. I agree with you that there is a strong desire to likely have the metadata name match (though it's not sacrosanct for me). Your approach seems like a good best-of-both worlds option.

For things like file fields, if <>_generated was just a mutable struct, then this becomes effectively free at runtime (presuming the runtime doesn't fall off a cliff in any cases). Are there things that would not work if these fields moved to such a struct? Things like ref would still work and whatnot. I presume FieldOffset might make no sense though...

Just trying to think through corner cases here. Thoughts?

Dec 13 '21 18:12 CyrusNajmabadi

That is a good point about fields. In my head I'd been focusing on the cases where the only type itself was marked as file private I hadn't really thought about the cases where you got down to the individual members (self fail)

I think at that level of granularity it gets really hard to maintain both:

Don't let different file private conflict with each other
Don't break the mental model of developers on how code is emitted to metadata

Imagine for example reflection. The moment we start name mangling individual members then it becomes really hard to think about how to use features like reflection to use them.

Thinking along the lines of corner cases, I'm still trying to decide if partial makes sense when one of the declarations is file private. If the goal is that one file private is completely hidden from another then I don't think you can allow partial. Because then you run into cases like this:

// file1 .cs 
file private partial class Widget { 
  public overrides string ToString() => "";
}

// file2.cs
partial class Widget { 
   // Error: multiple overrides of ToString
  public overrides string ToString() => "something";

Part of my brain is screaming "only allow file private on type declarations" because that allows us to give the guarantees we want and provides a clean solution for tools like generators that want an isolated space for their work.

Dec 13 '21 22:12 jaredpar

Can't we use a modreq for this? Something like class FilePrivate { }? Of course when you import an assembly containing such modreqs, the 'handling' for them is to simply assume you can't access anything with the modreq..

Dec 14 '21 01:12 RikkiGibson

@RikkiGibson you can't put a modreq on a type, only on method signatures. That is why we have to tricks like [Obsolete] when emitting a ref struct.

A modreq could be a solution for non-virtual method conflicts though. The basic pattern we could employ is generating types into the assembly in the pattern of Microsoft.CodeAnalysis.<>FilePrivateN where N equals the number of files which contain at least one file private modifier. Then every method in a file gets a modreq(MS.CA.<>FilePrivateN which makes it unique.

That does pose a problem though: such methods could not be used as implicit implementations of interface members. The presence of the modreq would make the signatures not match. We could counter by emitting an explicit implementation thunk under the hood though but we're back to that could conflict with explicit implementations in other files.

We'd still need solutions for virtual methods, interface hookups, fields, etc ...

The more we talk about this, and it's fantastic feedback, the more I'm wondering if forcing this at the type level is the best solution. It solves the core problems here, allows us to maintain the idea of it being truly private (don't worry abuot name conflicts) and no other language I'm aware of allowed parts of types to be file private (it's all or nothing).

Dec 14 '21 01:12 jaredpar

I'm not really sure if this is actually needed (at least in the proposed way)? My first thought for making helper types unlikely to be used by non-generator code would simply be a nested namespace.

namespace Whatever.YourRegular.CodeUses
{
  partial class TypeToExtend
  {
    partial void DoTheThing() => DontUseThis.Helper.DoTheSharedThing();
    // or, if you dislike putting the namespace in front:
    //using DontUseThis;
    //partial void DoTheThing() => Helper.DoTheSharedThing();
  }
  namespace DontUseThis
  {
    internal static class Helper
    {
      public static void DoTheSharedThing() => throw null;
    }
  }
}

This doesn't prevent non-generator code from using it, but it is more obvious if anyone does attempt to use it, and it doesn't show up by default (since it isn't in the same namespace and you'd have to be using it first). Plus, it works today without any changes.

I do wonder if I could make use of the proposed solution anywhere (outside of source generators). I can't really think of any other situation where this is both a useful addition and something that can't be done otherwise. How would those types fare in the face of reflection? I briefly started typing about one thing I do that discovers types that implement a certain interface (and shouldn't really be used by anyone else), but dropped it from my reply because it seems a fairly specialized and niche thing.

I could see a top-level class using the private modifier to trigger this though (which today is not allowed and raises CS1527). But if we go from there, it somewhat also opens the flood-gates for "invisible" base classes. Consolidate common functions in a non-public (or internal) class, derive from those, and callers see it as if the class had no base class (basically inlining it; but I don't know if IL could do this in any way without duplicating the code as well). And thats something for another day (and a different proposal, if at all).

Dec 16 '21 18:12 BhaaLseN

This doesn't prevent non-generator code from using it

And that's the problem. Then we change the generator, and code breaks.

Dec 17 '21 12:12 stephentoub

Can a namespace be file private? C++ uses anonymous namespaces for this purpose:

namespace
{
    // the code here is effectively file private
}

Dec 17 '21 13:12 vladd

@vladd

Namespaces don't really exist, they're just part of the class name so they can't have their own metadata like accessibility levels. The same namespace can be used freely between different assemblies as well. The individual types within the namespace would need to carry that metadata.

Dec 17 '21 14:12 HaloFour

@HaloFour Well, this seems to be not a problem in C++, it looks like the anonymous namespace has a unique unspeakable name from the linker‘s point of view.

Dec 17 '21 16:12 vladd

@vladd

Different languages, different problems :)

Implicitly treating a file private class as within it's own unspeakable namespace might be a way for the compiler to lower the class as something that can't be referenced externally without affecting the simple type name. I don't know if that works for nested classes, though. The relationship is defined by an entry in the NestedClass metadata table so perhaps the name of the nested class could be completely unrelated to the enclosing class, but you'd end up with something not expressable in C# or even IL which both require lexical nesting.

Dec 17 '21 16:12 HaloFour

@HaloFour Well, I basically asked if the C++ approach is a way to go: no explicit file private modifier but using an anonymous namespace for it. This way we would avoid thinking in files but will be thinking in namespaces instead.

Dec 17 '21 16:12 vladd

@vladd

I don't think that'd work for nested classes unless C# allowed you to define namespaces within a class.

Dec 17 '21 16:12 HaloFour

@vladd

I feel like anonymous namespaces are borderline trivia in C++. It's not really obvious what they do unless you already know whereas something like file private class is more self-evident.

As an aside, personally I've found it to be valuable for generated source to be digestible by humans for debugging/learning purposes. I worry that this feature as proposed could end up forcing generator authors to put their entire output into one file (potentially making it painful for humans to interact with.)

It has its own downsides, but I wonder if friend types (https://github.com/dotnet/csharplang/discussions/2073) would be a better solution to this issue. (As-is, this whole proposal is basically a less flexible special-case of them.)

Dec 17 '21 22:12 PathogenDavid

Friends, let me share (may be) unpopular opinion: please do not complicate C# for minor reasons. For SG-produced code the EditorBrowsable approach is enough, we need only make a final design and implement it. Also, @vladd 's proposal about unspeakable namespaces is better because of its lesser impact on the language (as I see it).

if we ~~want~~ can fulfill the need without modifying the language - probalby it better to do that... we have a "preprocessor" directives (#nullable enable), project settings (<Nullable>Enable</Nullable) and visual editor level (EditorBrowsable). why we need to modify the language? let's put #file private on on the top of SG-produced file (we will need a #file private off also) ! so the language will be modified about 1 new preprocessor directive...

There is a lot of choices, please do not make C# more complex for the minor reason like hiding SG code.

Thanks! (sorry for poor English, I hope my point is clear)

Dec 24 '21 17:12 lsoft

Regarding metadata representation, we can represent file private members with the ECMA.335 compilercontrolled visibility.

Having read the spec, my understanding is that type members with this visibility must be referenced with a TypeDefinition metadata token, making them effectively internal, but very internal; not even InternalsVisibleTo or its evil twin we don't talk about can expose them to another ~~assembly~~ module, and that many compiler-controlled members in a type can have the same name, solving concerns about naming collisions.

Some outstanding questions are:

What about file private types? Only members may be marked as compiler-controlled. The types could be marked with a special attribute (NotReferencableAttribute or something like this), and/or have their names made unspeakable.
How would the debugger handle many type members with the same name? I ran a demo and both Visual Studio and Rider show two identically named fields like this: I couldn't exhibit how to test the debugger's reaction when I hover over a field name that has a duplicate.
How would the reflection handle compiler-controlled types with the same name? I ran a demo that calls GetField("_x", BindingFlags.NonPublic | BindingFlags.Instance), it threw an AmbiguousMatchException.

I don't think (2) and (3) are very important issues; how likely is to have such collisions? But the runtime wouldn't bother and nor would the compiler who would totally ignore compiler-controlled members in referenced assemblies.

Jan 07 '22 23:01 teo-tsirpanis

There has been many other proposals, especially related to member visibility, that has been rejected due to

Language proposals which prevent specific syntax from occurring can be achieved with a Roslyn analyzer. Proposals that only make existing syntax optionally illegal will be rejected by the language design committee to prevent increased language complexity.

Can anyone here clearly explain why this proposal is not rejected for the same reason? We have been forced to use dirty attributes everywhere to disallow specific usage, and many of there are related with source generators. I don't think hiding something is anything different.

Jan 09 '22 12:01 acaly

@acaly

Because language team members are interested in this due to the use cases and scenarios that impact their work. Language design is subjective and just because some proposals are rejected or accepted doesn't mean that all proposals of the same theme will meet that same fate.

Jan 09 '22 14:01 HaloFour

Honestly I like this feature, because it can solve many scenarios that two or more closely related classes need to access internals of each other, while preventing those to be accessed by other classes in the same assembly.

The point is, this same problem has been raised by community long time ago, and is never seriously discussed simply because of the above reason. This proposal today proves that the claim

Language proposals which prevent specific syntax from occurring can be achieved with a Roslyn analyzer.

is not always true, and you should really be more forgiving to proposals suggested by comminity, because the community has far more different use cases that you can't easily imagine right now but you may probably hit the same thing tomorrow.

Jan 10 '22 06:01 acaly

Language design is subjective and just because some proposals are rejected or accepted doesn't mean that all proposals of the same theme will meet that same fate.

Looks like there are 2 rule sets: the first for community (published), the second for members of Design Committee (unpusblished, a relaxed version of the first probably). Any proposals of specific kind given by community will be rejected, but same kind proposals from the members - the different situation. Interesting example of Orwell's doublethink, I guess.

I'm completely fine with that, honestly, I just like to know what the real rules are.

Because of the rules for community and design committee are different, there are a little value of spending time for us here.

Jan 10 '22 06:01 lsoft

csharplang csharplang copied to clipboard

[Proposal]: file local types

"file private" visibility

Design meetings

csharplang
csharplang copied to clipboard