zenscript
Orthogonal concepts
I am trying to nail down in my mind, the fundamental orthogonal semantic concepts of our proposed programming language.
Concept | Description
---|---
data | Unified sum, product, recursive, and record data types. No early-bound operations other than construction and access. Record types may optionally be mutable.
typeclass | Delayed binding of operations to types at the use-site. Contrast with OOP (subclassing), which binds operations at the construction of objects. For polymorphic types, typeclass objects delay binding operations until construction of the typeclass object (instead of prematurely binding operations at the construction of an instance of the implementing type); whereas my posited and proposed solution to the Expression Problem employing unions, in theory, further delays binding to the use-site for polymorphic types. Note that delaying is a positive attribute in this context, because it increases degrees-of-freedom.
module | Encapsulating data with compile-time (statically) bound operations and access control, enabling more sophisticated types.
monadic effect system | https://github.com/keean/zenscript/issues/4#issuecomment-248569044
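To make the `data` vs `typeclass` split concrete, here is a minimal TypeScript sketch (all names are hypothetical, not part of any proposed syntax): the data type carries no operations, and a generic function receives the operation dictionary at its use-site rather than finding it baked into the object.

```typescript
// Plain data: construction and access only, no operations attached.
type Point = { x: number; y: number };

// A typeclass ("Show") modeled as an explicit dictionary of operations.
interface Show<T> {
  show: (value: T) => string;
}

// An implementation (instance) for Point, defined separately from the data.
const showPoint: Show<Point> = {
  show: (p) => `(${p.x}, ${p.y})`,
};

// A generic function takes the dictionary at the call site (late binding).
function describe<T>(dict: Show<T>, value: T): string {
  return dict.show(value);
}

console.log(describe(showPoint, { x: 1, y: 2 })); // "(1, 2)"
```

The dictionary-passing style is only an illustration of the binding order; a compiler would normally pass such dictionaries implicitly.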
Thus I am wondering if `module` is the most apt name for the concept? I think of a module as a unit of code that is imported separately. We still need to import `data` and `typeclass`, so is this done by files? Isn't each file then a module? If so, what is the apt name for the concept of `module` above? I am thinking the above concept for `module` is really a `class` (without the subclassing inheritance anti-pattern). Can't modules extend other modules?
Edit: note also OCaml functors and modules.
Edit#2: on typeclasses, I replied to @keean:
Optional implicit parameters (which Scala has) provide a way of unifying runtime interfaces (passed by value) and monomorphisable interfaces (passed implicitly by type).
I like that summary (in the context of what I had written). We should make sure we frame that and use it in our documentation.
Edit#3: @shelby3 wrote:
I think it is important to remember that we want to keep implementation of interface (i.e. typeclass) orthogonal to instantiation of data, otherwise we end up with the early binding of subclassing. For example, in subclassing both a rectangle and a square would be subclasses of an interface which provides width and height. Yet a square only has one dimension, not two. Any interfaces which need to operate on the dimensions of rectangle and square need to be customized for each, which is what typeclasses do. We will still have subtyping, such as when an interface extends another interface, then we can subsume to the supertype (which is why we need read-only references and, for supersuming, write-only references). Yet references will not have the type of an interface if we don't provide typeclass objects, so then the only case of subtyping will be the subset relationships created by conjunctions and disjunctions of data types.
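The rectangle/square point above can be sketched in TypeScript (hypothetical names, a sketch rather than proposed semantics): each data type keeps its own representation, and separate typeclass instances adapt both to one `Dimensions` interface, with no shared superclass forcing a square to store two dimensions.

```typescript
// Independent data types with different representations.
type Rectangle = { width: number; height: number };
type Square = { side: number };

// A typeclass over dimensions, modeled as a dictionary of operations.
interface Dimensions<T> {
  width: (t: T) => number;
  height: (t: T) => number;
}

// Each type gets its own customized instance, bound separately from the data.
const rectDims: Dimensions<Rectangle> = {
  width: (r) => r.width,
  height: (r) => r.height,
};
const squareDims: Dimensions<Square> = {
  width: (s) => s.side,   // a square stores only one dimension
  height: (s) => s.side,
};

// Generic code works against the typeclass, not a superclass.
function area<T>(dims: Dimensions<T>, t: T): number {
  return dims.width(t) * dims.height(t);
}

console.log(area(rectDims, { width: 3, height: 4 })); // 12
console.log(area(squareDims, { side: 5 }));           // 25
```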
Yes, I think that is correct. We basically split objects into three orthogonal concepts: data are effectively object properties, typeclasses are interfaces, and modules control data hiding (private data etc.).
Note: elsewhere I proposed that we can combine the three into a single syntax and generic concept, but let's leave that to one side for now.
Did you see my edits pointing out that 'module' has two meanings? Should we use another keyword than `module`?
Extension and functors are something ML supports on modules; at the moment I don't think we should. I like the ability to have multiple modules in one file, but effectively modules are able to be separately compiled and linked later. In the simplest form they are like namespaces where you cannot access their private data from outside. A module has to export things to be visible outside, and import the external functions and data in order for them to be visible inside.
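A minimal sketch of those module semantics in TypeScript (hypothetical names; a closure stands in for the module boundary): private state is unreachable from outside, and only exported names are visible.

```typescript
// A "module" whose private data is hidden behind a closure; only the
// returned (exported) functions are visible outside.
function makeCounterModule() {
  let count = 0; // private: not reachable from outside the module
  return {
    increment: () => { count += 1; }, // exported
    current: () => count,             // exported
  };
}

const Counter = makeCounterModule();
Counter.increment();
Counter.increment();
console.log(Counter.current()); // 2
// There is no way to read or write `count` directly from here.
```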
Note I added the idea for a monadic effect system. I guess it isn't orthogonal in some sense, as its implementation relies, I suppose, on the other two or three concepts, but the systemic model for handling impurity is an orthogonal concept.
@shelby3 wrote:
typeclass | Delayed binding of operations to types at the use-site. Contrast with OOP (subclassing), which binds operations at the construction of objects. For polymorphic types, typeclass objects delay binding operations until construction of the typeclass object (instead of prematurely binding operations at the construction of an instance of the implementing type); whereas my posited and proposed solution to the Expression Problem employing unions, in theory, further delays binding to the use-site for polymorphic types. Note that delaying is a positive attribute in this context, because it increases degrees-of-freedom.
I want to raise this conceptualization of solving the Expression Problem to a matrix, so that we can see how this plays out on more levels.
Background
Dynamic polymorphism is instituted whenever we bind the interface to an object and refer to the type of that object as the interface. Whether it be at data type instantiation for subclassing, or at typeclass object instantiation (aka Rust trait objects), we must then incur dynamic dispatch, because when holding a reference of said interface type at compile-time there is no static (compile-time) knowledge of the types of data that can be referenced with the said interface type (rather, runtime dynamic dispatch handles the dynamic polymorphism). In subclassing the interface is a supertype; in typeclass objects, the interface is the typeclass bound (constraint) on the data types that can be assigned to a reference of said type.
So the lesson is that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function. More importantly, we lose the axis of the Expression Problem that adds new interfaces (operations) to the data types referenced via an interface type. We can add new data types but can't add new operations.
Subclassing binds the interface very early, at data type instantiation. Typeclass objects bind later, at typeclass object instantiation. And typeclass bounds bind even later, at the function call site.
So ideally we wanted to always delay to typeclass bounds, but this conflicts with the polymorphism of adding data types orthogonally when dealing with heterogeneous union types. Thus I pointed out that by retaining the union of data types, instead of prematurely subsuming them to the typeclass bound of a typeclass object, we could delay the binding to the function call site. Thus we can continue to add new orthogonal operations to the union type, unlike for typeclass objects. The trade-off is that adding a new type to a union type requires invariant collections (such as a `List`); whereas typeclass objects will interoperate fine with arrays, because the type of the array doesn't need to change when adding new data types to an existing typeclass object type. I explained this in more detail in the prior linked discussion. Readers can refer to the prior explanation, so I don't need to repeat it here.
Matrix Conceptualization
So what we can see is that there is a tension, with trade-offs, between the competing choices of carrying around the specific data types and subsuming to the interface. So both are needed for different situations.
Recently @skaller pointed out another axis of this tension: the case where we don't want to discard the data types, because we want to leave them open to new operations, but we want to dictate that a specific implementation is used for a specific typeclass wherever it might be needed. Apparently OCaml's functors accomplish this by being first-class: one can pass them along with a module and they will morph the module with the specific constraints desired, while leaving the rest of the module unspecialized. Thus, through the chaining of functors, one can get the same effective result as early binding for one typeclass while remaining open to extension for other typeclasses.
So really what we need is to be able to attach a typeclass object to a data type instance and pass them around as a pair everywhere a data type is expected. In other words, dynamic patching of the dictionary for dynamic dispatch, wherein if that data type is then applied to a trait object at a call site or a trait object assignment, the prior binding is not overwritten and any additional dynamic dispatch bindings are augmented.
So then we no longer view data types and typeclass objects separately. A data type always has a dynamic dictionary attached to it, yet it is augmented at runtime, not with compile-time knowledge. The compiler can only guarantee that the required typeclass operations will be available at runtime; it can't guarantee which implementation was selected, because at runtime, when it writes the implementation to the dynamic dictionary of the typeclass object, it can't know whether a preexisting one will already exist.
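A TypeScript sketch of this pairing (hypothetical names; this is an illustration of the proposed runtime behavior, not an implementation): the dictionary travels with the value and is only ever augmented, never overwritten, so an earlier binding survives a later one.

```typescript
// A value paired with a dynamic dictionary of operations.
type Dict = Record<string, (...args: any[]) => any>;
type Tagged<T> = { value: T; dict: Dict };

// Add operations to the dictionary, skipping any name that already exists,
// so a preexisting binding is never overwritten.
function augment<T>(t: Tagged<T>, ops: Dict): Tagged<T> {
  for (const name of Object.keys(ops)) {
    if (!(name in t.dict)) t.dict[name] = ops[name];
  }
  return t;
}

const n: Tagged<number> = { value: 3, dict: {} };
augment(n, { double: (x: number) => x * 2 });
// A later binding of "double" does not replace the earlier one,
// but the new "show" operation is added.
augment(n, { double: (x: number) => x * 10, show: (x: number) => `#${x}` });

console.log(n.dict.double(n.value)); // 6, not 30
console.log(n.dict.show(n.value));   // "#3"
```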
So we have lifted dynamic polymorphism to a 2nd order. We have dynamism of dynamic dispatch.
The matrix is that we have two axes of data type and interface, and a third axis of implementations that conjoin the two. This third axis can't be tracked at compile-time without dependent typing, but the compiler only needs to ensure that the required interfaces are present in order to ensure soundness. So this works.
What this enables is, for example, @skaller's example of wanting to set the ordering of comparison operations while leaving the data type open to extension with other typeclasses. @keean's proposed solution also achieves it, but it lacks the atomicity of retaining this with the data type. For example, @keean's solution is not generic in the type of algorithm that can be applied ex post facto, and thus it fails another axis of the tension of solving the Expression Problem.
What we will see generally is that higher order dynamism is required to solve higher order tensions in the Expression Problem.
So the lesson is that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.
This is not quite right. We can still mono-morphise even with compile-time interface binding; in fact for the traditional approach this is a requirement. The point is that we can statically know both the interface and the exact type passed, and therefore choose and inline the exact function. Rust does this in all cases except when dynamic polymorphism is involved.
So we can inline and monomorphise the functions when the interface is statically bound and the type is statically known. If either is false (so dynamic binding to the interface, or dynamic typing) we cannot monomorphise in languages like Rust.
So really what we need is to be able to attach a typeclass object to a data type instance and pass them around as a pair everywhere a data type is expected.
This is exactly what adding functions to an object's prototype does in JavaScript.
@keean wrote:
So really what we need is to be able to attach a typeclass object to a data type instance and pass them around as a pair everywhere a data type is expected.
This is exactly what adding functions to an object's prototype does in JavaScript.
No, it is not the same. The JavaScript `prototype` impacts all instances. I am referring to impacting only the instances which are assigned to the trait bound.
@keean I have decided that it is nearly impossible to try to explain these issues like this. I will need to make detailed code examples and write a white paper. You don't seem to understand me.
No it is not the same. The prototype impacts all instances. I am referring to impacting only instances which are assigned to the trait bound.
So you want the implementation used to depend on the value of the object, not just its type.
This is monkey-patching, adding a method directly to the object itself.
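The distinction being argued here can be sketched in TypeScript (hypothetical `Widget` class): a prototype patch is visible from every instance, while a monkey-patch attaches the method to one object only.

```typescript
class Widget {
  constructor(public name: string) {}
}

const a: any = new Widget("a");
const b: any = new Widget("b");

// Prototype patch: visible on ALL instances.
(Widget.prototype as any).hello = function (this: any) {
  return `hello ${this.name}`;
};
console.log(a.hello()); // "hello a"
console.log(b.hello()); // "hello b"

// Monkey-patch: visible on THIS instance only.
a.bye = function (this: any) {
  return `bye ${this.name}`;
};
console.log(typeof a.bye); // "function"
console.log(typeof b.bye); // "undefined"
```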
@shelby3 wrote:
So the lesson is that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.
This is not quite right.
It is right. I just haven't been able to communicate my points to you.
You have written it badly then because
once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.
This is exactly the opposite of what is true.
We must bind the interface statically in order to be able to monomorphise and we must statically know the type at the call site as well.
@keean wrote:
The point is we can statically know both the interface and the exact type passed
If you had understood me, you would understand that I already explained that we don't know the exact type passed; that was entirely the point: binding early erases the data types from further compile-time knowledge, and only the trait bound remains. That you didn't even get that point from what I wrote seems really bizarre.
If you had understood me, you would understand that I already explained that you don't know the exact type passed; that was entirely the point. That you didn't even get that point from what I wrote seems really bizarre.
We always know the exact type passed, that is the whole point of parametric polymorphism. The only time we do not is if there is what Rust calls a trait-object or what Haskell calls an existential-type involved.
@keean wrote:
The only time we do not is if there is what Rust calls a trait-object
Bingo. Re-read my long comment again.
So in a traditional system like Rust, if there is a trait object we can never monomorphise; it doesn't matter how early or late the type-class is bound.
@keean wrote:
once we bind an interface at compile-time _before_ the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.
This is exactly the opposite of what is true.
The word 'before' does not mean 'at'. It means 'before'. I am referring to subsuming to the trait bound before the call site, such as a trait object. But then I generalize it to partial trait objects and 2nd order dynamism.
@keean wrote:
So in a traditional system like Rust, if there is a trait object we can never monomorphise, it doesn't matter how early or late the type-class is bound.
In a system where I have generalized it to partial trait bounds and 2nd-order dynamism, we still can't monomorphise, but we gain the ability to add operations ex post facto.
Your reading comprehension of my long comment was really not adequate. Was it really that badly explained, or is your mind ossified and not prepared for a generalization?
The word 'before' does not mean 'at'. It means 'before'. I am referring to subsuming to the trait bound before the call site, such as a trait object. But then I generalize it to partial trait objects and 2nd order dynamism.
But if you have dynamic polymorphism you cannot mono-morphise. It doesn't matter whether the interface is bound before or at the call site.
@keean wrote:
But if you have dynamic polymorphism you cannot mono-morphise. It doesn't matter whether the interface is bound before or at the call site.
Precisely. Early binding of the interface requires dynamic polymorphism in order to remain open to adding new data types to the interface bound, e.g. for collections.
Also, you are only considering one of the trade-offs. You forgot the other one, which my long comment states and I have reiterated:
In a system where I have generalized it to partial trait bounds and 2nd-order dynamism, we still can't monomorphise, but we gain the ability to add operations ex post facto.
But that is not what this sentence says:
So the lesson is that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.
This says we cannot mono-morphise if we bind the interface at compile time when there is dynamic polymorphism. But we cannot mono-morphise at all with dynamic polymorphism, so the logic is wrong.
@keean wrote:
But that is not what this sentence says:
So the lesson is that once we bind an interface at compile-time before the function call site, then we forsake static polymorphism and the opportunity for mono-morphisizing the function.
This says we cannot mono-morphise if we bind the interface at compile time, when there is dynamic polymorphism.
Correct.
But we cannot mono-morphise at all with dynamic polymorphism,
Correct.
so the logic is wrong.
What is wrong? You apparently just didn't understand that the goal is not to achieve monomorphism, but rather to solve the Expression Problem and keep the extension open to new trait bounds even though we've partially bound to some typeclasses:
@shelby3 wrote:
Precisely. Early binding of the interface requires dynamic polymorphism in order to remain open to adding new data types to the interface bound, e.g. for collections.
The point is that we always pass a data type along with a dictionary, but that dictionary can be incomplete, and we don't track at compile-time what may already be in that dictionary. When we are ready to bind some trait bounds at a call site, we write some code which will add these implementations to the dictionary at runtime (ensuring they will be available as required for compile-time soundness), but will not overwrite any that already existed in the dictionary.
The point of this is that our caller can set preferential implementations (such as which direction of ordering for the `lessThan` operation), yet the code that is called doesn't have to prematurely specialize on any trait bound signatures. Premature specialization is the antithesis of combinatorial composition, as per @skaller's excellent point.
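A sketch of this in TypeScript (hypothetical names; an illustration of the proposed behavior): the caller's preferred `lessThan` survives because call-site binding only fills gaps in the dictionary and never overwrites an existing entry.

```typescript
// A value paired with a possibly incomplete operation dictionary.
type Dict = { lessThan?: (a: number, b: number) => boolean };
type Tagged = { value: number[]; dict: Dict };

// Call-site binding: only fills gaps, never overwrites a prior choice.
function bindDefaults(t: Tagged): Tagged {
  if (!t.dict.lessThan) t.dict.lessThan = (a, b) => a < b; // ascending default
  return t;
}

// Generic code that relies on lessThan being present after binding.
function sorted(t: Tagged): number[] {
  bindDefaults(t);
  return [...t.value].sort((a, b) => (t.dict.lessThan!(a, b) ? -1 : 1));
}

// Caller prefers descending order; the call-site default does not replace it.
const xs: Tagged = { value: [2, 3, 1], dict: { lessThan: (a, b) => a > b } };
console.log(sorted(xs)); // [3, 2, 1]

// No preference set, so the call-site default (ascending) is used.
const ys: Tagged = { value: [2, 3, 1], dict: {} };
console.log(sorted(ys)); // [1, 2, 3]
```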
I think we aren't actually attaching the dictionaries to the data objects. Rather, the data objects are tagged (`instanceof`) and the dictionaries are passed in as arguments to each function. Perhaps we should model it with a monad?
When we are ready to bind some trait bounds at a call site, then we write some code which will add these implementations to the dictionary at runtime.
This is the same as asking whether the implementations exist from the call-site. In order to add them, they must first exist in the source code, so the compiler can statically check that they exist.
We can go further: we can ask, from the call site, for a list of all types for which there is an implementation of this function. Now we could put this in the type signature so that the linker knows statically which types it is safe to send to that function.
Wow you really aren't getting it.
When we are ready to bind some trait bounds at a call site, then we write some code which will add these implementations to the dictionary at runtime.
This is the same as asking if the implementations exist from the call-site.
Read again:
When we are ready to bind some trait bounds at a call site, then we write some code which will add these implementations to the dictionary at runtime (ensuring they will be available as required for compile-time soundness), but will not overwrite any that already existed in the dictionary.
The point of this is that our caller can set preferential implementations (such as which direction of ordering for the `lessThan` operation), yet the code that is called doesn't have to prematurely specialize on any trait bound signatures. Premature specialization is the antithesis of combinatorial composition, as per @skaller's excellent point.
There is still something odd about the way your description comes across to me, which might be my misunderstanding...
If I have a function that calls methods on something:
```
f(x) =>
  x.call_widget()
```
We can say with 100% certainty that whatever type is passed to `f`, whether via dynamic or static polymorphism, must implement `call_widget`. Do you agree?
@keean wrote:
We can say with 100% certainty that whatever type is passed to `f`, whether via dynamic or static polymorphism, must implement `call_widget`. Do you agree?
Agreed. And the mistake in your thinking is that you are conflating not knowing whether a preexisting implementation will be replaced at runtime, with knowing at compile-time that all required implementations will be placed into the dictionary as required. But I am repeating myself.
I understand the concept is a bit confusing at first for someone not expecting it. I do tend to come out of nowhere with mind-twisting new things.
And the mistake in your thinking is that you are conflating not knowing whether a preexisting implementation will be replaced at runtime, with knowing at compile-time that all required implementations will be placed into the dictionary as required. But I am repeating myself.
That is irrelevant; I am talking about what happens at the call-site.
So when `call_widget` is called on `x`, there must be an implementation for it available. Do you agree?
So then when we have:
```
f(x) =>
  x.call_widget()
```
We can clearly see that no code in `f` is inserting any implementations into anything. Sorry to have to take it this slowly, but I am feeling particularly stupid today. Are we agreed?
I thought you were inferring the trait bound on `x`. Yes, you need a trait bound there. So what is your point?