Drasil Typing the Expression Language

Long-term ticket for general discussion on, and tracking the progression of "adding typing to the expression language".

Related tickets:

[ ] #2655

Jun 30 '21 16:06 balacij

@JacquesCarette Following up with discussion on https://github.com/JacquesCarette/Drasil/commit/ff34fe093b8898383232e2822f478a394243aef9#r52161989, the preface discussion of #2482 related to the DiscreteS and DiscreteD usage, #356, and https://github.com/JacquesCarette/Drasil/issues/2118 about Enumerations and Discrete spaces.

Currently, both of EnumeratedStr and EnumeratedReal are essentially unused (both used unused constraint lists), DiscreteS is unused, and DiscreteD is used (though, I believe it's used for a display hack due to the EnumeratedReal being essentially unused).

EnumeratedStr and EnumeratedReal are both used in exactly 1 spot each: https://github.com/JacquesCarette/Drasil/blob/b8da80982650c50c1abc97844fc35bd0a8211bc1/code/drasil-example/Drasil/GlassBR/Unitals.hs#L108-L116 Note: In nomThick, enumc = EnumeratedReal Physical.

Removing both constraints does not alter stable because they go unregistered in the final constraint list used: https://github.com/JacquesCarette/Drasil/blob/b8da80982650c50c1abc97844fc35bd0a8211bc1/code/drasil-example/Drasil/GlassBR/Unitals.hs#L67-L73 Note: the inputDataConstraints is what's registered for calculating the list of constraints for Data Constraints Section of the SRS.

DiscreteD is used as a display hack for displaying the values for which are allowed for the EnumeratedReals allowed for nomThick. This occurs in the line defining nomThick as a unital, giving nomThick a noun phrase term/name with a t element of {...} suffix: https://github.com/JacquesCarette/Drasil/blob/b8da80982650c50c1abc97844fc35bd0a8211bc1/code/drasil-example/Drasil/GlassBR/Unitals.hs#L108-L110 Note: displayDblConstrntsAsSet is a function that creates a DisplayExpr from the UID of nomThick and the list of doubles allowed. There is no similar thing done for DiscreteS and the glassType.

I'm now wondering how we should proceed with the discussion in https://github.com/JacquesCarette/Drasil/commit/ff34fe093b8898383232e2822f478a394243aef9#r52161989. Should we remove the EnumeratedX constraints + the Discrete spaces (+ the t element of {...} suffix of nomThick) in favour of a proper design done later for #356 and #2118? We could then build a Enum-proposal for #2118 soon, and build it once we have stronger typing.

EDIT: added #356 references too

Jun 30 '21 16:06 balacij

The fact that removing those constraints does not alter stable feels like a bug. That information should be used, but apparently it's not.

So I think the pragmatic thing to do is to remove Enumerated but create two high-priority issues to bring the information thus cut out back. Note that a first hack of bringing the information back could be display-only. #2118 is hard because it tries to deal with code generation, and that design is highly non-trivial. So the high-priority would be to get things back at least for display.

Jul 02 '21 16:07 JacquesCarette

The information is displayed for them in places other than the constraints table, but it's using existing display hacks for them. Would you want a hack to have the information displayed in the constraints table as well?

Jul 05 '21 17:07 balacij

I'd rather not have a hack, and yet still display them in the constraints table! What would it take to do that?

Jul 05 '21 17:07 JacquesCarette

That's a great question!

I think there are 2 real issues/ideas here that are different but highly related (Enumerated constraints and #2118) that might be a bit mixed together in the code at the moment, so I think we should clarify things a bit first. One issue is to create restrictions (constraints) on typed variables, restricting the values by some invariant(s). Another issue is arguably to "create a new primitive space".

Issue 1 Taking https://github.com/JacquesCarette/Drasil/blob/3e522b7667f7fbd7cad7fd0c0eaf96a62230eefe/code/drasil-example/Drasil/GlassBR/Unitals.hs#L108-L112 as an example.

We want to keep nominal thickness as a rational number, but we want to restrict it to certain values. We then use them directly to make calculations.

Note that in the specific case of nominal thickness, we will only allow it's value to take on one of many possible rational numbers, so we may want to have the allowed list of rationals in some sort of immutable set that we check against.

Issue 2 However, in the second issue (and looking at a second example), https://github.com/JacquesCarette/Drasil/blob/3e522b7667f7fbd7cad7fd0c0eaf96a62230eefe/code/drasil-example/Drasil/GlassBR/Unitals.hs#L114-L115

I believe that in this case, we should be treating the 3 glass types as primitives of a space (an enumerated type, like #2118 discusses) instead of as a String. I think that with this, we would remove the constraint from the table (well, it was never in the table to begin with but I think we were expecting it to be), rewriting code like: https://github.com/JacquesCarette/Drasil/blob/3e522b7667f7fbd7cad7fd0c0eaf96a62230eefe/code/stable/glassbr/src/java/GlassBR/DerivedValues.java#L40-L66 to use a more common enum + case/switch (in the generated source codes) against the new enum, pushing the invalid value errors and conversion to the enum type to the generated code that reads in variables. Additionally, if we choose to have these enumerated types, we may want to have a table displaying them instead of a list of restrictions on variables.

Where does the mix-up occur? The issue occurs in nominal thickness: https://github.com/JacquesCarette/Drasil/blob/3e522b7667f7fbd7cad7fd0c0eaf96a62230eefe/code/drasil-example/Drasil/GlassBR/Unitals.hs#L109 On this line, we convert the list of allowed values for nominal thickness into a Discrete Space for displaying it as a list of values. I think that instead of converting it into a Discrete Space and then displaying the Discrete Space, we should be converting it into a set of sorts and then displaying because we want to say that it belongs to a restricted subset of the rationals, but we want to leave it typed as a Rational. Note that Space is not a part of Expr, but a part of DisplayExpr, so we can resolve this mix-up by merely removing the additional of the displayDblConstrntsAsSet part.

So, what would it take to implement solutions to these 2 issues?

For the first issue, we would need the EnumeratedX constructors back + basic set functionality for checking if a read in value is an element of a set (common practice is to use static immutable sets for things like this, but I think the static immutable part can wait a bit if need be). In all 3 Expr languages, we would need functionality for variable element of set checking and set instantiation with a list of values. Only with the set instantiation would we be able to remove/resolve the issue/mix-up mentioned above (using a Discrete Space to describe the values of the EnumeratedX restrictions). It's arguable that CodeExpr wouldn't need set-like functionality for this, because we can have the same functionality of a element of {...} by using if thing == ... || thing == ... || thing == ... || ....

For the second issue, I think we would need the first issue to be resolved first (for the DisplayExpr primarily, the set notation will be helpful in discussing these spaces), and then we would need to convert our existing Discrete Spaces into "named discrete spaces with named values". We would need to mock up Enums in each of the supported exportable languages and create code generators for all of them.

Jul 05 '21 19:07 balacij

Right, to summarize, the issues are:

we need to be able to restrict values from known spaces to be from a discrete, finite, enumerated set of values,
we need to be able to declare new enumerated spaces (we tried, it was hard; not sure if #2118 really documents how much the attempt spun out of control!)

Note that I really hate code that use chains of if-else-if when the language has a proper 'switch' and enumeration facility. Also, I hate to see common code repeated when only 1 line of it actually depends on the condition.

Those are indeed two quite separate issues. They are not fully independent, but 1 is considerably easier.

The hard part of 2 is that most languages require define-before-use, so there really has to be a new name added to the name space denoting the new type, and it is that name that needs to be used almost everywhere. So parts of Drasil will need to learn about new declarations too. There are some facilities in drasil-code for this, but none in drasil-lang. But if such an enumerated space is going to be used both in the specification and in the code, the best way to do that is via facilities in drasil-lang (or as a layer on drasil-lang?) In any case, it needs non-trivial design.

Jul 05 '21 19:07 JacquesCarette

Yes. Those 2 notes perfectly explain what we need.

I think that 1 is almost required for 2, or at least will create a better tooling/ecosystem for 2 to be done.

The hard part of 2 is that most languages require define-before-use, so there really has to be a new name added to the name space denoting the new type, and it is that name that needs to be used almost everywhere.

Is there any branch/fork that I could take a look at with the attempt? I would imagine this to be difficult, but I don't think I saw much in terms of problems that accrued while building it.

I wonder if it would be good to have 2 variants of the enumeration types; a primitive name-only variant, and a more powerful Haskell ADT-like variant that allows the variants to carry data. The primitive name-only variant could be done via the enumeration functionality of the language, but the more powerful variant would likely need to use classes + inheritance/interfaces. It's probably best to start off with the primitive names-only variant.

So parts of Drasil will need to learn about new declarations too. There are some facilities in drasil-code for this, but none in drasil-lang. But if such an enumerated space is going to be used both in the specification and in the code, the best way to do that is via facilities in drasil-lang (or as a layer on drasil-lang?) In any case, it needs non-trivial design.

Yes, I think that this would need to be done in drasil-lang, closely with Space and Expr because the expression language should contain it's functionality.

For a course of action, it seems like we should try to implement the set functionality (which is also related to #2587), and then we can implement the EnumeratedX constraints fairly easily. Perhaps before adding in (2), we should add the typing to the expression language? Maybe we should add basic typing to be added in before adding in (1) too.

Jul 05 '21 21:07 balacij

It looks like there's only a ~~small~~ handful of things that we need to move from Expr to DisplayExpr/CodeExpr before we can get the basic/primitive typing started for Expr (without containers/enumerated types/etc).

Some last few items:

Integration & Continuous intervals should be moved to DisplayExpr
I think we're able to move all Matrix Expr-related things into CodeExpr for the time being. It seems to be only used in GlassBR's ModuleDefs, which purely uses CodeExprs. I'll have to double check to make sure however.
We'll need to figure out how to handle FCall -- Dr. Carette mentioned that we may need to split FCall into 2 variants of it. I''ll need to look into it more.

Also, DomainDesc and RealInterval both have 2 type variables, but they're always the same type -- should we change it to be just 1 type variable? Are there scenarios where we can "start" in one domain but "end" in another?

Jul 06 '21 01:07 balacij

As far as I know, there is no branch with the attempt. It was clear that the design wasn't going to work (but it got close! though it wouldn't have been pretty).

Yes, 2 variants of enumerated types is probably a good idea.

I would think that this would sit 'above' Space and 'above' Expr, with both Space and Expr having a facility to refer to an external defined entity "by name".

Yes, definitely typing should be done first, before any of this stuff is done.

Re: last few items. I agree that those 3 can go as you describe.

About DomainDesc and RealInterval: I don't know. Having 2 was the conservative choice, even though in practice they are always the same. It would probably be ok to make them the same. You might want to do change completely independently though.

Jul 06 '21 14:07 JacquesCarette

I know that we spent a lot of time separating the Unary & Binary constructors in Expr by their types, but I think now it would be interesting to see if we can merge them while we add typing. For example, perhaps the following could work as a design for sharing the type information, where we put the type restrictions as type parameters of the operator data types:

data UnOp i t where
  Neg :: Num a => UnOp a a
  Not :: UnOp Bool Bool

data BinOp l r t where
  Frac' :: Fractional a => BinOp a a a

  And' :: BinOp Bool Bool Bool
  Or' :: BinOp Bool Bool Bool -- Aside: It looks like we aren't able to share type signatures in GADT Syntax though! Maybe we need to just move this into another datatype instead too.

data Expr' a where
  Int' :: Integer -> Expr' Integer
  UnOp'  :: UnOp i t -> Expr i -> Expr t
  BinOp' :: BinOp l r t -> Expr l -> Expr r -> Expr t

Sep 20 '21 02:09 balacij

So: yes. We do want to "evolve" things in this direction.

Sep 20 '21 16:09 JacquesCarette

Taking a look at Expr right now...

-- | Drasil expressions.
data Expr a where

  -- | Turns a decimal value ('Double') into an expression.
  Dbl      :: Double -> Expr Double
  -- | Turns an integer into an expression.
  Int      :: Integer -> Expr Integer
  -- | Represents decimal values that are exact as integers.
  ExactDbl :: Integer -> Expr Double
  -- | Turns a string into an expression.
  Str      :: String -> Expr String

Our literals are easy to add type information, and will be slightly better via #2783

  -- | Turns two integers into a fraction (or percent).
  Perc     :: Integer -> Integer -> Expr Double

I think this is what we would want. If we want an Expr Integer, we would force the user to use another constructor.

  -- | Takes an associative arithmetic operator with a list of expressions.
  AssocA   :: AssocArithOper a -> [Expr a] -> Expr a
  -- | Takes an associative boolean operator with a list of expressions.
  AssocB   :: AssocBoolOper  -> [Expr Bool] -> Expr Bool

As a first-revision, we can convert AssocArithOper into a GADT with a single type parameter where we would have AddI :: AssocArithOper Integer, AddRe :: AssocArithOper Double, etc. This should allow us to impose the types required of the elements of the list to be implicit/based on the operation. We can leave AssocBoolOper as-is since they're all boolean-relations.

However, we can also go down the route of merging them into one constructor AssocOp :: AssocOper a -> [Expr a] -> Expr a, done similar to AssocArithOper a as discussed.

  -- | C stands for "Chunk", for referring to a chunk in an expression.
  --   Implicitly assumes that the chunk has a symbol.
  C        :: UID -> Expr _

This area is a bit more interesting. We're unable to inherit the type of the chunk from UID of the chunk alone.

How can we fix this?

We may need to have some pushout data type generated from each kind of chunk usable in an expression to inherit the type of the chunk here. For example, perhaps C :: CRef a -> Expr a where CRef is the name of a new GADT, a is it's type variable, CRef a would contain a single constructor, CRef :: UID -> CRef a, that contains the UID of the related chunk it was created from. This still wouldn't suffice because CRef would need to get it's type from the chunk from which it originated. Does this mean we also need to add typing to the Chunks in some way?

  -- | A function call accepts a list of parameters and a list of named parameters.
  --   For example
  --
  --   * F(x) is (FCall F [x] []).
  --   * F(x,y) would be (FCall F [x,y]).
  --   * F(x,n=y) would be (FCall F [x] [(n,y)]).
  FCall    :: UID -> [Expr _] -> [(UID, Expr _)] -> Expr _

Similar to C, we're not able to tell what the resultant Expr's type should be from the UID alone. We would need to gather the information from the actual function itself, somehow. Additionally, we might need heterogeneous lists for the inputs. However, having a heterogeneous list still wouldn't enforce that we provided enough inputs to the function. Additionally, having a list of UIDs and Exprs wouldn't ensure that the names provided are sufficient for the function.

Alternatively, I wonder if we should define all functions in Drasil as typeclasses and have them all export FCalls via instances of their respective typeclasses.

I think this FCall constructor is also subtly relevant to how we would design library interfaces for Drasil.

  -- | For multi-case expressions, each pair represents one case.
  Case     :: Completeness -> [(Expr Bool, Expr a)] -> Expr a

If it's a partial-case, what should happen with the Expr resultant type? Should it still be the type of the others despite being undefined? I imagine we wouldn't do many tests on the Expr Bools/conditions too.

  -- | Represents a matrix of expressions.
  Matrix   :: [[Expr a]] -> Expr [[a]]

Of course, we wouldn't be able to impose size constraints at the Haskell-type level. With length checks done at the smart constructor level, this might be okay. However, the length checks might be partial because Cs to matrices will not have that information readily available.

  -- | Unary operation for most functions (eg. sin, cos, log, etc.).
  UnaryOp       :: UFunc -> Expr -> Expr
  -- | Unary operation for @Bool -> Bool@ operations.
  UnaryOpB      :: UFuncB -> Expr -> Expr
  -- | Unary operation for @Vector -> Vector@ operations.
  UnaryOpVV     :: UFuncVV -> Expr -> Expr
  -- | Unary operation for @Vector -> Number@ operations.
  UnaryOpVN     :: UFuncVN -> Expr -> Expr

I think we can merge these into one big UnaryOp :: UnOp i t -> Expr i -> Expr t. However, the list-like types are still TBD.

  -- | Binary operator for arithmetic between expressions (fractional, power, and subtraction).
  ArithBinaryOp :: ArithBinOp -> Expr -> Expr -> Expr
  -- | Binary operator for boolean operators (implies, iff).
  BoolBinaryOp  :: BoolBinOp -> Expr -> Expr -> Expr
  -- | Binary operator for equality between expressions.
  EqBinaryOp    :: EqBinOp -> Expr -> Expr -> Expr
  -- | Binary operator for indexing two expressions.
  LABinaryOp    :: LABinOp -> Expr -> Expr -> Expr
  -- | Binary operator for ordering expressions (less than, greater than, etc.).
  OrdBinaryOp   :: OrdBinOp -> Expr -> Expr -> Expr
  -- | Binary operator for @Vector x Vector -> Vector@ operations (cross product).
  VVVBinaryOp   :: VVVBinOp -> Expr -> Expr -> Expr
  -- | Binary operator for @Vector x Vector -> Number@ operations (dot product).
  VVNBinaryOp   :: VVNBinOp -> Expr -> Expr -> Expr

Similar to the above, I think we can merge these into one big BinaryOp :: BinOp l r t -> Expr l -> Expr r -> Expr t. Again, the list-like types are still TBD.

  -- | Operators are generalized arithmetic operators over a 'DomainDesc'
  --   of an 'Expr'.  Could be called BigOp.
  --   ex: Summation is represented via 'Add' over a discrete domain.
  Operator :: AssocArithOper -> DomainDesc Expr Expr -> Expr -> Expr

This will share similarities to the AssocArith as above. However, I think we need to figure out a way to remove Continuous from the allowed DomainDescs here.

  -- | A different kind of 'IsIn'. A 'UID' is an element of an interval.
  RealI    :: Num a => UID -> RealInterval (Expr a) (Expr a) -> Expr Bool

Similar issue as with C.

Sep 21 '21 19:09 balacij

Perhaps RealI should be renamed to InRange?

Sep 21 '21 20:09 balacij

Sure, that's a better name.

Sep 22 '21 14:09 JacquesCarette

On to the real content of this:

C is indeed a huge problem. One potential solution is to have it take a Proxy argument that carries with it a type. [There are variants on that theme too.] Then we need proper 'smart' constructors, all of which are typed.
the same solution should apply to FCall. A variant would be to have the first argument of FCall be restricted to something that 'has' a type that we can extract, so that that can be used to get the resulting type. So our functions will have to typed.

Sep 22 '21 15:09 JacquesCarette

Sorry for the late reply. I'll need to spend a bit of time working with Proxy to understand how it will fix this, I'm not quite sure if I'm understanding it.

I will also try to post about the collection-like data structures soon. I'm not quite sure if I understand what the distinction is between a theoretical data structure and "code choice" for the theoretical data structure would be (thinking about how we use Doubles to represent Reals in Haskell but allow users to represent the Reals as Doubles or Floats in their generated software).

Sep 24 '21 02:09 balacij

I'm going to try to give a quick summary of the immediate Drasil-related problem at hand, in a point-form style, and then discuss some found solutions:

Problem

Writing our expressions comes with no guarantee that expressions are "well-typed", and, as a result, generated artifacts are not guaranteed to be immediately compilable.
We want our expressions to obey typing rules -- to ensure that the expressions are valid (and, hence, usable in code generation). Operationally, we already have a basic understanding of value "types" via Spaces.
- Relevant areas: Symbols (QuantityDicts), Quantity Definitions (QDefinitions), Expressions (CodeExpr, Expr, and ModelExpr), and the scientific value types (Spaces)
For code generation, typing rules are particularly important for CodeExpr and Expr.
- Requirements:
- Generated code should type check in their respective designated compilers.
- We need a means of reasoning about the types of expressions so that we can add casting where necessary.
- All expressions should be valid/obey the typing rules.

Solutions

Quick prototype of each option is available on a fresh repo, unrelated to Drasil. Regarding the "problem" itself, we can also expand it to the greater issue before researching it in the context of Drasil. Each below "Option" will relate to a prototyped solution in linked repo.

There are 2 "main" operational solutions, and picking between them, I believe, depends on the answer to: do we want to fully embrace Haskell as our host language for our "static type checking"?

No: "Manually" enforcing typing rules in usage and in creation.

If we don't, then we need to create runtime checks to ensure validity. We would also likely want to fully use our Space datatype to discuss the types of our expressions, as they would be related to the types of the symbols (the Space of the QuantityDict).

Now, we have another question: Do we ever want to be able to write invalid expressions, validating only as required?

Option 1.a: If we do, then we can effectively treat the constructors of the expression languages as being "fully exposed" in Haskell.

Option 1.b: If we don't, then we can use the smart constructors to enforce the typing rules.

Of course, with either option (1.a) or (1.b), typing validation will occur at runtime, meaning we will need to fully compile and tell Drasil to use it in code generation to determine if a written expression is valid or not. However, one of the larger benefits of this approach is that we can have custom type signatures for each term, across each language (CodeExpr/Expr/ModelExpr).

Yes: Piggybacking on Haskells type system.

Piggybacking on Haskells type system would likely mean opening up a type parameter in the *Expr GADTs, and letting Haskell naturally restrict terms allowed to only those reasonable.

This method is interesting because we would have duplicate type information in some areas: an 'Expr' type parameter and the related Spaces when it comes to symbols and quantity definitions.

In any case, we can prototype this, for which I have created at least 2 new prototypes (but there also exists others). One of the main issues with typing the expression language while leaning on Haskell is that we get a typing disconnect between symbols (QuantityDicts, with type information defined as a Space) and expressions (Expr) when forming quantity definitions (QDefinitions). Naturally, I think we would want the type information to be used here as well to also ensure that QDefinitions are also "well-typed" (i.e., the symbol type and the defining expression type are also matching).

To do that, we have at least two methods.

Option 2.a: Relying on calculating the Space of expressions and comparing to the Space of QuantityDicts. One of the most relevant issues to this is that the type parameter of an expression that contains a symbol can be misaligned with the type parameter of the expression formed (see example code).

Option 2.b: Relying on a type parameter in QuantityDicts (and, if desired, relying on a Proxy instance as well to carry type information). The same misalignment issue might occur as well. However, instead of occurring in the expression language, it will occur in instances of QuantityDicts (see example code). With this solution, we will most likely need to have #2873 (which is blocked by #2911) closed out completely before implementing this.

Yes and No!

A dual approach can also be created, but it would likely require at least 1 solution of 1.a and 1.b, and 1 solution of 2.a and 2.b. This might mean that we have 2 languages: one that allows an optionally well-typed and valid expression, and one for which all instances are guaranteed to be valid (approximately akin to the same difference as using a [] list or a NonEmpty list).

Jun 06 '22 18:06 balacij

I was most definitely thinking of doing method 2. I don't quite understand why QuantityDict is involved though...

Aug 25 '22 13:08 JacquesCarette

QuantityDicts are involved because of two main areas:

we use QuantityDicts to bring in variables into expressions, and
we use them in QDefinitions, where we relate variables to expressions.

We might be able to get away with ignoring static type checks when Exprs and QuantityDicts are related, but it might be stopping short of a full-measure to type safety when dealing with expressions. I don't know how this would impact typed GOOL, but it might make it a bit finnicky.

Aug 25 '22 19:08 balacij

Can you expand on your above comment? I've simply forgotten the details, so I can't recall what the first 'involved' in your reply actually means (same with the two occurrences of 'use' in the 2 points). Can you point me to concrete code please?

Aug 28 '22 07:08 JacquesCarette

Yes, of course! QuantityDicts are used as representations of "symbols"/"variables". They are, essentially, records containing a name, space, stage-dependent symbol, and maybe a unit. https://github.com/JacquesCarette/Drasil/blob/21904c715546944d36e75a4ca91c710d92e98149/code/drasil-lang/lib/Language/Drasil/Chunk/Quantity.hs#L22-L33

DefinedQuantityDicts are variants that are meant to be guaranteed to be 'defined' somewhere (however, this doesn't necessarily mean that QuantityDicts can't be defined somewhere too), and have a Sentence that discusses their definition: https://github.com/JacquesCarette/Drasil/blob/21904c715546944d36e75a4ca91c710d92e98149/code/drasil-lang/lib/Language/Drasil/Chunk/DefinedQuantity.hs#L23-L32

I assume they are meant to be a "nested doll" of QDefinitions: https://github.com/JacquesCarette/Drasil/blob/21904c715546944d36e75a4ca91c710d92e98149/code/drasil-lang/lib/Language/Drasil/Chunk/Eq.hs#L34-L35

We use QDefinitions to associate symbols with their defining expressions. Originally, QDefinitions only relied on QuantityDict, but in #2769, we switched to the DefinedQuanityDict.

Regarding (1) "we use QuantityDicts to bring in variables into expressions", I was referring to sy, which is by and large used for our *QuantityDicts: https://github.com/JacquesCarette/Drasil/blob/21904c715546944d36e75a4ca91c710d92e98149/code/drasil-lang/lib/Language/Drasil/Expr/Class.hs#L219-L221

For example, tolLoad is a QuantityDict and used in an expression from a DataDefinition in GlassBR: https://github.com/JacquesCarette/Drasil/blob/21904c715546944d36e75a4ca91c710d92e98149/code/drasil-example/glassbr/lib/Drasil/GlassBR/DataDefs.hs#L104-L106

In this example expression, we use symbols for a specific type of multiplication (for real numbers), so we need to add some assurance to the typing rules that the symbols (QuantityDicts) are also of the "right type." I believe the type of a sy (symbol reference in Expr) should be based on the QuantityDict it was grabbed from.

Regarding (2) "we use them in QDefinitions, where we relate variables to expressions", this is where we want to ensure that a symbol of a particular type is only allowed to be linked to expressions that evaluate to that same type.

For example, in this below code snippet we create a "Simple QDefinition" (QDefinition Expr), but there is no guarantee that the type of the variable and the type of the evaluated expression will be the same. https://github.com/JacquesCarette/Drasil/blob/21904c715546944d36e75a4ca91c710d92e98149/code/drasil-example/glassbr/lib/Drasil/GlassBR/DataDefs.hs#L52-L53

In both situations, we can break the compilability/reliability of the generated software artifacts if developers aren't very careful with writing out their QDefinitions. For example, Swift code needs us to manually cast numbers to the same type before performing arithmetic, unlike Java. So, if we're not careful with (1), then we can cause problems. Additionally, with any of statically-typed target languages, we would be allowing compilability errors if Drasil users ever wrote a poorly typed QDefinition.

So, while we want to add some type checking to Expr, we also need to use type information from QuantityDicts. Here is where the question goes to "do we want to piggyback on Haskells type system fully, partially, or minimally?" If we want to go fully, then we'd need to open type parameters in QuantityDicts, DefinedQuantityDicts, and Exprs at least, but also likely Spaces (for QuantityDicts to inherit their type parameter from). If we want partial, then we'll at least need to calculate the "Space" of expressions and, hence, the type checking will only be partially static. If we want minimally, then we'll be fully embracing "Space" and calculating the "Space" of expressions, and deferring all type checking to Drasils runtime.

Aug 30 '22 05:08 balacij

The description of the problem sounds good to me @balacij. I'm sure that @JacquesCarette will have more nuanced feedback/follow-up. :smile:

I really liked how you linked to the code defining QuantityDict etc. My first thought was that we should have that in our wiki. Turns out that we almost do have this when we define Chunks. I have two quick questions related to our Chunks wiki:

Should we use explicit links to the code, as you have done in the above comment? It seems to me that explicit links to the code would be more future proof than the copy and pasted code in the wiki.
Should we include a definition of QuantityDict in the wiki. The code for DefinedQuantityDict is explicitly given, but not the code for QuantityDict.

Aug 30 '22 15:08 smiths

do we want to piggyback on Haskells type system fully, partially, or minimally?

@balacij, could you please elaborate (as much detail as possible would be appreciated) on what it means to piggyback fully, partially, or minimally?

Aug 30 '22 16:08 peter-michalski

@smiths Thank you!

The current way appears to link to the source code through the Haddock documentation, which has a small "# Source" button the right-hand side of Haskell entities. Unfortunately, the GitHub Wiki Markdown renderer is not the same as the one used in these issues and discussions. Specifically, the wikis markdown renderer does not open up GitHub permalinks to source code as easily as we can here (where we just post the URL link).
I see what you mean. In the same text block, we should explain the difference between a QuantityDict and a DefinedQuantityDict better. The difference between the code is a bit shallow, so perhaps we can omit them? This might also mean we should merge them as Haskell data types too, moving the difference from "type name"-level to either a type parameter, or a field. Other than the difference of abstract/concrete, should the difference between abstract and concrete symbols be larger?

(2) is also interesting because it brings up a question: do we want abstract symbols (QuantityDicts, non-DefinedQuantityDict) to be in the global namespace of symbols? It makes sense that concrete symbols would be in the global namespace because they are the symbols we're most interested in. Would we be interested in having abstract symbols be localized to each individual theory?

@peter-michalski Yes, of course! The goal is type checking right now, and we need to find how we want to figure out how it will happen. Since Drasil is hosted by Haskell, we have Haskells ecosystem that we can leverage to build components of Drasil in. In particular, Haskells type system is a very powerful mechanism that we can leverage. By "piggybacking", I meant "how much do we want to defer type checking to Haskells type system." In other words, "should we push the task of type checking onto Haskells type system?"

If we want to "fully piggyback", then we can use GADTs and TTF with strict typing rules. In doing this, the type checking is done by GHC (and, optionally, in your editor via Haskell Language Server running GHC). So, type checking feedback is quick, but somewhat inflexible. Additionally, this would create duplication in the "type information" being in *QuantityDicts and in the type signature of *QuantityDicts.

If we want to go "minimally", then we just use Haskells type system for the *Expr types (ModelExpr, CodeExpr, and Expr) and everything related as we do now (ADTs, but in GADT syntax, and TTF). In the smart constructors for our *Exprs and QDefinitions, we perform manual type checking. This means it will be easier for us to have custom and different typing rules between our *Expr languages, and we'd have a means of discussing types in Drasil, through Space-types. It's also slightly easier to manipulate Spaces. However, since we're performing manual type checking, we're slightly more prone to errors, we have less type information in areas with pattern matching/unpacking, and type checking is deferred to Drasils runtime (so, this would only show when trying to run the individual examples, rather than compile them).

If we want to go "partially", then we have a sort of awkward hybrid between the systems, where we try to have *Expr typing rules enforced by Haskell, but have areas around it enforced by manual typing rules. So, we would have type checking done at different times, for different purposes. One of the possible issues with the "partial" solution is that it might conflict with the existing typed GOOL implementation, but I'm not sure how much it would conflict.

Aug 30 '22 22:08 balacij

We had a discussion about this in the meeting today as well. One thing I neglected to mention about the "full piggybacking" solution is that the solution relies on #2873, which is largely halted due to #2911. I will write a deeper explanation tonight, with code snippets, but I thought I'd mention it briefly ASAP.

EDIT: The larger explanation below:

Assuming we want to add type information to QuantityDicts, we'll run into issues almost immediately with our chunk maps: https://github.com/JacquesCarette/Drasil/blob/21904c715546944d36e75a4ca91c710d92e98149/code/drasil-database/lib/Database/Drasil/ChunkDB.hs#L165-L183

In particular, we'll run into problems with scaling our SymbolMap against the types of symbols: https://github.com/JacquesCarette/Drasil/blob/21904c715546944d36e75a4ca91c710d92e98149/code/drasil-database/lib/Database/Drasil/ChunkDB.hs#L40-L42

Assuming we were to add a type parameter for QuantityDicts as well, we might end up with:

type SymbolMap t = UMap (QuantityDict t)

Here is where the ChunkDB scaling issue (and #2873) is relevant. For our primitive types, such as Reals and Integer, we can create a map for each one. However, with symbols that are arrays, or functions, scaling issues arise because we would need to add a new map for each new type of symbol we need.

Without facing the scaling issue, our ChunkDBs would approximately become:

 data ChunkDB = CDB {
   symbolTableInt :: SymbolMap Int,
   syTbRl :: SymbolMap Real,
   syTbIntRl :: SymbolMap (Int -> Real),
   ...

#2873 is currently paused until we resolve the issue with UID collisions.

Aug 31 '22 15:08 balacij

Ok, I've rebooted my brain on this issue. And I think the passage of time has helped.

Originally, the elegance of trying to piggyback on Haskell's type system really appealed to me. It worked so well in "finally tagless" after all! But it is really an over-constraint that just makes the problem harder, and possibly quite a bit more so in the future.

At the end of the day though, we do need to 'interface' with the typed GOOL prototype. So that part still has to happen. In other words, while we have the freedom to define our internal type system as we wish, it has to be 'translatable' to typed GOOL. Because we really do want stronger guarantees that we're generating well-typed code. That's really the ultimate goal. Achieving that will have side benefits for the well-formedness of everything else too.

So I'm currently thinking solution (1b) above for Expr and CodeExpr. I'm actually thinking (1a) for ModelExpr, since it's just for display, and it is the one place where it's ok to hack things.

The key stumbling block seems to be sy. Whatever representation of Expr we have, we need to be able to compute a type for it. (I'll use type here, but if we end up using Space, that's fine; if we end up defining something else, that's fine too). So trying to put in an "untyped" symbol into an expression should be a hard error.

The downside of (1b) is that this involves implementing a type system. Current "state of the practice" says it should be bidirectional. For ease of use, we'd want some polymorphism (so we don't need to annotate everything). Luckily, QuantityDict and friends already all have a Space.

Nevertheless, this might mean that our main representation of Expr might end up being a pair of an (untyped) "expression" and its associated type. Even more precisely: an expression, an environment (for all of the symbols used in the expression, that give a type to each symbol), and a resulting type. The "resulting type" might not be just a 'plain' type, but might involve some residual constraints, I'm not sure. [Think of polymorphic identity function, or polymorphic + or ...] It is best to think of a 'type' here as a "syntactic representation of a predicate that statically encodes our current knowledge about the expression under consideration".

Sep 26 '22 17:09 JacquesCarette

Thank you very much!

Regarding the "state of the practice" and "bidirectional typing", do you have any recommended reads regarding it? Is it something we should be interested in?

The idea you have at the end is interesting, I think we can do this too. I didn't think about this at all previously, however, so I might need to work on a prototype and get back to you.

I'm content with going down the route of (1a&b), respectively, for the expression languages. This should be a straight forward implementation, without relying on the upgraded ChunkDB prototype too.

Sep 27 '22 05:09 balacij

Re: bidirectional typing. It is covered in PLFA. This Haskell implementation might be an easier read. The informal explanation on SE is decent. You may also enjoy this tutorial.

We should just use this technology, we're not interested in it more deeply than that.

Sep 27 '22 11:09 JacquesCarette

While there are things we can clean up/continue (and I will create tickets about them), this is "complete" for now.

Nov 27 '22 06:11 balacij

Drasil Drasil copied to clipboard

Typing the Expression Language

Problem

Solutions

No: "Manually" enforcing typing rules in usage and in creation.

Yes: Piggybacking on Haskells type system.

Yes and No!

Drasil
Drasil copied to clipboard