Drasil `HasUID` + `HasSymbol` usage often relies on `QuantityDict`s having the same `UID` as the symbol of the things we're interested in

Relates to #2873 and #2896

Example: Sentences

Looking at how we bring the symbol of things that have symbols into the Sentence language: https://github.com/JacquesCarette/Drasil/blob/c00990e67f1072c0ae5b34847d0d8ff7c452af71/code/drasil-lang/lib/Language/Drasil/Sentence.hs#L83-L86

We assume that the UID of the thing has the same UID of the "symbol".

An example of usage: https://github.com/JacquesCarette/Drasil/blob/c00990e67f1072c0ae5b34847d0d8ff7c452af71/code/drasil-lang/lib/Language/Drasil/Document/Combinators.hs#L100-L102

Captured UIDs are then used for printing as such: https://github.com/JacquesCarette/Drasil/blob/c00990e67f1072c0ae5b34847d0d8ff7c452af71/code/drasil-printers/lib/Language/Drasil/Printing/Import/Sentence.hs#L32

Note that lookupC relies on symbResolve: https://github.com/JacquesCarette/Drasil/blob/c00990e67f1072c0ae5b34847d0d8ff7c452af71/code/drasil-printers/lib/Language/Drasil/Printing/Import/Helpers.hs#L55-L59 ( ASIDE: lookupC relies on Stage, which would be an example of printing in "context" in #2896 )

symbResolve is a function that looks for a QuantityDict (which isn't necessarily the same thing we originally started with! This causes problems in our lookups because it will use the "wrong" UID and expect the "wrong" type to grab a symbol from): https://github.com/JacquesCarette/Drasil/blob/c00990e67f1072c0ae5b34847d0d8ff7c452af71/code/drasil-database/lib/Database/Drasil/ChunkDB.hs#L113-L115

Since our existing maps are separated, UID collisions aren't a problem (it seems we are using them to our advantage at the moment). However, if we ever change the UID of the original thing that had a symbol and don't similarly change the UID of the related QuantityDict, then we get a "Symbol not found in SymbolMap" error at runtime.

Additionally, this also means that the symbol function (typeclass function of HasSymbol) of the original things we wanted the symbol of aren't actually used. Instead, we are looking up and using the symbol of the related QuantityDict.

Another example: Exprs

Another similar example of the same expectation is how we bring things into the expression language: https://github.com/JacquesCarette/Drasil/blob/c00990e67f1072c0ae5b34847d0d8ff7c452af71/code/drasil-lang/lib/Language/Drasil/Expr/Class.hs#L584-L586

Questions

It seems QuantityDicts are approximately mathematical symbols. Do we always want them to be synonymous? If so, should the 2 above smart constructors accept "QuantityDict"s instead of any type satisfying those 2 constraints?
Since we need to change UIDs for #2873, and assuming we want to keep the UID references as they are, I don't think it would make sense for our types that create relationships between other chunks to have symbols via HasSymbols (unless of course they have a unique symbol; but even then, our usage above expects QuantityDict lookups, so it wouldn't work). Should we make "HasSymbol" be a typeclass that provides references to symbols instead of providing "Stage -> Symbol" functions?

Future

I think we can create "Typed UID References" to improve the "UID" references; in particular, this should be good for #2646.

Dec 01 '21 17:12 balacij

There is definitely a lot of confusion begin pointed out here. There is definitely 'disconnect' between the various pieces of code. So, starting from the top:

ch and its use of HasSymbol inSentence really was designed as the comments say. In theory, anything that 'has a symbol' should be printable, later.
yes, we definitely assume that the "thing" we're being passed is uniquely identified by its UID, that UID will not change, and the symbol that's reachable at the time of creation is the one we want/need.
that lookupC looks for QuantityDict is definitely odd. A QuantityDict is when we have some concept that represents some kind of Quantity (thus the name). These correspond to things we can store in variables in code. Thus, in common use, that works, even though it's not coherent with the previous 2 bullets.
however, I don't understand "symbResolve is a function that looks for a QuantityDict (which isn't necessarily the same thing we originally started with! This causes problems in our lookups because it will use the "wrong" UID and expect the "wrong" type to grab a symbol from)". Can you give me an example of when it isn't the same thing we start from? We don't do mutations, so how could this happen? I do understand that it's looking in the wrong place, in a way. But "wrong" UID?
"if we ever change the UID of the original thing that had a symbol " -- but we don't ever do that. Mutation is evil. Once we create something, that's what it is, forever and ever. Otherwise, we have a bug.

On to the questions:

Yes, things that contain a QuantityDict are approximately symbols (and variables). The problem is 'mathematical symbol' itself is not actually a well-defined notion! Should we change our ch constructor to be closer to how we use it? Most likely yes! That is definitely worth trying as a (stand-alone) experiment, to see what happens.
HasSymbol is a class that captures the very narrow idea of "having a symbol". It turns out that this idea is parametric, in the sense that the Stage context can change what symbol should be associated to the 'idea' encapsulated by the given chunk, whose UID we have. So I don't think we should change that.

As to the suggestion: yes, absolutely, marking each UID reference that we embed somewhere in a different data-structure with the 'type' at which it is valid, is a very good idea.

Dec 01 '21 21:12 JacquesCarette

"if we ever change the UID of the original thing that had a symbol " -- but we don't ever do that. Mutation is evil. Once we create something, that's what it is, forever and ever. Otherwise, we have a bug.

I don't think that I explained the issue well enough, I didn't mean to say that we mutate things. Rather, this also relates to https://github.com/JacquesCarette/Drasil/pull/2904#discussion_r760720237.

With #2873, we need to make our UIDs unique. As of right now, we have many UID collisions between our ChunkDB maps, and I'm working on altering them in some way (primarily where we can base UIDs off of smaller chunkers that they are made from -- essentially saying they are somehow a growth on the underlying chunks).

One specific example which helped me discover this (+ discussion):

As they are, DataDefinitons instantiate "HasSymbol" using the underlying QDefinition's HasSymbol instance, which relies on the QuantityDict that the QDefinition defines.

https://github.com/JacquesCarette/Drasil/blob/ad1e157f9ce96046e74787aaf31cb4e0726fe237/code/drasil-theory/lib/Theory/Drasil/DataDefinition.hs#L60-L61

We'll come back to this later (it's one of the important parts).

Now, DataDefinitions are extensions of QDefinitions, inheriting all of the underlying QDefinitions fields by default but the constructors often allow us to override terms, shortnames, etc.

https://github.com/JacquesCarette/Drasil/blob/ad1e157f9ce96046e74787aaf31cb4e0726fe237/code/drasil-theory/lib/Theory/Drasil/DataDefinition.hs#L94-L97

Since it's an extension of a QDefinition, it makes sense to systematically form a UID based on the underlying QDefinition (in the future, we might want to also show what things changed on inheritance).

data DDPkt = DDPkt {
  _pktUID :: UID,
  ...
}

data DataDefinition where
  DDE  :: SimpleQDef -> DDPkt -> DataDefinition
  DDME :: ModelQDef -> DDPkt -> DataDefinition

and in all of our constructors:

ddE :: SimpleQDef -> ... -> DataDefinition
ddE qd ... = DDE qd (DDPkt (qd ^.uid +++. "DD") ...)

NOTE: This is a temporary measure, it's basic and insufficient for the future, but it's a good temporary measure to allow us to at least add in ChunkDBs nicely.

If we compile and run, we get:

cd "build/glassbr" && stack exec -- "glassbr" 
glassbr: Symbol: minThickDD not found in SymbolMap
CallStack (from HasCallStack):
  error, called at lib/Database/Drasil/ChunkDB.hs:111:24 in drasil-database-0.1.1.0-A8sidZaCQurBJPtiv5WQIg:Database.Drasil.ChunkDB
make: *** [Makefile:257: glassbr_gen] Error 1

So, let's investigate which maps have minThick UID registered before my changes:

TermMap (IdeaDicts)
SymbolMap (QuantityDicts)
DataDefinitionMap (DataDefinitions)
LabelledContentMap (LabelledContents)

minThick is interesting; it appears under the same UID in 4 ChunkDB maps; all in different forms. With the new ChunkDB system, we will need to change all of them.

The IdeaDict Map is interesting, but I'm not quite sure why we need it. For now, it appears it's instantiated by creating a "IdeaDict" pushout of many chunks (Which chunks? I can't quite categorize which chunks because some are missing, so it's insufficient to say that it's "all chunks that might be an IdeaDict"). It would make more sense for things to point to the right maps and to have had maps for each type, as required. I'll need to come back to this later.

The QuantityDict (from the SymbolMap) that defines minThick is the "first" time we see "minThick", so I will call it the "original" chunk. The QDefinition defining it is an extension (and a separate idea, so I imagine we should change that UID and allow us to register it as well -- even though we dont yet register QDefinitions -- also, a QDefinition is no longer "original", it's a relationship of minThick and some defining expression, so merely appending "QD" to the UID of minThick is likely also insufficient long-term).

In the DataDefinitionMap, minThick appears as a DataDefinition with a QDefinition defining minThick. I think it would make sense to make this UID somehow show that we built this DataDefinition around the minThick QDefinition (excluded) with extra information (how we should do this is up for debate, but that's another discussion -- related ticket #2901).

In the LabelledContentMap, it appears as a LabelledContent ("Model Definition" content) for the minThick DataDefinition.

With my proposed changes, it's failing to find the UID of "minThickDD" in the symbol map -- where did we reference it?

The culprit; a Sentence: https://github.com/JacquesCarette/Drasil/blob/ad1e157f9ce96046e74787aaf31cb4e0726fe237/code/drasil-example/glassbr/lib/Drasil/GlassBR/DataDefs.hs#L283

It makes sense that we have a Sentence combinator that has Chunks that define things as an input and writes "defined in", such as that definedIn' function, defined below: https://github.com/JacquesCarette/Drasil/blob/ad1e157f9ce96046e74787aaf31cb4e0726fe237/code/drasil-lang/lib/Language/Drasil/Document/Combinators.hs#L104-L106

And hFromt is DataDefinition that has the QDefinition which defines minThick (meaning both of their UIDs are also minThick): https://github.com/JacquesCarette/Drasil/blob/ad1e157f9ce96046e74787aaf31cb4e0726fe237/code/drasil-example/glassbr/lib/Drasil/GlassBR/DataDefs.hs#L72-L76

Finally, with my change, it fails to find the minThickDD because the constructors noted above (e.g., ch, and sy) work by relying on the inputs having a symbol (even though it never uses the DataDefinition's real symbol! it looks up an equivalently UIDed QuantityDict), and having the symbol (a QuantityDict) share the same UID as the input.

DataDefinitions in particular have an instance for HasSymbol.

Should DataDefinitions even have an instance for HasSymbol, and should we really think of QuantityDicts as equivalent to "Symbols"?

If yes to the former, then the ch and sy functions grab the "wrong" UID (they use the UID of the DataDefinition, while they should be using the UID of the QuantityDict that the DataDefinition is focused on). However, we might even consider alternative designs where we call the HasSymbol function directly instead of relying on separate "Symbol Chunks" (e.g., avoiding ultimately relying on QuantityDicts to supply symbols).
If it shouldn't, then we might want to do something along the liness of having sy and ch expect QuantityDicts strictly (as opposed to anything that has a symbol), and possibly removing the HasSymbol typeclass entirely.

Personally, I see the most straight-forward solution is to have HasSymbol provide symbols itself (which may or may not use another chunk's symbol in implementation -- that doesnt matter much), not have Symbols be treated as equivalent to QuantityDicts (e.g., allowing things to provide Symbols without reference to a QuantityDict), and have typed UID references so that we know to look for a DataDefinition instead of a QuantityDict here (note: this would only be possible under #2873, ~ a cyclic dependency?)

Dec 03 '21 04:12 balacij

however, I don't understand "symbResolve is a function that looks for a QuantityDict (which isn't necessarily the same thing we originally started with! This causes problems in our lookups because it will use the "wrong" UID and expect the "wrong" type to grab a symbol from)". Can you give me an example of when it isn't the same thing we start from? We don't do mutations, so how could this happen? I do understand that it's looking in the wrong place, in a way. But "wrong" UID?

I had to respond to this second because I wanted to provide the example above.

Here, since definedIn' uses ch, and we call it on the minThick-related DataDefinition (hFromt), but we are using an untyped UID which we assume will always be contained in the SymbolMap (e.g., a QuantityDict) at runtime. In a sense, we are using the "wrong" UID because it should have used the symbol from minThick (e.g., UID = minThick) instead of minThickDD. In another sense (probably the better one), it should be more flexible through typed UIDs, which would allow us to grab the UID from the ChunkDB and get it as a DataDefinition, so that we can call symbol on the DataDefinition. This "better" one would likely only be possible when we have #2873 done (or perhaps at the same time).

Dec 03 '21 04:12 balacij

Great explanation. And explaining the current design is going to take more than just a few minutes, so I'll come back to this later.

But quick idea: think of it like nested Russian Dolls, where they are all part of the same thing, thus have a single name as a 'conglomerate entity', but the inner pieces can still be seen. But their only name is the name of the assembly.

So we don't have 4 minThick rather minThick is made of 4 nested pieces (though the labelled content puzzles me a bit -- I'd like to see its definition too). The point is that the outermost minThink should contain within itself all of the next one (the DD, I guess?) which in turn contains all of the QuantityDict which in turn contains all of the IdeaDict.

Dec 03 '21 21:12 JacquesCarette

The more I think about this, but more I feel like going to structured UIDs is going to help.

The point is that there is a single concept being captured, but that concept is made of pieces. The pieces don't necessarily have an independent life. So we need to be able to 'say' that.

Dec 08 '21 21:12 JacquesCarette