Drasil
Understanding Drasil's theory through observing its codebase
I've been running into a few problems with dependencies (and I've also caused one... see: `drasil-code-base`) because I'm unsure of where code (primarily regarding 'printing') should be placed. I've also never seen any references (in the code) to the `SmithEtAl` template we base our generated artifacts on. Additionally, with #2873 in mind, I felt that `drasil-database`, as a package, was a bit peculiar because it contains `SystemInformation` in the same area as the `ChunkDB` (which, in my opinion, shouldn't have any dependencies, nor be related to any chunks [arguably, other than itself]). Finally, our package READMEs and descriptions are a bit confusing to me: they don't really describe the package dependencies. The main `Drasil.md` file makes sense, but appears to be outdated.
As such, with these things in mind, I am going to try to understand our packages, and how they relate to the foundational theory behind Drasil.
Shallow Analysis
First, let us start off by naively observing and analyzing the `drasil-*` packages:
- `drasil-build`:
  - Contains an encoding for the `Makefile` language, and a printer for the AST to `Doc` (`pretty`).
  - Defines a `Makefile` AST, smart constructors for building up a `Makefile`, and a printer for the AST to be rewritten as a `Doc` (`pretty`).
  - No dependencies!
- `drasil-code`:
  - Currently, it primarily contains functionality for tying some things together with GOOL & `drasil-build`.
  - Contains a few printers for converting DataDefinitions and QDefinitions into specialized `drasil-code` data types.
  - We should see if we can remove the dependency on `drasil-lang`, `drasil-printers`, and `drasil-theory`.
    - ... and if we can remove `drasil-code-base` entirely.
  - We should work to clean up the `package.yaml` file. It currently contains a manually written list of module files, caused by at least 1 HS file going unused.
  - Dependencies:
    - drasil-build
    - drasil-code-base
    - drasil-database
    - drasil-gool
    - drasil-lang
    - drasil-printers
    - drasil-theory
    - drasil-utils
- `drasil-code-base`:
  - Contains the definition for CodeExpr, and a few other things. It came to be as a half-measure to avoid a cyclical dependency between `drasil-printers` and `drasil-code`.
  - Contains a printer for converting Expr into CodeExpr.
  - We should work to see if we can remove it by understanding why `drasil-printers` relies on it, and why `drasil-code` relies on `drasil-printers`.
  - Dependencies:
    - drasil-database
    - drasil-lang
    - drasil-utils
- `drasil-data`:
  - Strictly contains instances of chunks.
  - We should work to clean up the `package.yaml` file. It currently contains a manually written list of module files.
  - Worth considering splitting up into more packages: `drasil-data-physics`, `drasil-data-mathematics`, etc.
    - These packages should primarily contain theoretical models that aren't directly usable in systems, and various expressions and derivations. They would only be usable in systems if variables are specialized.
    - The variables they use are rather "generic", and unintrusive to the systems that import them.
  - Dependencies:
    - drasil-lang
    - drasil-metadata
    - drasil-theory
    - drasil-utils
- `drasil-database`:
  - This package contains the definitions for `ChunkDB` and `SystemInformation`, and has a few 'helper' functions for working with the `ChunkDB`.
  - Dependencies:
    - drasil-lang
    - drasil-theory
- `drasil-docLang`:
  - This package primarily contains functionality for a generic notebook language, but with special components for the "SmithEtAl" template. Specifically, it contains an AST for the "SmithEtAl" document template, and functionality for analyzing and forming it. It is a middleman between our Chunks and HTML/Jupyter Notebooks/LaTeX.
  - If we move the `SystemInformation` out of `drasil-database`, I think it would be good to also move the code from `drasil-docLang` alongside it because the code is highly coupled to the "SmithEtAl" template. This isn't to say that it shouldn't be exposed; I think it should be exposed so that other printers/template engines can also base theirs off of Dr. Smith's template (potentially the updated variant that Dr. Smith mentioned in Monday's discussion).
  - This might mean we form a `drasil-printers-smith-et-al` package?
  - Dependencies:
    - drasil-lang
    - drasil-data
    - drasil-database
    - drasil-printers
    - drasil-theory
    - drasil-utils
- `drasil-example`:
  - Just a folder carrying the examples.
  - I think it would be appropriate to rename it to just `examples`, to move it away from the "fundamentals"/`drasil-*` namespace.
  - However, let's not spend much time thinking about this (yet?).
- `drasil-gen`:
  - Extra functionality for tying together the "SmithEtAl" template + code, with a focus on designating which artifacts should be built.
  - This could also be moved in together with a potential `drasil-printers-smith-et-al` package since it's highly coupled with them.
  - Dependencies:
    - drasil-lang
    - drasil-gool
    - drasil-build
    - drasil-code
    - drasil-printers
    - drasil-docLang
    - drasil-database
- `drasil-gool`:
  - GOOL encoding + printer for GOOL to languages (Java/Python/C#/C++/Swift) + printer for languages to `Doc`.
  - It only relies on `drasil-utils`, and primarily for textual needs and list 'helper' functions. However, since `drasil-utils` relies on `drasil-lang`, this package also only builds after `drasil-lang`, when it really shouldn't be impacted by `drasil-lang`'s priority in the GHC construction plan.
  - Dependencies:
    - drasil-utils
- `drasil-lang`:
  - Contains encodings for:
    - Mathematical languages (which can become 'chunks' through containers, but are currently encodings): Expr/ModelExpr/Literals
    - Mathematical constructs: QuantityDict, QDefinition, RelationConcept, Uncertainty, ConstrConcept, ConstrainedChunk, etc.
    - Symbols
    - Derivations (which I don't want to put next to mathematical languages because I think they can be made polymorphic)
    - Components of a Notebook/Document language: Partition, SecCons, Section, SecHeader, Content, Document, Notebook, TableOfContents, ListType, ItemType, etc. (there are plenty, but I don't think they are all worth mentioning unless we are specifically investigating them)
    - Natural language: Sentence, NounPhrase, SentenceStyle, etc. (again, there are plenty, but not all worth noting)
    - People
    - References & Citations
    - URIs (URI, Scheme, Authority, Port)
  - It may be too large. It might be worth breaking this up into a few packages. In particular, I think we should have separate packages for the natural language, the mathematics (and constructions), and documents. Of course, we can further decompose as we see fit/necessary.
  - No dependencies!
  - I would call `drasil-lang` the "root"/"base" package in Drasil since most other packages import it.
- `drasil-metadata`:
  - This is a very interesting package, but, at the moment, it contains very few files and very little information.
  - I'm not quite sure of what we define as "metadata" in Drasil, but I wonder if we should be moving more things into it?
  - Dependencies:
    - drasil-lang
- `drasil-printers`:
  - Contains a "General Science Printing" AST, with printers for: HTML, DOT files (I'm not entirely sure of what these are), and LaTeX, with incomplete encodings for Markdown and JSON.
  - Contains multiple printers for `drasil-lang`-things into its own "General Science Printing" AST.
  - A general printer for Expr/ModelExpr/CodeExpr/Literals, Symbols, Sentences, etc. into HTML/LaTeX/plain text/etc.
  - Dependencies:
    - drasil-data
    - drasil-code-base
    - drasil-database
    - drasil-lang
    - drasil-theory
    - drasil-utils
- `drasil-theory`:
  - Contains encodings for InstanceModels, DataDefinitions, TheoryModels, GenDefns, ModelKinds, ConstraintSets, and MultiDefns
  - Contains CIs (CommonIdeas) for IMs, DDs, TMs, & GDs
  - Dependencies:
    - drasil-lang
    - drasil-metadata
- `drasil-utils`:
  - Contains "utility" functions for other packages to use.
  - Currently contains a lot of constructors for Sentences, NamedChunks, NPs, Contents, and other data types local to `drasil-lang`.
  - Since many packages depend on `drasil-utils`, they also, by extension, have a potentially unused dependency on `drasil-lang`.
  - I think these constructors should be pushed into `drasil-lang`'s source files because not all packages need `drasil-lang` files to be compiled before them.
  - The end result would be a "utils" package which supplements Haskell's `base` package rather than any `drasil-*` package.
  - Dependencies:
    - drasil-lang
- `drasil-website`:
  - Builds the website.
  - Currently relies on `SystemInformation`, but it contains no models, inputs/outputs, math, and isn't intended to generate any SRS or code. The `SystemInformation` seems inappropriate to be used (hence the empty lists). This is also likely evidence for a need to split `SystemInformation` into different variants.
Slightly deeper, but still fairly shallow, observations, and discussion:
With a focus on observing the packages:
- We have many "encodings" for things (ASTs, chunks, etc), and "printers" that either print "encodings" into other "encodings" or directly into artifacts (primarily, `Doc`s at the moment):
  - Notable examples:
    - Self-contained/base-level: Both `drasil-build` and `drasil-gool` depend on no other encodings (`drasil-build` has no dependencies at all, and `drasil-gool` only relies on `drasil-utils` for helpers), but contain encodings, an AST, and a printer for their ASTs (into `Doc`s). I would call them "near base-level" because, in the most obvious sense, they describe and produce end-user software artifacts. Of course, "base and higher" have a very different meaning when looking at encodings (they might mean different things, it's likely better to have relative terminology in the future). In some sense, other encodings might also be the target "end-user" artifacts, and we might call them "higher" than `drasil-build` and `drasil-gool`, so they can be thought of as both "high" and "low"; it just depends on your scope because there might be encodings that sit above them too.
    - Compose others and prints external encodings into itself/higher-level?: `drasil-code` & `drasil-code-base` contain their own ASTs and encodings for various information related to code generation. They neatly tie together other ASTs and encodings from `drasil-build`, `gool`, `lang`, and `theory` as a part of another, larger, cohesive "printer" geared towards "generating software artifacts". It is an intermediary between `lang` & `theory` and `gool` & `build` with a larger goal of generating "distributable software". It's at a "higher level" than `drasil-gool` and `drasil-build` because it isn't intended to generate artifacts on its own, but through their artifact generators.
    - Self-contained but can print external encodings into itself/?: `drasil-printers` contains both a general "science/math document language", and multiple printers for printing various data encodings from `drasil-lang` into its document language. `drasil-docLang` is another example of this, where it contains another layer of knowledge above the general "science/math document language" but with a specific ordering (currently, seemingly coupled to the "SmithEtAl" template).
  - Since our development cycle starts off with some vague goal that we decompose into its pieces, I think it makes sense that we should consider how and where our encodings and printers should be written.
    - So, I'd like to ask: where should we define how encodings get printed into other (likely lower-level) encodings?
      - With our self-contained examples above, it makes sense that they have their language encodings and a printer that converts their language encodings into raw artifacts (often, `pretty`'s `Doc`s [more on this later]). In other words, a higher-level (while still low) encoding dictates how you can 'push out'/render it into a lower-level encoding.
      - With composed printers, it makes sense again to have a higher-level encoding that composes other lower-level encodings together via its printer.
      - The question is regarding placement. Should the lower-level encodings contain information about how higher-level encodings can be rendered into lower-level encodings, or should higher-level encodings contain a property that dictates how they can be "viewed"/printed as lower-level encodings?
      - We currently do the former, but I think we should be following the latter, because:
        - The lower levels of encodings are the most "stable" (they are the most basic units, with the least data density [they're primarily just strings of words, in some form or another]). With the "bottom-up" approach we follow, we would continuously remove hard-coded data by abstracting over the details of like-data through creating re-usable higher-level encodings (to be specific, with the desire of declaratively generating the lower-level bits through printing the higher-level instances). As such, I believe it makes sense to treat them as expectations and properties of the higher-level encodings, which we can also better see when placed next to the higher-level encodings.
        - Now, obviously, I'm not saying that an Expr should dictate how it should be translated into HTML, but I do think that it should dictate how it should be translated into its greatest-lower-bound (e.g., the mathematical printing language) w.r.t. the intent of printing (e.g., its GLB could be CodeExpr in the context of generating code). Then, depending on the use case, the mathematical printing language should dictate how it's being laid out into LaTeX, HTML [realistically, it's not laid out into HTML by itself; we primarily use MathJax, so it's still LaTeX], or into the 'plain' mathematical print.
        - Less importantly,
          - Additionally, I think it will make Drasil-in-Drasil easier because we will have a better understanding of the translation of an `A` encoding to a `B` encoding, by understanding it as a property of `A`'s encoding [aside: it can be thought of as "a component of a kind of more general version of ModelKinds"]. In other words, this will become declared as a property, which we should be able to encode later.
          - With the latter option, we should be able to remove the majority of our `.Development` modules because we would be inverting many of our dependencies.
          - We should be able to remove `drasil-code-base` entirely, by merging its contents back into `drasil-code`, cleaning up dependencies in general, and making "finding printers" generally easier.
      - Concretely, I think we should create a series of typeclasses for each encoding (e.g., `class CanProduceLowerEncoding t where toLowerEncoding :: t -> LowerEncoding`), which the higher-level encoding would use as an interface to describe how the "pushout" would occur (e.g., `instance CanProduceLowerEncoding HigherLevelEncoding where toLowerEncoding = ...`). We can also add extra parameters to these printers for each different "configuration" we want to see from printers. This might also assist in Dong's current work with the rendering styles for Linear DE Models.
- `drasil-database`:
  - With #2873, `ChunkDB` and maybe a few other chunks (realistically, I can only think of `UID`s from `drasil-lang`, so it might be singular) become a self-contained unit, and they are a rather fundamental component of collecting knowledge/chunks. In which case, the dependencies for `drasil-database` are all for `SystemInformation`. I wonder if it's appropriate to move `ChunkDB` into a new `drasil-core` package on its own (realistically, `ChunkDB`s are a fundamental component of all Drasil "systems" & examples because they are where knowledge is collected for the top-most-level printer to use). Alternatively, I wonder if `SystemInformation` should exist at all with the new functionality that the new `ChunkDB`s could provide, or if `SystemInformation` should be moved to another package...
    - REDUNDANT DISCUSSION: Assuming we were to move `ChunkDB` & `UID`s into a new `drasil-core` package, this new package would contain strictly the fundamentals for "knowledge management". It's not necessarily fundamental to the theory behind Drasil, but it is an important component, nevertheless, in practice. This would leave `SystemInformation` as the only construction left inside of `drasil-database`...
- `drasil-database`'s `SystemInformation` & `drasil-docLang` are both seemingly connected by a common denominator: the code and the SRS documents (the template):
  - It appears that the `SystemInformation` is the top-most-level encoding for the `SmithEtAl` template printing. It is used to print out an SRS, and to print out/generate code.
  - Additionally, `drasil-docLang` contains a document language and a lot of components that are highly coupled with the `SmithEtAl` template. I wonder if we should be making `drasil-docLang` a slightly simpler document language in favour of moving the parts that are more coupled with the `SmithEtAl` template to a new `drasil-smithEtAl` package. This would potentially allow us to create other flavours of the document, or Dr. Smith's latest variant that he mentioned in our last meeting. The very nice functions used to build up an `SI` (`SystemInformation`) could also be used to restrict allowed "Chunks" into a "system" (undefined).
    - This would solve (2).
  - In either case, `drasil-lang` should then be replaced as the "root" package by either the new, potential, `drasil-core` package, or the slimmed `drasil-database` package.
- `drasil-printers`:
  - We have many encodings and printers:
    - DOT: contains a DOT file encoding + various printers specifically intended for usage in the SRS
    - Printing: A general printing language for printing mathematical expressions and sentences + various printers for generally mathematics-related chunks from `drasil-lang` into it.
    - HTML: contains an HTML (w/out CSS) encoding + various printers, again, specifically intended for usage in the SRS
    - JSON: missing a JSON encoding, but contains bits and pieces of a Markdown and HTML encoding + printer. I think this is an active work-in-progress, so let's not spend much time on this right now.
    - Log: contains "dumping" mechanisms for dumping various chunk maps into `Doc`s. I think this should be moved into `drasil-database`, alongside `ChunkDB`s [this was actually a part of my intended design for `ChunkDB`s] because it should be completely chunk-agnostic.
    - Markdown: missing a Markdown encoding, but contains raw Markdown for generating the existing READMEs used for our generated README.md software artifacts next to some generated code.
    - Plain: A 'plain' printer for various data types. I believe this primarily sees usage in normalizing symbol names, expressions and such for usage in code generation.
    - TeX: Printing methodology for the printing language into TeX/LaTeX.
  - It would be good to make internal encodings for JSON/HTML/CSS/Markdown so that we can lay other things into them as well, instead of having to manually write these printers for other languages. Additionally, in general, we should be following the same guidelines as discussed above in (1).
- `drasil-lang`:
  - ...is mostly the "root" package (directly or indirectly) for most packages.
  - It is always the first package to be built when building Drasil's entire codebase.
  - With #2873, `drasil-database`, I believe, will (and should) replace `drasil-lang` as the root package, assuming we move `UID` from `drasil-lang` into `drasil-database` (assuming we choose to keep `ChunkDB` inside of `drasil-database`).
- The "SmithEtAl" template:
  - Components are currently scattered across:
    a. `drasil-docLang`, in the form of special attention to the sections of the "SmithEtAl" template (also, deals with `SystemInformation`). Since I'm primarily thinking of the "SmithEtAl" template as a template for "software requirements of scientific software" and I haven't had much exposure to other templates, I might be extending my own definition a bit too far, but there are still specific hard-coded components for general SRS documents, and the format we adhere to. I might be wrong, but I think we can still decompose further, to add abstraction/ambiguity to the relationship or to allow for different printer configurations. However, the fact that we don't have any sort of "main entry point"/module/subpackage/package for containing the "SmithEtAl"-related code, yet we generate SRS documents adhering to the template, should indicate a possible coupling issue.
    b. `drasil-printers`, in the form of all of the printers containing code which is specific to SRS documents (and, since we realistically only generate "SmithEtAl" templated documents, it's likely primarily for the template)
    c. `drasil-code`, in the form of "composing printers" in `drasil-printers` with `drasil-gool` and `drasil-build`
    d. `drasil-database`, in the form of `SystemInformation`
  - It might be beneficial to try to unify all of these components above into 1 single package to start. Afterwards, we should try to decompose it into smaller packages. As of right now, it seems there is high coupling. Through this, we should be able to generate different variations of the template and other artifacts.
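The typeclass proposal from the placement discussion above (a higher-level encoding declaring how it is "pushed out" into a lower-level one, with an extra parameter for printer "configuration") can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the toy `Expr`, `PrintExpr`, `MulStyle`, `CanProducePrintExpr`, `toPrintExpr`, and `renderPlain` are all hypothetical names made up for this sketch, and none of them are Drasil's actual types or functions.

```haskell
module Main where

-- Hypothetical higher-level encoding: a toy expression language
-- (a stand-in for illustration, not Drasil's Expr).
data Expr
  = Lit Int
  | Var String
  | Add Expr Expr
  | Mul Expr Expr

-- Hypothetical lower-level encoding: a layout-oriented printing language.
data PrintExpr
  = Atom String
  | Infix String PrintExpr PrintExpr
  | Parens PrintExpr

-- Hypothetical printer "configuration": how multiplication is written.
data MulStyle = Star | Cdot

-- The interface lives next to the higher-level encoding: each encoding
-- declares how it is lowered into the printing language.
class CanProducePrintExpr t where
  toPrintExpr :: MulStyle -> t -> PrintExpr

instance CanProducePrintExpr Expr where
  toPrintExpr _  (Lit n)   = Atom (show n)
  toPrintExpr _  (Var v)   = Atom v
  toPrintExpr st (Add l r) = Infix "+" (toPrintExpr st l) (toPrintExpr st r)
  toPrintExpr st (Mul l r) = Infix (mulOp st) (wrap l) (wrap r)
    where
      -- Sums nested under a product need explicit parentheses.
      wrap e@(Add _ _) = Parens (toPrintExpr st e)
      wrap e           = toPrintExpr st e

mulOp :: MulStyle -> String
mulOp Star = "*"
mulOp Cdot = "\\cdot"

-- The lower-level encoding only knows how to lay itself out as text;
-- it knows nothing about Expr.
renderPlain :: PrintExpr -> String
renderPlain (Atom s)       = s
renderPlain (Infix op l r) = renderPlain l ++ " " ++ op ++ " " ++ renderPlain r
renderPlain (Parens p)     = "(" ++ renderPlain p ++ ")"

main :: IO ()
main = do
  let e = Mul (Add (Var "x") (Lit 1)) (Var "y")
  putStrLn (renderPlain (toPrintExpr Star e))  -- prints "(x + 1) * y"
  putStrLn (renderPlain (toPrintExpr Cdot e))
```

In this arrangement the dependency points from the higher-level encoding down to the printing language, which is the inversion argued for above; whether a single-parameter class per target (as here) or one multi-parameter class is preferable is left open.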
Again, slightly deeper observations
- At a sky-high level, everything is an encoding of either data/phenomena, or a translation of knowledge in encodings (often one-way -- "printing").
  - Somewhere a bit lower, we can try to categorize the components of our packages into one of:
    a. Encodings that translate things into phenomena (e.g., "end-user" artifacts) <- "low" knowledge density due to it being an abstraction of "phenomena"
    b. Encodings that translate/compose some group of encodings into other encodings <- "medium" knowledge density (realistically, a relative, or even false, sense of depth)
    c. Data encodings of encodings (including encodings of maps of encodings [e.g., `ChunkDB`]) <- "high" knowledge density (again, realistically, a relative, or even false, sense of depth). This would also contain properties of "pushouts" ("lower" encodings) as "views" of the higher encodings.
  - We should try to define the code in each package as belonging to one of these 3 types. This, I believe, would show stricter adherence to the foundational theory of "well-understood" domains of knowledge & Drasil in general.
- `pretty`'s `Doc` is still a phenomenon to Drasil. The same goes for all imported libraries we use. It might be difficult, but we should consider not having any imports, and instead building all things from scratch (this effort would certainly not go to waste, because there will surely be a domain where these encodings are a part of the domain, and we shouldn't be constrained by using other libraries which we might not be able to edit easily). Afterwards, the final actions of the impure "IO"-related things will be the final phenomena (which I'm unsure of how we can sufficiently teach Drasil). Finally, through this, we will have a better understanding of how Drasil will need to, eventually, describe Drasil.
Final Observations
- A "system" has a different meaning with respect to each intended usage of a knowledge base. I think we shouldn't describe a "system" ourselves, but instead, we should consider letting the "printers" describe it, themselves, through their requirements. This is because we can think of each printer as a system unto itself (a cohesive network of knowledge). Note: we might only really care for the "larger" systems that "do something in the end" (often, these will take in a `ChunkDB`/knowledge-base and do something with that), but I don't think that should disambiguate or diminish the openness of what a "system" is, as defined in common dictionaries.
  - Taking the "SmithEtAl" "system" as an example, using the `SystemInformation` as the base entry point to the template, the requirements would be as follows:
    a. Knowledge-base must contain a list of authors
    b. Knowledge-base must contain a purpose
    c. Knowledge-base must have Input variables and Output variables (instead of directly placing these as QuantityDicts, we should place these wrapped in their own Input and Output data wrappers so that we can pull them directly from the `ChunkDB`)
    d. Knowledge-base must contain output constraints (these are very nice!)
  - These requirements would be imposed by checking that admissible "knowledge"/chunks are registered inside of the carrier "KnowledgeBase"/`ChunkDB`. Then, we can have "systems" that impose "this knowledgebase should only include 1 X, 2 Ys, any amount of Zs, no As, etc". Of course, this is heavily reliant on the `ChunkDB` design I proposed in #2873.
  - If we want to restrict only to the larger systems that act on `ChunkDB`s, we can do that too. This would be a stricter definition, where requirements are imposed by gathering "the correct types" from the system (in other words, we would be bunching up our knowledge/chunks by their type representations [`TypeRep`s from `Data.Typeable`] and imposing restrictions based on those found in the `ChunkDB`). Then, the "process" component of the System interface (roughly `class System where process :: ChunkDB -> IO ()`; this would probably be different, but it should paint the right picture) should be fairly straightforward.
  - Classification of systems can potentially be gathered through creating generic requirements applicable to each system that they all must contain. They might become more specific versions of classifications through having extra requirements.
- Assuming point (1) of "Again, slightly deeper observations" is "good", then we should consider building graphs of our knowledge encodings and the translation paths automatically (through somehow creating encoding printers). This should help one with understanding the infrastructure; it would make it easier to understand why many of Drasil's encodings (DSLs and all) exist, as part of the larger Drasil "system."
- Compared to other compilers, our "intermediate representation" of knowledge allows us to get a lot more out of the lower-level compilers, and, with enough effort, to completely supersede them. Ultimately, it seems we get a lot more, for a lot less work, than standard software development/usage.
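The idea in the first point above (a "system" described by the requirements it imposes on a knowledge base, checked by bunching chunks up by their `TypeRep`s) can be sketched as below. This is a rough sketch under heavy assumptions, not the `ChunkDB` design from #2873: `KnowledgeBase`, `Requirement`, `Count`, `SrsSystem`, `register`, `chunksOf`, `unmet`, `run`, and the wrapper chunk types are all hypothetical names invented for illustration, and the `System` class is given an extra type parameter that the one-line version in the discussion does not have. Only `Data.Typeable`, `Data.Dynamic`, `Data.Proxy`, and `Data.Map.Strict` are real library modules.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
module Main where

import Data.Dynamic (Dynamic, dynTypeRep, fromDynamic, toDyn)
import qualified Data.Map.Strict as Map
import Data.Maybe (mapMaybe)
import Data.Proxy (Proxy (..))
import Data.Typeable (TypeRep, Typeable, typeRep)

-- Hypothetical chunk types; the wrappers make inputs and outputs
-- distinguishable by type alone.
newtype Author  = Author String  deriving Show
newtype Purpose = Purpose String deriving Show
newtype Input   = Input String   deriving Show
newtype Output  = Output String  deriving Show

-- Toy stand-in for a ChunkDB: chunks bucketed by type representation.
newtype KnowledgeBase = KnowledgeBase (Map.Map TypeRep [Dynamic])

emptyKB :: KnowledgeBase
emptyKB = KnowledgeBase Map.empty

register :: Typeable a => a -> KnowledgeBase -> KnowledgeBase
register x (KnowledgeBase m) =
  KnowledgeBase (Map.insertWith (++) (dynTypeRep d) [d] m)
  where d = toDyn x

-- Pull every chunk of the requested type back out of its bucket.
chunksOf :: forall a. Typeable a => KnowledgeBase -> [a]
chunksOf (KnowledgeBase m) =
  mapMaybe fromDynamic (Map.findWithDefault [] (typeRep (Proxy :: Proxy a)) m)

-- A requirement: a label, the chunk type it concerns, and how many
-- chunks of that type the knowledge base must hold.
data Count = Exactly Int | AtLeast Int

data Requirement = Requirement String TypeRep Count

requires :: Typeable a => String -> Proxy a -> Count -> Requirement
requires label p = Requirement label (typeRep p)

met :: Count -> Int -> Bool
met (Exactly k) n = n == k
met (AtLeast k) n = n >= k

-- Labels of the requirements the knowledge base does not satisfy.
unmet :: KnowledgeBase -> [Requirement] -> [String]
unmet (KnowledgeBase m) reqs =
  [ label
  | Requirement label rep count <- reqs
  , not (met count (length (Map.findWithDefault [] rep m)))
  ]

-- A system states what it needs and what it does with it.
class System s where
  requirements :: s -> [Requirement]
  process      :: s -> KnowledgeBase -> IO ()

data SrsSystem = SrsSystem

instance System SrsSystem where
  requirements _ =
    [ requires "at least one author" (Proxy :: Proxy Author)  (AtLeast 1)
    , requires "exactly one purpose" (Proxy :: Proxy Purpose) (Exactly 1)
    , requires "at least one input"  (Proxy :: Proxy Input)   (AtLeast 1)
    , requires "at least one output" (Proxy :: Proxy Output)  (AtLeast 1)
    ]
  process _ kb = do
    mapM_ print (chunksOf kb :: [Author])
    mapM_ print (chunksOf kb :: [Input])

-- Only run a system once all of its requirements are satisfied.
run :: System s => s -> KnowledgeBase -> IO ()
run s kb = case unmet kb (requirements s) of
  []      -> process s kb
  missing -> mapM_ (putStrLn . ("unmet requirement: " ++)) missing

main :: IO ()
main = run SrsSystem (register (Author "A. Author") (register (Input "x") emptyKB))
```

Here `main` registers only an author and an input, so `run` reports the missing purpose and output instead of processing. Output constraints (requirement d) are left out to keep the sketch short.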
Thank you for reading :smile:! Hopefully, this all makes sense.
Impressive work @balacij. Your observations seem on point to me, but @JacquesCarette is a better judge of the future direction of the design of Drasil. We should make a point of using the knowledge in this issue. In particular, the shallow analysis seems like a summary that should find its way (in some form) into one of our Wikis.
Thank you, @smiths! :smile: Hopefully so, I think some of the "shallow analysis" could also go into the main `package.yaml` files and the README.md files too.
Huge amount of information here. And lots of good questions. So, to be able to eventually close this issue, I'm going to spin off a bunch of issues, each of which is related to what's here, but contains more 'actionable' material. When it is more 'purely informational', I'll make comments here. Eventually, we may want to extract the knowledge from here and put it on the wiki and/or in the READMEs.
Thank you, that sounds like a great idea!
On `drasil-data`: it is a "database", done as a set of Haskell files that contain only declarations of 'chunks'. The 'chunk' part is not so important; the important part is that it uses a host of different encoding data-structures.
Furthermore, it spans from simple knowledge to rather complex knowledge (i.e. theories), with everything in between. There is also a cross-cutting arrangement where the 'knowledge' encoded comes from many different application domains.
Right now, we don't know how to organize this. For sure, internally to `drasil-data`, it's partly organized, partly a mess. To detangle things, we should understand our own knowledge encodings well enough to understand exactly what kinds of "level mismatches" we've created. We also need to understand what classification seems to be the most natural to use -- level? application domain? both? neither?
In other words, I don't think we're even close to ready to do something sensible with it.
> where should we define how encodings get printed into other?

An excellent question indeed! What you are witnessing here is a very long evolution, much of it in parallel, of various pieces. And what you have keenly observed is that there's almost a pattern to it all, with emphasis on almost. Better yet, you've distilled some potential arrangements, and given your opinion about the pros and cons of them.

> The question is regarding placement. Should the lower-level encodings contain information about how higher-level encodings can be rendered into lower-level encodings, or should higher-level encodings contain a property that dictates how they can be "viewed"/printed as lower-level encodings?

That is indeed an excellent question. I don't fully see examples of the lower-level dictating to the upper ones how to do their jobs (specific examples would be great), but your point stands: the XML+CSS/XSL design is one that has proven its worth. To spell it out: it is good to have a 'semantic' data encoding (XML) that is coupled with instructions on how to render. These instructions can have all sorts of options, which in turn are all built on top of a single rendering language. In theory, the language in `drasil-printers` is meant to be an "abstract rendering language" which can then be 'lowered' further to HTML, LaTeX, etc.
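To make the XML+CSS analogy concrete, here is a small sketch of the three layers: a 'semantic' encoding, rendering instructions with options, and a single abstract rendering language that is then lowered to LaTeX or HTML. Every name in it (`Quantity`, `Style`, `Render`, `present`, `toLaTeX`, `toHTML`) is hypothetical and chosen only for illustration; it is not the actual language in `drasil-printers`, and the HTML lowering does no escaping.

```haskell
module Main where

-- Hypothetical 'semantic' encoding: what a quantity is, with no layout.
data Quantity = Quantity
  { qName   :: String
  , qSymbol :: String
  , qUnit   :: Maybe String
  }

-- Hypothetical rendering options, playing the role of a stylesheet.
data Style = Style
  { showUnit :: Bool
  , boldName :: Bool
  }

-- Hypothetical abstract rendering language shared by every target.
data Render
  = Plain String
  | Bold Render
  | Math String
  | Render :+: Render

infixr 5 :+:

-- Rendering instructions: semantic data plus options yield abstract layout.
present :: Style -> Quantity -> Render
present style q =
  name :+: Plain " (" :+: Math (qSymbol q) :+: Plain ")" :+: unit
  where
    name
      | boldName style = Bold (Plain (qName q))
      | otherwise      = Plain (qName q)
    unit = case qUnit q of
      Just u | showUnit style -> Plain (" in " ++ u)
      _                       -> Plain ""

-- Lowering the abstract language to concrete targets.
toLaTeX :: Render -> String
toLaTeX (Plain s) = s
toLaTeX (Bold r)  = "\\textbf{" ++ toLaTeX r ++ "}"
toLaTeX (Math s)  = "$" ++ s ++ "$"
toLaTeX (l :+: r) = toLaTeX l ++ toLaTeX r

toHTML :: Render -> String
toHTML (Plain s) = s                         -- no escaping in this sketch
toHTML (Bold r)  = "<b>" ++ toHTML r ++ "</b>"
toHTML (Math s)  = "<em>" ++ s ++ "</em>"
toHTML (l :+: r) = toHTML l ++ toHTML r

main :: IO ()
main = do
  let len = Quantity "length" "L" (Just "m")
      doc = present (Style True True) len
  putStrLn (toLaTeX doc)  -- prints "\textbf{length} ($L$) in m"
  putStrLn (toHTML doc)   -- prints "<b>length</b> (<em>L</em>) in m"
```

Changing the `Style` changes every target at once, while adding a new target only requires one more function over `Render`.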
On `drasil-database` and `SystemInformation`: the fundamental problem here is that we don't have a good explanation of what these ought to be. And certainly they were created before our current understanding of many things.
For example, we have all sorts of type classes for our "chunks" having all sorts of data. Higher-level routines can then be polymorphic: instead of asking for a specific representation, they can ask for any representation that has the data needed.
Our 'artifact generators', at the highest level, should work the same way: take a data-representation that has all the needed information, pull it, and do their job.
Our current design is instead to create one monster representation, which we could call `KitchenSink` instead of `SystemInformation`, which has everything any part could possibly want. So our current process is thus
- collect all the information
- stick it all in one place
- pass it to everyone

That worked for a while, but is now fraying. All 3 pieces suffer, in different ways, from this monolithic design. In particular, it is hard to have automation that derives new information from old.
Instead, the first step we need is an understanding of the requirements of each artifact that we're going to generate. [Not globally for all, locally for each we treat.] Then an encoding of the information needed to actually generate those artifacts. Note that this links in with the above question of 'style choices' that the printers want.
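The contrast between a single monolithic record and generators that each ask only for the data they need can be sketched with ordinary type classes. This is an illustrative sketch only: `HasAuthors`, `HasPurpose`, `titlePage`, `credits`, `SrsInfo`, and `WebsiteInfo` are hypothetical names, not existing Drasil classes or types.

```haskell
module Main where

import Data.List (intercalate)

-- Each class names one piece of information a generator may require.
class HasAuthors r where
  authors :: r -> [String]

class HasPurpose r where
  purpose :: r -> String

-- A generator states its requirements in its type and nothing more.
titlePage :: (HasAuthors r, HasPurpose r) => r -> String
titlePage r = unlines
  [ "Purpose: " ++ purpose r
  , "Authors: " ++ intercalate ", " (authors r)
  ]

credits :: HasAuthors r => r -> String
credits r = "Maintained by " ++ intercalate ", " (authors r)

-- A representation for an SRS-like target carries both pieces.
data SrsInfo = SrsInfo
  { srsAuthors :: [String]
  , srsPurpose :: String
  }

instance HasAuthors SrsInfo where
  authors = srsAuthors

instance HasPurpose SrsInfo where
  purpose = srsPurpose

-- A representation for a website-like target carries only what it has,
-- with no empty placeholder fields.
newtype WebsiteInfo = WebsiteInfo [String]

instance HasAuthors WebsiteInfo where
  authors (WebsiteInfo names) = names

main :: IO ()
main = do
  putStrLn (credits (WebsiteInfo ["A. Author"]))
  putStr (titlePage (SrsInfo ["A. Author", "B. Author"] "Describe the system"))
```

With this shape, `titlePage (WebsiteInfo [...])` is rejected at compile time because `WebsiteInfo` has no `HasPurpose` instance, so a mismatch between an artifact's requirements and the supplied information surfaces as a type error rather than as an empty list.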
That could then lead to a proper decoupling of various pieces, so that we could properly re-use Drasil infrastructure for drasil-website without going through the current hacks. But we don't have the exposed API for all the parts that would allow us to do that, in part because too many parts use the KitchenSink approach.
There's a whole lot more content in this issue, but I think the spun-off issues, and the comments above, already are a lot of work. So when that flurry dies down, a revisit (or two or three) will be needed.
Re: drasil-database:
In other words, I don't think we're even close to ready to do something sensible with it.
Sounds good, I can see why. Hopefully it will become more clear later on.
An excellent question indeed! What you are witnessing here is a very long evolution, much of it in parallel, of various pieces. And what you have keenly observed is that there's almost a pattern to it all, with emphasis on almost. Better yet, you've distilled some potential arrangements, and given your opinion about the pros and cons of each.
Thanks!
The question is regarding placement. Should the lower-level encodings contain information about how higher-level encodings can be rendered into lower-level encodings, or should higher-level encodings contain a property that dictates how they can be "viewed"/printed as lower-level encodings?
That is indeed an excellent question. I don't fully see examples of the lower-level dictating to the upper ones how to do their jobs (specific examples would be great), but your point stands: the XML+CSS/XSL design is one that has proven its worth. To spell it out: it is good to have a 'semantic' data encoding (XML) that is coupled with instructions on how to render. These instructions can have all sorts of options, which in turn are all built on top of a single rendering language. In theory, the language in drasil-printers is meant to be an "abstract rendering language" which can then be 'lowered' further to HTML, LaTeX, etc.
Thanks. The lower-level encodings don't really dictate the conversion "well"/uniformly because they keep the instructions in nearby "floating" functions (which is not as uniform a pattern as a function belonging to a typeclass).
A good example can be drawn from the Language.Drasil.Printing.Import.* modules of drasil-printers.
Specifically, we can see .Space:
https://github.com/JacquesCarette/Drasil/blob/a1c22b739c958ae169e8fe4fb2ea2b0a670dc7df/code/drasil-printers/lib/Language/Drasil/Printing/Import/Space.hs#L14-L18
Unit symbols, in .Symbol:
https://github.com/JacquesCarette/Drasil/blob/a1c22b739c958ae169e8fe4fb2ea2b0a670dc7df/code/drasil-printers/lib/Language/Drasil/Printing/Import/Symbol.hs#L38-L43
Both of these are floating, but if we had a typeclass:
class CanGenMathExprs t where
  toMathExpr :: t -> Printing.Expr
we would just use a common toMathExpr for any applicable type for which it's defined. Alternatively, I guess we can try to parameterize further with an output variable, and another for "named" typeclass instances:
class CanGen i o ctx where
  ctxPrint :: i -> o
instance CanGen Math.Symbol P.Expr 'SomeCtx where ctxPrint = _
I think we would just need type applications to access the ones for a particular 'context'. I'm not too sure how helpful this variant would be; I thought it might be an interesting way to get different 'printing' styles. Embedding 'printable' things in other 'printable' things means that the single type parameter used would need to be something each embedded 'printable' type is defined for, and it would also imply that a single type would need to carry enough information for the existing "PrintingInformation" for all embedded types. To overcome the ChunkDB part, we would just add it as a parameter to ctxPrint, but the other components might get messy for various combinations, or when PrintingConfiguration's size increases for extra configuration options. This would be a potential option for @cd155, primarily for the different styles in printing ODEs, I think, but I'm completely unsure whether it's a good idea; it will require a bit more investigation. The interesting thing about SomeCtx is that it could be a whole "style" of printing things in a layout (e.g., SRS variants, or subvariants of certain SRS variants).
In theory, the language in drasil-printers is meant to be an "abstract rendering language" which can then be 'lowered' further to HTML, LaTeX, etc.
Yes, taking this example, drasil-printers would become a package with, strictly, its own encoding of its "abstract rendering language" and an "instance" of some typeclass, as above, which "lowers" it into HTML/LaTeX/etc. The dependencies of drasil-printers would shrink to just the packages for the HTML/LaTeX/etc. generation, while drasil-lang would gain a dependency on drasil-printers, as it would also need to "instantiate" a typeclass for lowering things into the abstract rendering language of drasil-printers.
On drasil-database and SystemInformation: the fundamental problem here is that we don't have a good explanation of what these ought to be. And certainly they were created before our current understanding of many things.
Would you say we have a good explanation of why ChunkDB exists? As I understood it, within drasil-database, it was only SystemInformation that we didn't have a good definition for (this was one of the specific examples in #2195). With my version of ChunkDB (from #2873) and UIDs, it would make sense, to me, that they would be a core bundle: the "least"-required components for Drasil to be used (e.g., all components would rely on them in some way [registration in a knowledge-base for usage, printing, etc.]). As such, they'd become the only datatypes in drasil-database and SystemInformation would be removed or moved elsewhere, or they would be moved, together, into a new drasil "core" package (drasil-core).
For example, we have all sorts of type classes for our "chunks" having all sorts of data. Higher-level routines can then be polymorphic: instead of asking for specific representations, they can ask for any representation that has the data needed. Our 'artifact generators', at the highest level, should work the same way: take a data-representation that has all the needed information, pull that information out, and do their job.
If I'm understanding you correctly, I believe that this is the design I'm also thinking of with my above discussion of "reversing dependencies" and "forcing printing qualities to be properties of the higher-level encodings".
Our current design is instead to create one monster representation, which we could call KitchenSink instead of SystemInformation, which has everything any part could possibly want. So our current process is thus:
- collect all the information
- stick it all in one place
- pass it to everyone
That worked for a while, but is now fraying. All 3 pieces suffer, in different ways, from this monolithic design. In particular, it is hard to have automation that derives new information from old.
Would we be able to remove SystemInformation completely in favour of using cast more often to assist in grabbing chunks from ChunkDBs instead? This would allow us to place completely "foreign" types into a ChunkDB, since we would be using TypeReps to grab data en masse (e.g., for a specific type) and UIDs+TypeReps to grab singular data instances. This would be helpful for new user libraries that build on Drasil but are not upstreamed.
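Roughly what I have in mind, as a simplified sketch and not the actual ChunkDB from #2873: the names (Chunk, insertChunk, findChunk, findAllOf, Quantity, Citation) are made up for illustration, the store is a plain association list, and the "en masse" lookup filters with cast instead of indexing by TypeRep.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, cast)

type UID = String

-- Any Typeable value can be stored, including "foreign" user types.
data Chunk = forall a. Typeable a => Chunk a

-- Hypothetical simplification: an association list instead of a map.
newtype ChunkDB = ChunkDB [(UID, Chunk)]

emptyDB :: ChunkDB
emptyDB = ChunkDB []

insertChunk :: Typeable a => UID -> a -> ChunkDB -> ChunkDB
insertChunk u x (ChunkDB cs) = ChunkDB ((u, Chunk x) : cs)

-- cast yields Nothing when the stored type is not the requested one.
unChunk :: Typeable a => Chunk -> Maybe a
unChunk (Chunk x) = cast x

-- Singular lookup: a UID plus the type expected by the caller.
findChunk :: Typeable a => UID -> ChunkDB -> Maybe a
findChunk u (ChunkDB cs) = lookup u cs >>= unChunk

-- En masse lookup: every chunk of the requested type.
findAllOf :: Typeable a => ChunkDB -> [(UID, a)]
findAllOf (ChunkDB cs) = [(u, x) | (u, c) <- cs, Just x <- [unChunk c]]

-- Two example chunk types that the database knows nothing about.
newtype Quantity = Quantity String deriving (Eq, Show)
newtype Citation = Citation String deriving (Eq, Show)

exampleDB :: ChunkDB
exampleDB =
  insertChunk "mass" (Quantity "mass") $
    insertChunk "ref1" (Citation "a reference") $
      insertChunk "len" (Quantity "length") emptyDB
```

Here `findChunk "len" exampleDB :: Maybe Quantity` finds the chunk, while asking for the same UID as a `Maybe Citation` gives `Nothing` rather than crashing, which is the sense in which cast is "safe".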
Though, I'm uncertain about 2 things:
- Is Data.Typeable (.., cast) usage problematic in any way? It looks like a safe version of unsafeCoerce, but it might still be an anti-pattern.
- Why does a SystemInformation contain 2 kinds of ChunkDBs: "sysinfodb" and "usedinfodb"? It seems like a printer of a ChunkDB should know which chunks are "sysinfo"/"used"/etc. for itself, by treating those chunk types as "relevant or not" to its printing goal.
Instead, the first step we need is an understanding of the requirements of each artifact that we're going to generate. [Not globally for all, locally for each we treat.] Then an encoding of the information needed to actually generate those artifacts. Note that this links in with the above question of 'style choices' that the printers want. That could then lead to a proper decoupling of various pieces, so that we could properly re-use Drasil infrastructure for drasil-website without going through the current hacks. But we don't have the exposed API for all the parts that would allow us to do that, in part because too many parts use the KitchenSink approach.
Perfect, this will be great for drasil-lang / #2885. I think this also is relevant to my discussion in my last comment (above this one, regarding CanGen).
There's a whole lot more content in this issue, but I think the spun-off issues, and the comments above, already are a lot of work. So when that flurry dies down, a revisit (or two or three) will be needed.
Sounds good.
These discussions are getting too big - it would make sense to start new issues when the commentary is going to be more than just a few lines.
Re: printing and classes like CanGenMathExprs - maybe. I've got some emails about that from multiple years ago. It's actually kind of a tricky design point where Haskell classes don't always quite fit. So it needs a proper design, which means that it first needs a proper analysis. Certainly it's not worth creating classes (in general) if there is only a single instance of it. I'll also email you some design discussion on that topic from a while back.
Re: drasil-database, etc.
We have a currently adequate explanation of ChunkDB: it's a container of things with UIDs. Note that that's probably the full explanation as well, which is perhaps unsatisfactory!
I think SystemInformation was supposed to be just as its name implies: the necessary information for a "system". It was supposed to be arranged so that various different kinds of information would be assembled in it, and we would know a priori what that information was (thus the many fields). But it never quite got there. The rationale for each field has been lost, and that rationale cannot be adequately reconstructed. So it is probably best to get rid of it, until such a time as we can get a decent explanation for what it ought to be, and even whether it ought to exist.
Would we be able to remove SystemInformation completely in favour of using cast more often to assist in grabbing chunks from ChunkDBs instead?
Here I'm less happy with the question: it mixes design and implementation/solution too much. The actual content of the question may be fine though, in that maybe cast could end up being part of the implementation of a good design.
To a certain extent, cast is an anti-pattern, in the sense that Haskell is great because of its static typing, and cast works firmly against that grain. unsafeCoerce is the same and different: if used wantonly, it's an anti-pattern. If used as part of a dedicated optimization pass in some efficiency-critical low level layer, that's quite different. Same with cast: it might be useful because the situation is fundamentally dynamic, or because Haskell's type system is not up to expressing the types in an ergonomic way.
If I recall, sysinfodb was meant to be where we assembled all the information that might be used in a system, and usedinfodb was the stuff that was actually used. So things reachable from usedinfodb should appear in the glossary, table of symbols, etc., but the mere presence of something in sysinfodb wouldn't trigger anything like that.
Re: moving to new tickets: Will do. I will refrain for this response because I think it will be relatively short. The CanGenMathExprs discussion will almost definitely need to go into a new ticket, however.
Re: SystemInformation: Sounds good.
Re: Removing SystemInformation: I understand. I very briefly mentioned replacing the lists in SystemInformation in #2873, but I didn't expect that this would become the conclusion (assuming it is).
Re: cast: That makes sense, thank you. What I was planning is dynamic by nature: the idea was to keep them as open as possible, which might even be too open. It's interesting. I guess this can be something we revisit soon if it becomes problematic but, for now, it looks like a robust solution.