What goes in `drasil-metadata`?

Open · balacij opened this issue 3 months ago • 4 comments

Spawned from #4208

Obviously, "metadata," but "meta" with respect to what? My initial guess is that it should contain chunk instances that explain (in English) what Drasil's chunks mean.

balacij, Sep 19 '25 21:09

It is meta with respect to the purpose of Drasil (generating scientific code) because it is data necessary for Drasil's functioning.

JacquesCarette, Sep 22 '25 18:09

Ok, I think we're getting somewhere with this after our recent in-person meeting, specifically where we discussed that at each step in the IDP/Drasil architecture, there can be different buckets of background ~~theories~~ knowledge. Now, this issue was related to past work on moving things into drasil-metadata:

  • https://github.com/JacquesCarette/Drasil/issues/4219
  • https://github.com/JacquesCarette/Drasil/pull/4208
  • https://github.com/JacquesCarette/Drasil/pull/4250
  • https://github.com/JacquesCarette/Drasil/pull/4252

In these issues/PRs, we discussed drasil-metadb. I didn't understand it at the time, but I believe I now see what you were trying to convey: there are at least two ways of tagging data as metadata:

  • Information (chunks) about our chunk types, translators, etc.
  • Information (again, chunks) the translators need to function (e.g., because they are referenced by an SRS generator or code generator or any other translator/generator).

Note that there is overlap between these categories, but right now, we do not separate them from our 'data'.
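To make the two categories (and their overlap) concrete, here is a minimal Haskell sketch, chosen because Drasil itself is written in Haskell. Everything in it is hypothetical: `MetaRole`, `TaggedDB`, `isMeta`, `overlapping`, and the example UIDs are invented for illustration and are not Drasil's ChunkDB API. It assumes each chunk carries a set of metadata roles, so a single chunk can sit in both categories at once, and an empty set marks plain data.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical roles a chunk can play as metadata; these names are
-- illustrative only and are not part of Drasil's API.
data MetaRole
  = DescribesDrasil  -- information about chunk types, translators, etc.
  | NeededBy String  -- information a named translator/generator needs
  deriving (Eq, Ord, Show)

-- Hypothetical tagging: chunk UID -> set of metadata roles.
-- An empty set marks plain (non-meta) data.
type TaggedDB = Map.Map String (Set.Set MetaRole)

-- A chunk counts as metadata if it plays at least one role.
isMeta :: TaggedDB -> String -> Bool
isMeta db uid = maybe False (not . Set.null) (Map.lookup uid db)

-- Chunks that fall into both categories at once (the overlap).
overlapping :: TaggedDB -> [String]
overlapping db =
  [ uid
  | (uid, roles) <- Map.toList db
  , DescribesDrasil `Set.member` roles
  , any isNeededBy (Set.toList roles)
  ]
  where
    isNeededBy (NeededBy _) = True
    isNeededBy _            = False

-- Example UIDs are made up for illustration.
exampleDB :: TaggedDB
exampleDB = Map.fromList
  [ ("chunkTypeDoc",     Set.fromList [DescribesDrasil])
  , ("srsSectionTitles", Set.fromList [DescribesDrasil, NeededBy "SRS generator"])
  , ("launchSpeed",      Set.empty)
  ]
```

Using a set of roles, rather than a single flag, is what lets the overlap be represented without duplicating chunks.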

[Image: IDP/Drasil architecture diagram]

So, now I'll make a (hopefully obvious) claim: [meta]data is just data. However, thinking of some data as meta and some as non-meta is helpful, but deciphering what is/is not meta is still difficult. Metadata is perhaps easiest to recognize (or perhaps only exists) with respect to a well-defined scope, for example, with respect to any of the red arrows and/or double-bar black arrows. In any of these phases:

  • The data (not meta) is the information being translated/used by a translator/generator (a red arrow or double-bar black arrow); this data can be completely different from one use to the next.
  • The metadata is the information the translators/generators were built with and always need in scope; it is always the same pool of data.
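A minimal Haskell sketch of this scope-relative split, assuming a translator is simply a function paired with a fixed pool of chunks. `Translator`, `KB`, `translate`, and `srsGen` are hypothetical names, and chunks are modelled as plain strings keyed by UID. The metadata is fixed when the translator is built, while the input data is whatever is passed in at each use.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a pool of chunks: UID -> content.
type KB = Map.Map String String

-- A translator/generator (an arrow in the diagram). Its metadata is fixed
-- when it is built; only the input data changes from one use to the next.
data Translator = Translator
  { metadata :: KB
  , runWith  :: KB -> KB -> [String]  -- metadata -> input data -> output
  }

-- Every use supplies new input data but reuses the same metadata.
translate :: Translator -> KB -> [String]
translate t = runWith t (metadata t)

-- Toy SRS-like generator: the section title lives in its metadata,
-- while the symbols it documents come from the input data.
srsGen :: Translator
srsGen = Translator
  { metadata = Map.fromList [("introTitle", "Introduction")]
  , runWith  = \meta input ->
      Map.findWithDefault "?" "introTitle" meta
        : [ uid ++ ": " ++ def | (uid, def) <- Map.toList input ]
  }
```

Calling `translate srsGen` on two different inputs yields two different outputs, but both start with the same `"Introduction"` heading drawn from the translator's own metadata.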

For example, if/when IMs/TMs/DDs/GDs become purely 'display knowledge', the 'math knowledge' wouldn't refer to other IMs/TMs/DDs/GDs at all, but to other theories/theory components. So, the data being manipulated by the double-bar black arrow from "Problem Description + Solution Desiderata" to "SRS" wouldn't necessarily need IMs/TMs/DDs/GDs in scope at all, but the relevant background metadata would (i.e., the background ~~theory~~ knowledge with respect to said arrow).

From an operational POV, can we separate these two kinds of data and know (at Drasil's runtime, without any sort of complex metaprogramming) which data belongs to which 'level'? At the moment, no, but if we separate the pools of knowledge into separate ChunkDBs, we would.
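One way to read the operational question, sketched in Haskell under the assumption that the two kinds of knowledge live in two separate pools. `MetaDB`, `DataDB`, `Level`, `levelsOf`, and the example UIDs are hypothetical, not Drasil's API. With separate pools, asking which level a chunk belongs to becomes an ordinary lookup at runtime.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical: the two kinds of knowledge kept in separate pools rather
-- than in one mixed ChunkDB. Contents are UID -> description.
newtype MetaDB = MetaDB (Map.Map String String)
newtype DataDB = DataDB (Map.Map String String)

data Level = MetaLevel | DataLevel
  deriving (Eq, Show)

-- Which level(s) a chunk belongs to is a plain runtime lookup; no
-- metaprogramming is needed. A UID present in both pools reports both.
levelsOf :: MetaDB -> DataDB -> String -> [Level]
levelsOf (MetaDB m) (DataDB d) uid =
  [ MetaLevel | Map.member uid m ] ++ [ DataLevel | Map.member uid d ]

metaPool :: MetaDB
metaPool = MetaDB (Map.fromList [("introTitle", "SRS introduction heading")])

dataPool :: DataDB
dataPool = DataDB (Map.fromList [("launchSpeed", "initial speed of the projectile")])
```

Returning a list rather than a single `Level` keeps the overlap between the two categories visible instead of silently picking one.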

From a coding task POV, this might mean the following:

  1. We erase drasil-metadata in favour of an extra exposed module in each of our packages that contains a renderer; that module exposes the package's metadata (e.g., Drasil.Code.Metadata). Nothing would import this data except an 'analysis' package (e.g., drasil-analysis) that we would use to analyze what knowledge is actually required at each step of generation/translation or ... anything else (I haven't thought this through fully yet). This module would expose a single ChunkDB containing all the chunks needed by the package's 'main translator/generator'.
  2. Give each generator/translator its own ChunkDB to work with. Each database would be that generator's own meta-db, with some new data inserted. The injected data should not be the entire contents of the ChunkDB passed into the generator/translator as input (all of which are 'options'). Note that a generator would not work with two ChunkDBs simultaneously; it would work only with its own, with the new data inserted.
  3. (Stretch) We design our packages around being either a bubble or an arrow (from the picture) -- i.e., either a family of things or a translator/generator between families of (most likely different) things.
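Item 2 could look roughly like the following minimal Haskell sketch. It assumes a generator is described by its own meta-db, a predicate choosing which input chunks it needs, and a function that only ever sees one database. `Generator`, `runGenerator`, `codeGen`, and the UIDs are hypothetical. The input ChunkDB is filtered rather than merged wholesale, matching the point that the injected data should not be the entire input.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for Drasil's ChunkDB: UID -> content.
type ChunkDB = Map.Map String String

data Generator = Generator
  { ownDB   :: ChunkDB              -- the generator's own meta-db
  , selects :: String -> Bool       -- which input chunks it actually needs
  , emit    :: ChunkDB -> [String]  -- sees exactly one database
  }

-- Copy only the selected input chunks into the generator's own meta-db and
-- run the generator on that single database. Map.union is left-biased, so on
-- a UID clash the meta-db's entry wins (an assumption made for this sketch).
runGenerator :: Generator -> ChunkDB -> [String]
runGenerator g input = emit g (Map.union (ownDB g) injected)
  where
    injected = Map.filterWithKey (\uid _ -> selects g uid) input

-- Toy code generator that only needs launchSpeed from its input and simply
-- lists the UIDs it can see.
codeGen :: Generator
codeGen = Generator
  { ownDB   = Map.fromList [("targetLang", "Haskell")]
  , selects = (`elem` ["launchSpeed"])
  , emit    = Map.keys
  }

exampleInput :: ChunkDB
exampleInput = Map.fromList
  [ ("launchSpeed", "initial speed"), ("srsIntro", "introduction text") ]
```

Here `runGenerator codeGen exampleInput` sees `launchSpeed` and `targetLang` but never `srsIntro`, because only the selected chunk is injected.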

balacij, Oct 31 '25 14:10

I'll re-open this issue. If we agree with the above, we can figure out coding tasks. If we don't, we can just close the issue again.

Aside: if the above is true and we move to separated ChunkDBs, we would gain information about how data moves through Drasil's pipeline, and that would be neat. We could annotate the "information flow diagram in Drasil" with some concrete numbers for one of our examples.
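As a toy illustration of the kind of numbers the aside has in mind, the Haskell sketch below assumes each pipeline step records the UIDs of its own meta-db and the UIDs injected from its input, then derives per-step counts and a pipeline-wide total. `Step`, `annotate`, `totalUsed`, and the example steps and counts are made up; they are not measurements from any Drasil example.

```haskell
import qualified Data.Set as Set

-- Hypothetical record of one pipeline step (an arrow in the diagram).
data Step = Step
  { stepName     :: String
  , metaUIDs     :: Set.Set String  -- chunks in the step's own meta-db
  , injectedUIDs :: Set.Set String  -- chunks taken from the step's input
  }

-- Counts that could label the step's arrow in the information flow diagram.
annotate :: Step -> String
annotate s =
  stepName s ++ ": " ++ show (Set.size (metaUIDs s)) ++ " meta, "
    ++ show (Set.size (injectedUIDs s)) ++ " injected"

-- Distinct chunks used anywhere in the pipeline.
totalUsed :: [Step] -> Int
totalUsed steps =
  Set.size (Set.unions [ Set.union (metaUIDs s) (injectedUIDs s) | s <- steps ])

-- Invented example steps.
srsStep, codeStep :: Step
srsStep  = Step "SRS generation" (Set.fromList ["introTitle", "sectionOrder"])
                                 (Set.fromList ["launchSpeed", "g"])
codeStep = Step "Code generation" (Set.fromList ["targetLang"])
                                  (Set.fromList ["launchSpeed"])
```

Because both steps inject `launchSpeed`, the pipeline-wide total counts it once, which is the kind of sharing such annotations could make visible.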

balacij, Oct 31 '25 15:10

> [meta]data is just data.

Agreed. And meta-programs are (just) programs.

> However, thinking of some data as meta and some as non-meta is helpful, but deciphering what is/is not meta is still difficult.

Agreed.

Your division between data and meta-data in the above seems 'right' in the context of Drasil.

> From an operational POV, [...] we would.

Agreed.

Comments on coding:

  1. Maybe - but I would not rush to do this. We have different kinds of metadata, and they might eventually want to live in different places. I'm all for migrating metadata to a 'better' home (and deleting empty packages).
  2. I don't like to think in terms of ChunkDB per se, I prefer some kind of 'knowledge base', which might contain fairly heterogeneous information, stored in a heterogeneous manner. However, I agree at the higher level.
  3. We should design our packages as 'knowledge units' that have a shared responsibility. That this can line up fairly well with bubbles and arrows is true. But our current packages are decent, so we should find out why they differ from our picture(s) and adjust.
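The 'knowledge base' in point 2 is left open. One possible reading of "heterogeneous information, stored in a heterogeneous manner" is sketched below in Haskell using `Data.Dynamic` from `base`, which lets values of different types share one map. `KnowledgeBase`, `insertKB`, `lookupKB`, and the example entries are hypothetical and do not reflect a design agreed on in this thread.

```haskell
import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Typeable (Typeable)
import qualified Data.Map.Strict as Map

-- Hypothetical heterogeneous knowledge base: values of different Haskell
-- types stored side by side under string keys.
newtype KnowledgeBase = KnowledgeBase (Map.Map String Dynamic)

emptyKB :: KnowledgeBase
emptyKB = KnowledgeBase Map.empty

-- Store a value of any Typeable type.
insertKB :: Typeable a => String -> a -> KnowledgeBase -> KnowledgeBase
insertKB key v (KnowledgeBase m) = KnowledgeBase (Map.insert key (toDyn v) m)

-- Retrieval succeeds only if the requested type matches what was stored.
lookupKB :: Typeable a => String -> KnowledgeBase -> Maybe a
lookupKB key (KnowledgeBase m) = Map.lookup key m >>= fromDynamic

-- A number and a piece of English text living in the same knowledge base.
exampleKB :: KnowledgeBase
exampleKB =
  insertKB "gravity" (9.81 :: Double)
    (insertKB "gravityName" "gravitational acceleration" emptyKB)
```

A lookup only succeeds when the requested type matches the stored one, so `lookupKB "gravity" exampleKB :: Maybe String` returns `Nothing`. A real design might prefer typed keys over `Dynamic`; this sketch only shows that heterogeneous storage is expressible.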

The most important coding task, for now, is getting Drasil to generate the variants of Projectile that you have created. The above would be well-suited, and maybe even necessary, for a second paper centered more squarely on Drasil.

JacquesCarette, Oct 31 '25 17:10