Duplicate UIDs in Projectile

Open hrzhuang opened this issue 1 year ago • 1 comments

As part of #2873, we need to eliminate multiple chunks having the same UID. I have taken a look at the UID conflicts in Projectile, and the following are all of the unique kinds of conflicts.

{"terms", "references", "labelledContent"}
{"symbols", "dataDefinitions", "terms", "references", "labelledContent"}
{"conceptInstances", "references", "concepts"}
{"sections", "references"}
{"references", "labelledContent", "theoryModels"}
{"symbols", "generalDefinitions", "terms", "references", "labelledContent"}
{"terms", "units"}
{"references", "labelledContent"}
{"symbols", "instanceModels", "terms", "references", "labelledContent"}
{"terms", "symbols"}

Note

"AssumpsLabel": ["sections", "references"]

in the UID conflicts file means the UID AssumpsLabel is both a section and a reference in the ChunkDB.

This translates to

{"sections", "references"}

in my list above.
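
As a minimal sketch of that translation, assuming the conflicts file is JSON that maps each UID to the list of ChunkDB maps containing it (the shape of the `AssumpsLabel` entry above), the unique kinds could be collected like this. The function name and every sample UID other than `AssumpsLabel` are hypothetical.

```python
import json


def unique_conflict_kinds(conflicts):
    """Return the distinct sets of ChunkDB map names that share a UID."""
    # frozenset ignores order, so ["sections", "references"] and
    # ["references", "sections"] count as the same kind of conflict.
    return {frozenset(maps) for maps in conflicts.values()}


# Only "AssumpsLabel" comes from the issue; the other UIDs are made up.
sample = json.loads("""
{
  "AssumpsLabel": ["sections", "references"],
  "hypotheticalUidA": ["references", "sections"],
  "hypotheticalUidB": ["terms", "units"]
}
""")

kinds = unique_conflict_kinds(sample)
for kind in sorted(kinds, key=sorted):
    print(sorted(kind))
```

Using sets rather than lists is what collapses many conflicting UIDs into a short list of kinds.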

I think we can group the maps involved into 4 categories:

  1. Concepts (and parts thereof): terms, symbols, conceptInstances, concepts, units
  2. References: references
  3. Models: dataDefinitions, theoryModels, generalDefinitions, instanceModels
  4. Document fragments: labelledContent, sections
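
To make the grouping concrete, here is a small sketch that labels each map name with its category and reports which categories a given kind of conflict touches. The map names and categories are the ones listed above; the dictionary and function names are hypothetical.

```python
# Category of each ChunkDB map, following the grouping above.
CATEGORY_OF = {
    "terms": "concepts",
    "symbols": "concepts",
    "conceptInstances": "concepts",
    "concepts": "concepts",
    "units": "concepts",
    "references": "references",
    "dataDefinitions": "models",
    "theoryModels": "models",
    "generalDefinitions": "models",
    "instanceModels": "models",
    "labelledContent": "document fragments",
    "sections": "document fragments",
}


def categories_involved(kind):
    """Return the set of categories touched by one kind of conflict."""
    return {CATEGORY_OF[name] for name in kind}


# A few of the conflict kinds listed earlier in the issue.
for kind in [
    {"terms", "symbols"},
    {"sections", "references"},
    {"references", "labelledContent", "theoryModels"},
]:
    print(sorted(kind), "->", sorted(categories_involved(kind)))
```

A kind that maps to a single category (such as `{"terms", "symbols"}`) is a conflict within that category, while the others cross category boundaries.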

Here are my thoughts (in the context of the new ChunkDB described in #2873):

  • Parts of concepts should not be in the ChunkDB, since we can just look up the whole concept and take the part we want.
  • Internal references should not be in the ChunkDB, since we can just look up the thing we are referring to and construct a reference to it.
  • Document fragments should have different UIDs from the underlying knowledge
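
The first two points can be illustrated with a toy model. This is not Drasil's Haskell API or the ChunkDB design from #2873; every type, field, and UID below is hypothetical and exists only to show the idea of storing whole concepts and deriving parts and references on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical whole concept; its parts are fields, not separate chunks."""
    uid: str
    term: str
    symbol: str


@dataclass(frozen=True)
class Reference:
    """Hypothetical internal reference, built on demand rather than stored."""
    target_uid: str
    display: str


def symbol_of(db, uid):
    """Look up the whole concept, then take the part we want."""
    return db[uid].symbol


def reference_to(db, uid):
    """Construct a reference from the thing being referred to."""
    concept = db[uid]
    return Reference(target_uid=concept.uid, display=concept.term)


# One entry per UID: no separate term, symbol, or reference entries.
db = {"hypotheticalSpeed": Concept("hypotheticalSpeed", "speed", "v")}

print(symbol_of(db, "hypotheticalSpeed"))
print(reference_to(db, "hypotheticalSpeed"))
```

With this shape, a UID appears once in the database, so a term, its symbol, and a reference to it cannot collide.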

I'm not so sure about the models. generalDefinitions and dataDefinitions can be considered definitions of concepts and therefore part of the concepts they define. On the other hand, they are used similarly to other kinds of models and it is not true in general that a model defines a concept.

hrzhuang, Jul 05 '23 19:07

Sorry to take so long to comment on this.

  1. I agree, parts of concepts should not be in the ChunkDB. However, we will find bugs where there really are bad duplicates (i.e. we mean 2 different things that are related).
  2. I'm less sure about this. In some sense, there could be a 'local' part of the database that can be used for lookups of things we're currently building. In part, this could increase modularity by not hard-coding some references. But I'd like to see hard details.
  3. Completely agree that document fragments should have different UIDs from the underlying knowledge. That's a plain bug.

I do believe your recent changes in #3531 probably help with point 3 already?

Saying much more would be easier if I could see some example conflicts (concrete, not via 2 levels of indirection!).

JacquesCarette, Jul 20 '23 21:07