Drasil
Duplicate UIDs in Projectile
As part of #2873, we need to eliminate multiple chunks having the same UID. I have taken a look at the UID conflicts in Projectile, and the following are all of the unique kinds of conflicts.
{"terms", "references", "labelledContent"}
{"symbols", "dataDefinitions", "terms", "references", "labelledContent"}
{"conceptInstances", "references", "concepts"}
{"sections", "references"}
{"references", "labelledContent", "theoryModels"}
{"symbols", "generalDefinitions", "terms", "references", "labelledContent"}
{"terms", "units"}
{"references", "labelledContent"}
{"symbols", "instanceModels", "terms", "references", "labelledContent"}
{"terms", "symbols"}
Note: "AssumpsLabel": ["sections", "references"] in the UID conflicts file means the UID AssumpsLabel is both a section and a reference in the ChunkDB. This translates to {"sections", "references"} in my list above.
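As a rough illustration of that translation, here is a minimal Haskell sketch that collapses per-UID entries into the distinct kinds of conflict. The names `Conflicts`, `conflictKinds`, and `sample` are made up for this example and are not part of Drasil. Only the AssumpsLabel entry comes from the conflicts file; the other sample entries are invented.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | One entry per conflicting UID: the names of the ChunkDB maps it appears in.
-- (Hypothetical type, mirroring the shape of the UID conflicts file.)
type Conflicts = Map.Map String [String]

-- | Collapse the per-UID entries into the set of distinct kinds of conflict.
-- Each kind is itself a set, so the order of map names does not matter.
conflictKinds :: Conflicts -> Set.Set (Set.Set String)
conflictKinds = Set.fromList . map Set.fromList . Map.elems

-- | Sample data. Only the AssumpsLabel entry is taken from the issue;
-- the other two UIDs are invented for illustration.
sample :: Conflicts
sample = Map.fromList
  [ ("AssumpsLabel", ["sections", "references"])
  , ("otherLabel",   ["references", "sections"]) -- made-up UID, same kind
  , ("someTerm",     ["terms", "units"])         -- made-up UID
  ]
```

Under these assumptions, `conflictKinds sample` contains two kinds, because both section/reference entries collapse into the same set.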
I think we can group the maps involved into 4 categories:
- Concepts (and parts thereof): terms, symbols, conceptInstances, concepts, units
- References: references
- Models: dataDefinitions, theoryModels, generalDefinitions, instanceModels
- Document fragments: labelledContent, sections
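The grouping above could be written down as follows. This is only a sketch: `Category`, `categoryOf`, and `categoriesOf` are hypothetical names, and the map names are treated as plain strings as they appear in the conflicts file.

```haskell
import Data.Maybe (mapMaybe)
import qualified Data.Set as Set

-- | The four proposed categories (hypothetical type, not a Drasil definition).
data Category = Concept | Reference | Model | Fragment
  deriving (Eq, Ord, Show)

-- | Assign a ChunkDB map name to its category; unknown names give Nothing.
categoryOf :: String -> Maybe Category
categoryOf name
  | name `elem` ["terms", "symbols", "conceptInstances", "concepts", "units"] = Just Concept
  | name == "references" = Just Reference
  | name `elem` ["dataDefinitions", "theoryModels", "generalDefinitions", "instanceModels"] = Just Model
  | name `elem` ["labelledContent", "sections"] = Just Fragment
  | otherwise = Nothing

-- | The categories touched by one kind of conflict.
categoriesOf :: [String] -> Set.Set Category
categoriesOf = Set.fromList . mapMaybe categoryOf
```

For example, `categoriesOf ["terms", "units"]` touches only the Concept category, while `categoriesOf ["sections", "references"]` spans Reference and Fragment.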
Here are my thoughts (in the context of the new ChunkDB described in #2873):
- Parts of concepts should not be in the ChunkDB, since we can just look up the whole concept and take the part we want.
- Internal references should not be in the ChunkDB, since we can just look up the thing we are referring to and construct a reference to it.
- Document fragments should have different UIDs from the underlying knowledge.
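To make the first two points concrete, here is a minimal sketch, assuming a heavily simplified concept chunk. None of these types or functions (`Concept`, `Reference`, `ConceptDB`, `termOf`, `symbolOf`, `refTo`) are Drasil's actual API. They only illustrate storing the whole concept once and deriving parts and references on demand.

```haskell
import qualified Data.Map.Strict as Map

type UID = String

-- | Hypothetical, simplified stand-in for a concept chunk.
data Concept = Concept
  { cUID    :: UID
  , cTerm   :: String
  , cSymbol :: Maybe String -- not every concept has a symbol
  } deriving (Eq, Show)

-- | Hypothetical internal reference: just the UID of the thing referred to.
newtype Reference = Reference UID
  deriving (Eq, Show)

-- | Only whole concepts are stored, one per UID.
type ConceptDB = Map.Map UID Concept

-- | Take the term part by looking up the whole concept.
termOf :: ConceptDB -> UID -> Maybe String
termOf db u = cTerm <$> Map.lookup u db

-- | Take the symbol part, if the concept exists and has one.
symbolOf :: ConceptDB -> UID -> Maybe String
symbolOf db u = Map.lookup u db >>= cSymbol

-- | Construct a reference on demand, only if the referent exists.
refTo :: ConceptDB -> UID -> Maybe Reference
refTo db u = Reference u <$ Map.lookup u db
```

With this shape, a term, a symbol, and a reference never need UIDs of their own, so they cannot collide with the concept they belong to.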
I'm not so sure about the models. generalDefinitions and dataDefinitions can be considered definitions of concepts and therefore part of the concepts they define. On the other hand, they are used similarly to other kinds of models and it is not true in general that a model defines a concept.
Sorry to take so long to comment on this.
- I agree, parts of concepts should not be in the ChunkDB. However, we will find bugs where there really are bad duplicates (i.e. we mean 2 different things that are related)
- I'm less sure about this. In some sense, there could be a 'local' part of the database that can be used for lookups of things we're currently building. In part, this could increase modularity by not hard-coding some references. But I'd like to see hard details.
- Completely agree that document fragments should have different UIDs from the underlying knowledge. That's a plain bug.
I do believe your recent changes in #3531 probably help with point 3 already?
Saying much more would be made easier if I could see some example conflicts (concrete, not via 2 levels of indirection!).