Drasil icon indicating copy to clipboard operation
Drasil copied to clipboard

Duplicate `UID`s with different data

Open balacij opened this issue 7 months ago • 9 comments

The CI for https://github.com/JacquesCarette/Drasil/pull/4089 originally failed. When the ChunkDB has an insertion on a duplicate UID, it normally does an overwrite. Originally, in said PR, I kept the original and ignored the new one (by accident!). The CI failed because build/ did not match stable/! However, even with duplicate UIDs being inserted, the data should have been the same!

balacij avatar May 30 '25 16:05 balacij

@jackwyand Do you think you can analyze the diff and where the differences are coming from? Cherry-picking https://github.com/JacquesCarette/Drasil/commit/718371bfb2f8fc2cf5a55e6a47ec0a60bd144a96 locally might be helpful.

balacij avatar May 30 '25 16:05 balacij

This is likely to be quite fruitful indeed.

JacquesCarette avatar May 30 '25 20:05 JacquesCarette

Diffs

There are 3 examples that had a diff in the commit:

  • GamePhysics
  • DblPend
  • Projectile

GamePhysics

Due to no longer overwriting the duplicate UIDs, the reason for the diff in GamePhysics has to do with the Chunks iVect and dVect being added to the ChunkDB. https://github.com/JacquesCarette/Drasil/blob/74b4803c9e91668eb41c696fae15a8fb4ceb6410/code/drasil-example/gamephysics/lib/Drasil/GamePhysics/Unitals.hs#L61-L72

unitalChunks is added to the SymbolMap and TermMap of the ChunkDB, and it contains both iVect and dVect which share a duplicate UID. Since iVect is added first, it is replacing the usage of dVect which changes symbols and the description of the symbol in the SRS.

In the diff, we can see that $\hat{\mathbf{d}}$ is replaced with $\hat{\mathbf{i}}$, along with the description. Description

diff --strip-trailing-cr -r -X ../.gitignore stable/gamephysics/SRS/HTML/GamePhysics_SRS.html build/gamephysics/SRS/HTML/GamePhysics_SRS.html
210,216d209
<                   <td><em><b>d&#770;</b></em></td>
<                   <td>
<                     Unit vector directed from the center of the large mass to the center of the smaller mass
<                   </td>
<                   <td><em>m</em></td>
<                 </tr>
<                 <tr>
283a277,281
>                   <td><em><b>i&#770;</b></em></td>
>                   <td>Horizontal UnitVector</td>
>                   <td><em>m</em></td>
>                 </tr>
>                 <tr>

Instance of usage (the last symbol was changed to $\hat{\mathbf{i}}$)

<                         \[\symbf{g}=-\frac{G\,M}{\|\symbf{d}\|^{2}}\,\symbf{\hat{d}}\]
---
>                         \[\symbf{g}=-\frac{G\,M}{\|\symbf{d}\|^{2}}\,\symbf{\hat{i}}\]

DblPend

In DblPend, it looks like this issue was already addressed in #4088 but I will quickly mention it here. The reason for the diff is because of startOriginSingle and startOriginDouble sharing a duplicate UID. https://github.com/JacquesCarette/Drasil/blob/718371bfb2f8fc2cf5a55e6a47ec0a60bd144a96/code/drasil-example/dblpend/lib/Drasil/DblPend/Assumptions.hs#L30-L31

Both of these are added to the ConceptInstances in the ChunkDB, and since the original one is now the one used, it is using the incorrect description (startOriginDescSingle) as seen in the diff.

diff --strip-trailing-cr -r -X ../.gitignore stable/dblpend/SRS/HTML/DblPend_SRS.html build/dblpend/SRS/HTML/DblPend_SRS.html
646c646
<                       startOrigin: The first rod is attached to the origin.
---
>                       startOrigin: The pendulum is attached to the origin.

Projectile

Projectile is a little less straightforward for me. I could not find an explicit duplicate UID of the chunk causing the issue, but the changes made in the commit has caused an interesting diff to the "output message." Here is where it is defined in the code: https://github.com/JacquesCarette/Drasil/blob/718371bfb2f8fc2cf5a55e6a47ec0a60bd144a96/code/drasil-example/projectile/lib/Drasil/Projectile/Unitals.hs#L41-L43

Based on the diff, it looks like it is using the UID ("message") rather than the noun phrase that is used in stable/ (S "output message as a string").

diff --strip-trailing-cr -r -X ../.gitignore stable/projectile/SRS/HTML/Projectile_SRS.html build/projectile/SRS/HTML/Projectile_SRS.html
1680c1680
<                           <li><em>s</em> is the output message as a string (Unitless)</li>
---
>                           <li><em>s</em> is the message (Unitless)</li>

I am not sure if it is related to the duplicate UIDs, or something deeper? I will continue looking a bit to see if I can uncover anything else.

jackwyand avatar Jun 03 '25 14:06 jackwyand

Projectile (continued)

Looking further into the code, the cause of the diff for projectile comes from the message IdeaDict that is added in doccon. It is added in the TermMap before the correct one, hence why it is resulting in the diff. https://github.com/JacquesCarette/Drasil/blob/718371bfb2f8fc2cf5a55e6a47ec0a60bd144a96/code/drasil-data/lib/Data/Drasil/Concepts/Documentation.hs#L175

https://github.com/JacquesCarette/Drasil/blob/718371bfb2f8fc2cf5a55e6a47ec0a60bd144a96/code/drasil-example/projectile/lib/Drasil/Projectile/Body.hs#L169-L177

In the second code snippet, doccon can be seen being added before unitalIdeas (where the correct message exists) in the second argument (the TermMap).

jackwyand avatar Jun 03 '25 15:06 jackwyand

Thank you for the analysis @jackwyand. For the game physics example, the new code is wrong. The unit vector in the horizontal direction is not the same thing as the unit vector in the direction connecting the two centres of mass. For the double pendulum, I think we can remove the assumptions about the attachment of the rods.

smiths avatar Jun 03 '25 16:06 smiths

For projectile, changing "is the output message as a string (Unitless)" to "is the message (Unitless)" doesn't seem like that big a deal to me. I'm fine with the new version, although maybe the fact that it changed is an indication of something more concerning going on? :smile:

smiths avatar Jun 03 '25 16:06 smiths

For GamePhysics: we need to find out what iVect and dVect are supposed to mean. They should definitely have different UIDs. Changing that might help reveal what's what.

For Projectile: the QuantityDict message should be renamed messageString (same with its UID) in Projectile's Unitals.hs. Then the example should be changed to no longer produce an output diff.

In all cases, this was a duplicate UID in error, in that the concepts were different. message is the general concept of a message where messageString is a particular representation of that general concept.

JacquesCarette avatar Jun 04 '25 17:06 JacquesCarette

iVect is the unit vector along the $x$ axis in a Cartesian coordinate system. dVect is unit vector in the direction of a vector connecting the centre of mass of two different masses. I agree that they should definitely have different UIDs. They share the definition of unit vectors (length of 1), but they come up in different contexts.

smiths avatar Jun 04 '25 17:06 smiths

Yep, I think all of these stem from duplicate UIDs so fixing those is the priority.

jackwyand avatar Jun 05 '25 12:06 jackwyand

All of these duplicates have been fixed so no more diffs should occur if the ordering of adding them is changed, hence I think this issue can be closed.

jackwyand avatar Jul 02 '25 16:07 jackwyand

Thank you, @jackwyand !

balacij avatar Jul 02 '25 17:07 balacij