grimoirelab-sortinghat

Sorting hat package creates identity with missing information when merged and unmerged

Open code-sleuth opened this issue 5 years ago • 6 comments

When we merge one profile with another, we call sortinghat.api.merge_unique_identities (https://github.com/LF-Engineering/dev-analytics-sortinghat-api/blob/master/app/apis/profiles/apis.py#L190) with a "from" and a "to" uuid. The "from" identity is then deleted and merged into the "to" identity.

Afterwards, if we unmerge that identity from the "to" identity, for which we use sortinghat.api.move_identity (https://github.com/LF-Engineering/dev-analytics-sortinghat-api/blob/master/app/apis/profiles/apis.py#L269), the "from" identity that was deleted earlier is recreated, but it is missing the name, email and other personal details it previously had.
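To make the sequence concrete, here is a toy in-memory model of the merge-then-move flow described above. It is only a sketch: ToyRegistry, UniqueIdentity and their methods are hypothetical names invented for illustration, they are not the SortingHat API or its database layer, and the merge step is simplified to just discard the "from" profile.

```python
from dataclasses import dataclass, field


@dataclass
class UniqueIdentity:
    uuid: str
    profile: dict = field(default_factory=dict)    # name, email, ...
    identities: set = field(default_factory=set)   # ids of linked identities


class ToyRegistry:
    """Hypothetical stand-in for the identities store (assumption, not SortingHat)."""

    def __init__(self):
        self.uids = {}

    def add(self, uuid, **profile):
        # A new unique identity starts with one linked identity: itself.
        self.uids[uuid] = UniqueIdentity(uuid, dict(profile), {uuid})

    def merge(self, from_uuid, to_uuid):
        # The "from" entry is removed; only its linked identities survive.
        # Its profile is dropped here and nothing records what it contained.
        src = self.uids.pop(from_uuid)
        self.uids[to_uuid].identities |= src.identities

    def move(self, identity_id, to_uuid):
        # Detach the identity from whichever unique identity holds it.
        for uid in self.uids.values():
            uid.identities.discard(identity_id)
        # If the target does not exist, it is created with an empty profile.
        target = self.uids.setdefault(to_uuid, UniqueIdentity(to_uuid))
        target.identities.add(identity_id)


registry = ToyRegistry()
registry.add("a1", name="Jane Doe", email="jane@example.com")
registry.add("b2", name="J. Doe", email="jdoe@example.com")

registry.merge("a1", "b2")   # "a1" is deleted and folded into "b2"
registry.move("a1", "a1")    # "unmerge": "a1" is recreated

restored = registry.uids["a1"]
print(restored.profile)      # the recreated identity has no name or email
```

In this model the loss happens at merge time: once the "from" entry is removed, the later move has nothing to restore the profile from.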

code-sleuth avatar Feb 13 '20 17:02 code-sleuth

The current version of SortingHat does not track historic information to that level of detail, so it's not possible to recreate the previous identity. With the new experimental version (see the muggle branch) that would be possible, because there's a table that stores all the changes to the identities, but currently there's no code to do that. I'm not sure if it's something we should support.

What's your use case? Why do you need this feature?

sduenas avatar Feb 13 '20 17:02 sduenas

What's your use case?

If there are two identities with, let's say, two similar emails, I think that warrants merging, and if a user thinks the merge was a mistake, they can unmerge.

Why do you need this feature?

Basically giving users the ability to unmerge an identity if they mistakenly merge two identities.

code-sleuth avatar Feb 14 '20 19:02 code-sleuth

I see that the differences between muggle and master are huge. @sduenas Is it safe to use SortingHat based on the muggle branch? Can you please point me to the DB structure differences that are required to handle this? I've generated a diff file, but it is so huge that it is hard to track the actual changes needed in the DB structure. Any chance that you could create another branch with unmerging support, but rebased onto the current SortingHat master branch?

Here is the diff file: mungle-master.diff.txt. cc @code-sleuth

lukaszgryglicki avatar Feb 20 '20 09:02 lukaszgryglicki

@lukaszgryglicki, the muggle branch is a totally different thing. It's still experimental and not integrated with any other component in the stack. You should not use that branch unless you want to contribute to developing it. You can find more info about it here: https://github.com/chaoss/grimoirelab-sortinghat/wiki/Roadmap-to-Sorting-Hat-1.0

I can try to rebase the branch to master but as they are incompatible I don't see the point of doing it right now.

sduenas avatar Feb 20 '20 10:02 sduenas

OK, thanks for the info. How about changes to DB structure needed to handle merge/unmerge operations?

lukaszgryglicki avatar Feb 20 '20 11:02 lukaszgryglicki

In muggle we use the Django ORM to deal with the database, not SQLAlchemy as in master, so it's not only about DB structure. If you want to check it, it is in here

In any case, muggle doesn't implement what I think you want, which is an "undo" of certain operations. That is possible to implement with the current schema because we store all the operations done with SortingHat. So, using the event sourcing pattern, it would be possible to recreate some states or to roll back. This is not implemented yet and I'm not sure if we should follow that direction.
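As a rough illustration of the event sourcing idea mentioned above, the sketch below keeps an append-only log of operations and rebuilds state by replaying it, so a merge can be "undone" by replaying without that event. Everything here is an assumption made for the example: the event layout, the function names apply_event and replay, and the in-memory state are hypothetical and do not reflect how muggle stores or processes its operations.

```python
def apply_event(state, event):
    """Apply one logged operation to state (uuid -> profile and identities)."""
    op = event["op"]
    if op == "add":
        state[event["uuid"]] = {
            "profile": dict(event["profile"]),
            "identities": {event["uuid"]},
        }
    elif op == "merge":
        # Same lossy merge as before, but the log still holds the "add"
        # event carrying the original profile.
        src = state.pop(event["from"])
        state[event["to"]]["identities"] |= src["identities"]
    else:
        raise ValueError(f"unknown op: {op}")
    return state


def replay(log, skip=()):
    """Rebuild state from scratch, ignoring events whose index is in skip."""
    state = {}
    for index, event in enumerate(log):
        if index not in skip:
            apply_event(state, event)
    return state


# Hypothetical log; in a real system this would be the operations table.
log = [
    {"op": "add", "uuid": "a1",
     "profile": {"name": "Jane Doe", "email": "jane@example.com"}},
    {"op": "add", "uuid": "b2",
     "profile": {"name": "J. Doe", "email": "jdoe@example.com"}},
    {"op": "merge", "from": "a1", "to": "b2"},
]

merged = replay(log)            # state after the merge
undone = replay(log, skip={2})  # state as if the merge never happened
print(undone["a1"]["profile"])
```

Skipping an event like this is only safe when no later event depends on it. A real implementation would more likely append a compensating "unmerge" event or restrict undo to the most recent operations.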

I'm also open to ideas about how to manage these cases, and to PRs that solve this issue.

sduenas avatar Feb 20 '20 12:02 sduenas