Add pseudonymization of groups
Closes #14117
Added support for pseudonymized groups by recursively pseudonymizing all group names in the group tree. The implementation assigns `groups-<id>` identifiers and includes the original names in the value-mapping output. This closes the gap where only entries were pseudonymized but groups were not.
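For illustration, the recursive renaming can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: `GroupNode` and `GroupNamePseudonymizer` are hypothetical stand-ins rather than JabRef classes, and the pre-order numbering and the reuse of one pseudonym per distinct name are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a node of the group tree (not a JabRef class).
final class GroupNode {
    String name;
    final List<GroupNode> children = new ArrayList<>();

    GroupNode(String name) {
        this.name = name;
    }

    // Appends a child and returns this node so trees can be built fluently.
    GroupNode add(GroupNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical helper: renames every group to groups-<id> and remembers the originals.
final class GroupNamePseudonymizer {
    // original name -> pseudonym, kept in visiting order for a stable mapping output
    private final Map<String, String> originalToPseudonym = new LinkedHashMap<>();

    // Visits the node first, then its children (pre-order).
    void pseudonymize(GroupNode node) {
        String pseudonym = originalToPseudonym.get(node.name);
        if (pseudonym == null) {
            // Assumption: ids start at 1 and identical names share one pseudonym.
            pseudonym = "groups-" + (originalToPseudonym.size() + 1);
            originalToPseudonym.put(node.name, pseudonym);
        }
        node.name = pseudonym;
        for (GroupNode child : node.children) {
            pseudonymize(child);
        }
    }

    // The mapping that would be written to the value-mapping output.
    Map<String, String> mapping() {
        return originalToPseudonym;
    }
}
```

Keeping the mapping keyed by original name makes it easy to reuse the same lookup when other places that reference a group by name need to be rewritten.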
Currently, only the group names themselves are pseudonymized (groups-1, groups-2, etc.), but the filters inside the groups (e.g., readstatus = read) are not updated to match the pseudonymized field values (readstatus-1, readstatus-2).
- Should we also pseudonymize the filter values so that the GUI group counts remain correct after pseudonymization?
- Or is it acceptable for the pseudonymized `.bib` to break GUI filters and counts, given that the main goal is anonymization?
Feedback is welcome.
Steps to test
- Use the JabKit CLI to pseudonymize a library (like here #13158)
- Open the generated pseudonymized `.bib` file and check that all groups now have names like `groups-1`, `groups-2`, etc.
- Verify that the CSV mapping contains the original group names corresponding to the new pseudonymized identifiers.
https://github.com/user-attachments/assets/76c07c1b-129d-4041-985a-819f000318db
Mandatory checks
- [x] I own the copyright of the code submitted and I license it under the MIT license
- [x] I manually tested my changes in running JabRef (always required)
- [x] I added JUnit tests for changes (if applicable)
- [x] I added screenshots in the PR description (if change is visible to the user)
- [x] I described the change in `CHANGELOG.md` in a way that is understandable for the average user (if change is visible to the user)
- [/] I checked the user documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request updating file(s) in https://github.com/JabRef/user-documentation/tree/main/en.
Hey @janbnz!
Thank you for contributing to JabRef! Your help is truly appreciated :heart:
We have automated checks in place, based on which you will soon get feedback if any of them are failing. In a while, maintainers will also review your contribution. Once that happens, you can go through their comments in the "Files changed" tab and act on them, or reply to the conversation if you have further inputs.
Please re-check our contribution guide in case of any other doubts related to our contribution workflow.
> Currently, only the group names themselves are pseudonymized (`groups-1`, `groups-2`, etc.), but the filters inside the groups (e.g., `readstatus = read`) are not updated to match the pseudonymized field values (`readstatus-1`, `readstatus-2`).
You are on the right track. But the details seem to be wrong. readstatus is a special field and not a group.
Entries take their groups in the groups field. In case the group before was abc and now group-1, it should be changed in the entry, too. Please also add a test.
I am talking about "Explicit selection" here. There are other types of groups, but I think they do not need to be tackled here, as the pseudonymization is affecting other areas (e.g., "renaming" of keywords of an entry).
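To make the request concrete, here is a rough sketch of rewriting an entry's `groups` field with the same name mapping that was used for the group tree. It is only an illustration under stated assumptions: `EntryGroupsRewriter` is a hypothetical helper, not a JabRef class, and it assumes the field holds group names joined by a configurable separator (a comma in the example).

```java
import java.util.Arrays;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper (not a JabRef class) for "Explicit selection" groups.
final class EntryGroupsRewriter {

    // Replaces every known group name in the field with its pseudonym.
    // Names without a mapping are left untouched.
    static String rewrite(String groupsField, Map<String, String> originalToPseudonym, String separator) {
        return Arrays.stream(groupsField.split(Pattern.quote(separator)))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .map(name -> originalToPseudonym.getOrDefault(name, name))
                // Assumption: names are written back as "<separator> "-joined text.
                .collect(Collectors.joining(separator + " "));
    }
}
```

With a mapping of `abc` to `groups-1`, an entry whose `groups` field is `abc` would end up with `groups-1`, which is what a test for this case could assert.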
Your pull request conflicts with the target branch.
Please merge with your code. For a step-by-step guide to resolve merge conflicts, see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line.
> Entries take their groups in the `groups` field. In case the group before was `abc` and now `group-1`, it should be changed in the entry, too. Please also add a test.
Thanks for the feedback. I’m a bit unsure what I have to change now. In my current code, that should already be the case, or am I missing something important here?