jabref
Refactored data clumps with the help of LLMs (research project)
Hello maintainers,
I am conducting a master's thesis project focused on enhancing code quality through automated refactoring of data clumps, assisted by Large Language Models (LLMs).
Data clump definition
A data clump exists if
- Two methods (in the same or in different classes) have at least three common parameters and one of those methods does not override the other, or
- At least three fields in a class are common with the parameters of a method (in the same or in a different class), or
- Two different classes have at least three common fields
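To make the first case concrete, here is a minimal Java sketch of a data clump and one possible refactoring. All class and method names (`ClumpedExporter`, `Publication`, `RefactoredExporter`) are hypothetical and are not taken from the JabRef codebase; the sketch also assumes Java 16 or newer for `record`.

```java
// Hypothetical example; none of these names exist in JabRef.

// Before: the same three parameters travel together through two methods,
// which matches the first case of the definition above.
class ClumpedExporter {
    static String fileName(String author, String title, int year) {
        return author + "_" + year + "_" + title + ".bib";
    }

    static String citation(String author, String title, int year) {
        return author + " (" + year + "). " + title + ".";
    }
}

// After: the three values are extracted into one small value type
// (assumes Java 16+ for records).
record Publication(String author, String title, int year) { }

class RefactoredExporter {
    static String fileName(Publication p) {
        return p.author() + "_" + p.year() + "_" + p.title() + ".bib";
    }

    static String citation(Publication p) {
        return p.author() + " (" + p.year() + "). " + p.title() + ".";
    }
}
```

The behavior stays the same; only the signatures shrink from three parameters to one, so a later change to the clump (for example, adding an edition) touches a single type.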
See also the following UML diagram as an example
I believe these refactorings can contribute to the project by reducing complexity and enhancing the readability of your source code.
Pursuant to the EU AI Act, I fully disclose the use of LLMs in generating these refactorings, emphasizing that all changes have undergone human review for quality assurance.
Even if you decide not to integrate my changes into your codebase (which is perfectly fine), I ask you to fill out a feedback survey, which will be scientifically evaluated to determine the acceptance of AI-supported refactorings. You can find the feedback survey at https://campus.lamapoll.de/Data-clump-refactoring/en
Thank you for considering my contribution. I look forward to your feedback. If you have any other questions or comments, feel free to write a comment or email me at [email protected].
Best regards,
Timo Schoemaker
Department of Computer Science
University of Osnabrück
Mandatory checks
- [ /] Change in `CHANGELOG.md` described in a way that is understandable for the average user (if applicable)
- [ /] Tests created for changes (if applicable)
- [ /] Manually tested changed features in running JabRef (always required)
- [ ] Screenshots added in PR description (for UI changes)
- [ ] Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
- [ ] Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.
@compf please cite JabRef in your thesis. Use the citation as provided at https://docs.jabref.org/faq
@compf We have a guideline for setting up IntelliJ so that Checkstyle won't complain at https://devdocs.jabref.org/getting-into-the-code/guidelines-for-setting-up-a-local-workspace/intellij-13-code-style.html. Sorry that this is such an effort. We did not dare to check in IntelliJ configs, because we fear that the configs will change with each IntelliJ update. We do not want to force our dozens of student contributors to fiddle around with their IntelliJ.
Thank you very much for the feedback. I will certainly cite JabRef.
Somehow my Checkstyle was buggy, so it didn't spot these formatting issues. After reinstalling, it worked.
Some more comments :)
I think I should redo the survey. The LLM is very bad at keeping `@param` comments. I now understand it myself:
- There are `@param` comments for interpretation of the given object at each method
- They explain how the method behaves
- Thus, the comment should be moved into a newly created `@param` annotation for the called method.
- If the `@param` describes the parameter (and not the treatment of the parameter inside the method), it should be moved to the newly created class

(I hope this was somehow clear)
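One way to read the rule above, as a hedged Java sketch: descriptions of the value itself go on the newly created class, while descriptions of how a method treats the value stay in that method's `@param`. The names `KeywordQuery` and `KeywordMatcher` are hypothetical and are not part of this pull request; the sketch assumes Java 16 or newer for `record`.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical parameter object extracted from a data clump.
 * These tags describe the values themselves, so they live on the new class.
 *
 * @param keyword    the keyword to search for
 * @param delimiter  the character that separates keywords in a list
 * @param ignoreCase whether matching should ignore case
 */
record KeywordQuery(String keyword, char delimiter, boolean ignoreCase) { }

class KeywordMatcher {
    /**
     * Checks whether a delimiter-separated keyword list contains the queried keyword.
     *
     * @param keywords the delimiter-separated list to search
     * @param query    how this method treats the object: entries are trimmed, and
     *                 compared case-insensitively if {@code query.ignoreCase()} is set
     * @return true if any entry matches the queried keyword
     */
    static boolean contains(String keywords, KeywordQuery query) {
        // Quote the delimiter so that characters such as '|' are not read as regex syntax.
        String separator = Pattern.quote(String.valueOf(query.delimiter()));
        for (String entry : keywords.split(separator)) {
            String candidate = entry.trim();
            boolean match = query.ignoreCase()
                    ? candidate.equalsIgnoreCase(query.keyword())
                    : candidate.equals(query.keyword());
            if (match) {
                return true;
            }
        }
        return false;
    }
}
```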
I think your first survey was fine. I have to admit that not everything was done by an LLM. In your case, many things were performed by a tool I wrote, which still has some strange bugs I need to fix, so I had to do a little manual refactoring (with associated mistakes). So you should not blame the LLM. :) I can try to fix the other issues if I find time, but keep in mind that I do not have the insight into your project that you do, and the feedback on the first commit is what's important for my study. Nevertheless, your project is very interesting and I often use JabRef for managing scientific sources, so maybe I will contribute in other ways :)
@compf Happy start into a new week. May I ask if you'll find time to finish this PR? 😅
Oh, I totally forgot. I will finish the pull request by the end of this week if that's ok :)