
Improve arXiv Parsing (#11306)

Open narasimhareddyputta94 opened this issue 1 year ago • 2 comments

Fix: Improve arXiv Parsing (#11306)

Summary:

This pull request improves the parsing of arXiv entries in JabRef. The focus is on ensuring that arXiv-related fields are correctly identified and moved to the appropriate eprint fields, following consistent formatting guidelines.

Changes Made:

ArxivCleanup Class:

  • Added a new class ArxivCleanup to handle the cleanup logic for arXiv entries.
  • The class ensures that fields such as note, institution, version, and eid are parsed and moved to the appropriate eprint fields.
  • Implemented logic to trim any extra spaces to maintain consistent formatting.

Integration into CleanupWorker:

  • Modified the CleanupWorker class to include the ArxivCleanup job.
  • Updated the toJob method to recognize the ARXIV_CLEANUP enum and instantiate the ArxivCleanup class.

Unit Tests:

  • Added ArxivCleanupTest class to test the functionality of ArxivCleanup.
  • Verified that the cleanup logic correctly moves arXiv-related information to the eprint fields.
  • Ensured the tests handle cases with and without extra spaces in the field values.

Testing:

  • Added and executed unit tests in the ArxivCleanupTest class.
  • Ran tests to ensure correct functionality and formatting.
  • Successfully ran the tests using ./gradlew cleanTest test to validate changes in a clean environment.

Steps to Reproduce the Issue:

  • Create an entry with fields:
      note: arXiv: 1503.05173
      version: 1
      institution: arxiv
      eid: arXiv:1503.05173
  • Run cleanup entries with "Move preprint information from 'URL' and 'journal' field to the 'eprint' field".
  • Verify that the entry is updated correctly, moving information to the appropriate eprint fields.

Expected Behavior:

Information in the note, institution, version, and eid fields should be moved to the correct eprint* fields without extra spaces.

Actual Behavior:

The information was not correctly moved or formatted before this fix.

This pull request ensures that arXiv entries are correctly parsed and cleaned up, improving the overall data consistency in JabRef.
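
For illustration, the cleanup described above might look roughly like the following. This is a minimal, standalone sketch and not the code in this PR: it models an entry as a plain Map instead of using JabRef's entry classes, and the class name ArxivCleanupSketch, the target fields eprint and eprinttype, and the choice to append the version as a "v1"-style suffix are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not JabRef's ArxivCleanup, just the idea on a plain Map.
final class ArxivCleanupSketch {

    // Matches "arXiv:1503.05173" and "arXiv: 1503.05173" (new-style identifiers only).
    private static final Pattern ARXIV_ID =
            Pattern.compile("(?i)arxiv:\\s*(\\d{4}\\.\\d{4,5})");

    static Map<String, String> cleanup(Map<String, String> entry) {
        Map<String, String> result = new LinkedHashMap<>(entry);
        String id = null;

        // Look for an arXiv identifier in eid and note; if both match, the value from note wins.
        for (String field : new String[] {"eid", "note"}) {
            String value = result.get(field);
            if (value == null) {
                continue;
            }
            Matcher matcher = ARXIV_ID.matcher(value.trim());
            if (matcher.matches()) {
                id = matcher.group(1);
                result.remove(field);
            }
        }
        if (id == null) {
            return result; // not recognizably an arXiv entry, leave it untouched
        }

        // Assumption: a purely numeric version is appended as "v<version>".
        String version = result.get("version");
        if (version != null && version.trim().matches("\\d+")) {
            id = id + "v" + version.trim();
            result.remove("version");
        }

        // An institution of "arxiv" only repeats the eprint type, so it is dropped.
        String institution = result.get("institution");
        if (institution != null && institution.trim().equalsIgnoreCase("arxiv")) {
            result.remove("institution");
        }

        result.put("eprint", id);
        result.put("eprinttype", "arxiv");
        return result;
    }
}
```

Under these assumptions, the sample entry from the reproduction steps ends up with only eprint = 1503.05173v1 and eprinttype = arxiv, while an entry without a recognizable arXiv identifier is returned unchanged.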

This fixes issue #11306.

Mandatory checks

  • [ ] Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • [ ] Tests created for changes (if applicable)
  • [ ] Manually tested changed features in running JabRef (always required)
  • [ ] Screenshots added in PR description (for UI changes)
  • [ ] Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • [ ] Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

narasimhareddyputta94, Jun 19 '24 12:06

Hi @narasimhareddyputta94, thanks for your PR. Your changes look interesting. We will try to look into them in the next few days; currently we are a bit busy with our day jobs. Please look at the failing tests; it seems your branch currently does not compile. Therefore, I am marking it as a draft again. Please also don't remove the section about the mandatory checks for a PR. We have them there so that every contributor remembers that these checks have to be taken care of before creating a PR.

calixtus, Jun 20 '24 15:06

@narasimhareddyputta94 Please work on the submodule configuration.

I tried to identify the cause, but I could not find the commit that GitHub reports:

[image]

Please do the following:

  • Ensure everything is committed
  • gitk --all& (to start gitk to have the commit IDs ready if something goes wrong)
  • git merge upstream/main (to merge the latest upstream changes)
  • git reset upstream/main (to drop the commits of your branch while keeping your changes in the working tree, so you can start with a fresh commit)
  • git gui (to craft a new commit. Do NOT commit the changes in the submodule. Try to revert these changes using "Commit" -> "Revert Changes")
  • Create a new commit using git gui (or whatever other Git tooling you use)
  • git push -f (to overwrite the changes also in this PR)

Now, this PR should show a single commit without any submodule change.

koppor, Jun 26 '24 13:06

Too much noise in the code. Better start from scratch.

koppor, Aug 07 '24 21:08