core-geonetwork
core-geonetwork copied to clipboard
GeoNetwork harvester / Check if a resource exists to save it, instead of trying to retrieve the file details, to avoid confusing NoSuchFileException exception
The GeoNetwork harvester, when using MEF format and the remote metadata has files attached, logs this type of exceptions:
2023-12-21 11:00:09,194 ERROR [geonetwork.resources] - Error getting size of file <path>/geonetwork_data/data/metadata_data/118200-118299/118243/public/grafik.png: <path>/geonetwork_data/data/metadata_data/118200-118299/118243/public/grafik.png
java.nio.file.NoSuchFileException: <path>/geonetwork_data/data/metadata_data/118200-118299/118243/public/grafik.png
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.Files.size(Files.java:2332)
at org.fao.geonet.api.records.attachments.FilesystemStore.getResourceDescription(FilesystemStore.java:145)
at org.fao.geonet.api.records.attachments.FilesystemStore.getResourceDescription(FilesystemStore.java:127)
at org.fao.geonet.api.records.attachments.ResourceLoggerStore.getResourceDescription(ResourceLoggerStore.java:151)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.saveFile(Aligner.java:914)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.handleFile(Aligner.java:878)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.access$700(Aligner.java:89)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner$2.handlePublicFile(Aligner.java:735)
at org.fao.geonet.kernel.mef.MEF2Visitor.handleBin(MEF2Visitor.java:147)
at org.fao.geonet.kernel.mef.MEF2Visitor.handleXml(MEF2Visitor.java:108)
at org.fao.geonet.kernel.mef.MEF2Visitor.visit(MEF2Visitor.java:52)
at org.fao.geonet.kernel.mef.MEFLib.visit(MEFLib.java:157)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.updateMetadata(Aligner.java:710)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.align(Aligner.java:263)
at org.fao.geonet.kernel.harvest.harvester.geonet.Harvester.harvest(Harvester.java:231)
...
It's not really an error in the harvesting process, but the log is confusing.
-
When running for first time the harvester, when identifies a file to add to the data directory of a metadata, checks if the file exists retrieving some information of the file and that causes the exception log as initially the file doesn't exist. The exception is not failing the harvester, it's processed correctly to add the file, but the log is confusing.
-
When running for second time the harvester, when identifies a file to add to the data directory of a metadata, does the same check (that doesn't fail) and if exists removes the file to add the updated one, but this 2on time works as 1) and logs also the exception, but all is processed properly.
These exceptions are caused by the usage of this method that logs an exception when the resource doesn't exist:
https://github.com/geonetwork/core-geonetwork/blob/1f16287e654bc24b6b19b6f3bcd98ef593229676/core/src/main/java/org/fao/geonet/api/records/attachments/FilesystemStore.java#L160-L178
Changed the code to use the following method that throws an Exception, but doesn't log it. The harvester code captures the exception to decide if the file should be saved or not.
https://github.com/geonetwork/core-geonetwork/blob/1f16287e654bc24b6b19b6f3bcd98ef593229676/core/src/main/java/org/fao/geonet/api/records/attachments/FilesystemStore.java#L106-L122
The change request contains also Sonarlint improvements.
Test case:
-
Create a GeoNetwork harvester to a server that has the iso19139 samples
-
Execute the harvester.
- Without the change: The exceptions described previously are logged.
- With the change: There are not confusing exceptions in the log.
Checklist
- [X] I have read the contribution guidelines
- [X] Pull request provided for
main
branch, backports managed with label - [X] Good housekeeping of code, cleaning up comments, tests, and documentation
- [X] Clean commit history broken into understandable chucks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
- [X] Clean commit messages, longer verbose messages are encouraged
- [ ] API Changes are identified in commit messages
- [ ] Testing provided for features or enhancements using automatic tests)
- [ ] User documentation provided for new features or enhancements in manual
- [ ] Build documentation provided for development instructions in
README.md
files - [ ] Library management using
pom.xml
dependency management. Update build documentation with intended library use and library tutorials or documentation
Does this work too with other storage providers? Or only the local one?
@juanluisrp I don't know with other providers, but the change just tries to retrieve the file instead of the file information to check if exists and avoid a confusing exception.
Potentially should be the same for other providers, if the file doesn't exist should throw a ResourceNotFoundException
and the code changed should work as expected with other providers.
@ianwallen related to the question from @juanluisrp, do you think this change can cause issues with CMIS? Otherwise I would like to merge it.
@josegar74
@ianwallen related to the question from @juanluisrp, do you think this change can cause issues with CMIS? Otherwise I would like to merge it.
I agree that it should not affect other providers
The backport to 3.12.x
failed:
The process '/usr/bin/git' failed with exit code 1
stderr
error: could not apply 9ef7c228e0... GeoNetwork harvester / Check if a resource exists to save it, instead of trying to retrieve the file details, to avoid confusing NoSuchFileException exception
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
stdout
Auto-merging harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/geonet/Aligner.java
CONFLICT (content): Merge conflict in harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/geonet/Aligner.java
To backport manually, run these commands in your terminal:
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-3.12.x 3.12.x
# Navigate to the new working tree
cd .worktrees/backport-3.12.x
# Create a new branch
git switch --create backport-7577-to-3.12.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick 9ef7c228e071c1853f9ce063949efae7420fa1f1,42fc80b2f4bc97ec1b440377a74a7a6942b9ec39
# Push it to GitHub
git push --set-upstream origin backport-7577-to-3.12.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-3.12.x
Then, create a pull request where the base
branch is 3.12.x
and the compare
/head
branch is backport-7577-to-3.12.x
.