core-geonetwork icon indicating copy to clipboard operation
core-geonetwork copied to clipboard

GeoNetwork harvester / Check if a resource exists to save it, instead of trying to retrieve the file details, to avoid confusing NoSuchFileException exception

Open josegar74 opened this issue 1 year ago • 2 comments

The GeoNetwork harvester, when using MEF format and the remote metadata has files attached, logs this type of exceptions:

2023-12-21 11:00:09,194 ERROR [geonetwork.resources] - Error getting size of file <path>/geonetwork_data/data/metadata_data/118200-118299/118243/public/grafik.png: <path>/geonetwork_data/data/metadata_data/118200-118299/118243/public/grafik.png
java.nio.file.NoSuchFileException: <path>/geonetwork_data/data/metadata_data/118200-118299/118243/public/grafik.png
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.Files.size(Files.java:2332)
at org.fao.geonet.api.records.attachments.FilesystemStore.getResourceDescription(FilesystemStore.java:145)
at org.fao.geonet.api.records.attachments.FilesystemStore.getResourceDescription(FilesystemStore.java:127)
at org.fao.geonet.api.records.attachments.ResourceLoggerStore.getResourceDescription(ResourceLoggerStore.java:151)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.saveFile(Aligner.java:914)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.handleFile(Aligner.java:878)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.access$700(Aligner.java:89)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner$2.handlePublicFile(Aligner.java:735)
at org.fao.geonet.kernel.mef.MEF2Visitor.handleBin(MEF2Visitor.java:147)
at org.fao.geonet.kernel.mef.MEF2Visitor.handleXml(MEF2Visitor.java:108)
at org.fao.geonet.kernel.mef.MEF2Visitor.visit(MEF2Visitor.java:52)
at org.fao.geonet.kernel.mef.MEFLib.visit(MEFLib.java:157)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.updateMetadata(Aligner.java:710)
at org.fao.geonet.kernel.harvest.harvester.geonet.Aligner.align(Aligner.java:263)
at org.fao.geonet.kernel.harvest.harvester.geonet.Harvester.harvest(Harvester.java:231)
...

It's not really an error in the harvesting process, but the log is confusing.

  1. When running for first time the harvester, when identifies a file to add to the data directory of a metadata, checks if the file exists retrieving some information of the file and that causes the exception log as initially the file doesn't exist. The exception is not failing the harvester, it's processed correctly to add the file, but the log is confusing.

  2. When running for second time the harvester, when identifies a file to add to the data directory of a metadata, does the same check (that doesn't fail) and if exists removes the file to add the updated one, but this 2on time works as 1) and logs also the exception, but all is processed properly.

These exceptions are caused by the usage of this method that logs an exception when the resource doesn't exist:

https://github.com/geonetwork/core-geonetwork/blob/1f16287e654bc24b6b19b6f3bcd98ef593229676/core/src/main/java/org/fao/geonet/api/records/attachments/FilesystemStore.java#L160-L178

Changed the code to use the following method that throws an Exception, but doesn't log it. The harvester code captures the exception to decide if the file should be saved or not.

https://github.com/geonetwork/core-geonetwork/blob/1f16287e654bc24b6b19b6f3bcd98ef593229676/core/src/main/java/org/fao/geonet/api/records/attachments/FilesystemStore.java#L106-L122


The change request contains also Sonarlint improvements.


Test case:

  1. Create a GeoNetwork harvester to a server that has the iso19139 samples

  2. Execute the harvester.

  • Without the change: The exceptions described previously are logged.
  • With the change: There are not confusing exceptions in the log.

Checklist

  • [X] I have read the contribution guidelines
  • [X] Pull request provided for main branch, backports managed with label
  • [X] Good housekeeping of code, cleaning up comments, tests, and documentation
  • [X] Clean commit history broken into understandable chucks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • [X] Clean commit messages, longer verbose messages are encouraged
  • [ ] API Changes are identified in commit messages
  • [ ] Testing provided for features or enhancements using automatic tests)
  • [ ] User documentation provided for new features or enhancements in manual
  • [ ] Build documentation provided for development instructions in README.md files
  • [ ] Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

josegar74 avatar Dec 22 '23 11:12 josegar74

Does this work too with other storage providers? Or only the local one?

juanluisrp avatar Jan 11 '24 17:01 juanluisrp

@juanluisrp I don't know with other providers, but the change just tries to retrieve the file instead of the file information to check if exists and avoid a confusing exception.

Potentially should be the same for other providers, if the file doesn't exist should throw a ResourceNotFoundException and the code changed should work as expected with other providers.

josegar74 avatar Feb 20 '24 09:02 josegar74

@ianwallen related to the question from @juanluisrp, do you think this change can cause issues with CMIS? Otherwise I would like to merge it.

josegar74 avatar Mar 01 '24 13:03 josegar74

@josegar74

@ianwallen related to the question from @juanluisrp, do you think this change can cause issues with CMIS? Otherwise I would like to merge it.

I agree that it should not affect other providers

ianwallen avatar Mar 06 '24 14:03 ianwallen

The backport to 3.12.x failed:

The process '/usr/bin/git' failed with exit code 1
stderr
error: could not apply 9ef7c228e0... GeoNetwork harvester / Check if a resource exists to save it, instead of trying to retrieve the file details, to avoid confusing NoSuchFileException exception
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".

stdout
Auto-merging harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/geonet/Aligner.java
CONFLICT (content): Merge conflict in harvesters/src/main/java/org/fao/geonet/kernel/harvest/harvester/geonet/Aligner.java

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-3.12.x 3.12.x
# Navigate to the new working tree
cd .worktrees/backport-3.12.x
# Create a new branch
git switch --create backport-7577-to-3.12.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick 9ef7c228e071c1853f9ce063949efae7420fa1f1,42fc80b2f4bc97ec1b440377a74a7a6942b9ec39
# Push it to GitHub
git push --set-upstream origin backport-7577-to-3.12.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-3.12.x

Then, create a pull request where the base branch is 3.12.x and the compare/head branch is backport-7577-to-3.12.x.

geonetworkbuild avatar Mar 07 '24 07:03 geonetworkbuild