core-geonetwork icon indicating copy to clipboard operation
core-geonetwork copied to clipboard

Passing key into update/remove process xslt for iso 19139 to fix issue with updating/deleting resources with same url

Open wangf1122 opened this issue 1 year ago • 18 comments

The issue is in ISO19139. If the user upload just a single resource file, but decided to link it twice or multiple times for different purpose (i.e. English vs French).

image

But if the user attempt to delete either one of them image

They both will be wiped out

image

The issue is onlinesrc-remove xsl only remove the resource by name and url. But name and url are duplicated and the transformation XSLT decides to remove them all.

This pull request will pass extra indexing measure from the UI component to its XSLT. In this case, it is used for ISO19139. But the extra parameter wont cause any harm to any other schemas.

wangf1122 avatar Oct 12 '23 18:10 wangf1122

Related to #7246 and #5173

Note that thumbnail-remove.xsl has the same issue.

ianwallen avatar Oct 12 '23 19:10 ianwallen

Does it make sense to add 2 times the same URL with same name ? The key for update/remove was based on URL+name and in your case the name for French and English will be different so it should work fine ?

But checking the template maybe we should remove the template for WWW:DOWNLOAD-1.0-http--download matching geonet:*|gmd:onLine[normalize-space(gmd:CI_OnlineResource/gmd:linkage/gmd:URL) = $url and normalize-space(gmd:CI_OnlineResource/gmd:protocol/*) = 'WWW:DOWNLOAD-1.0-http--download'] ?

Using the position, not sure it will work if we have multiple transfer options.

fxprunayre avatar Oct 13 '23 08:10 fxprunayre

Can't we just remove the online resource by simply giving its position in the whole metadata, instead of matching by url/name?

jahow avatar Oct 13 '23 08:10 jahow

@jahow that's the idea of the pull request, but as indicated by Francois at least in ISO19139 it will not work with multiple transfer options blocks with the current changes.

It will require a more complex xslt process to match the distribution section and loop over the different transfer options / online resources to copy all except the one identified by the global position.

Also requires to update the other metadata schemas like iso19115-3.2008.

josegar74 avatar Oct 13 '23 09:10 josegar74

Another option would be to include the protocol in the match (instead of a position), to make it even more unique. A concrete example could be WMS and WFS services accessed through the same url, and using the same layername/featuretype. The only way to differentiate them would be the protocol.

jahow avatar Oct 13 '23 09:10 jahow

@jahow

The protocol was part of the url. If url is not enough to determine the uniqueness, I can't see any point to use protocol which will be same as we use url.

wangf1122 avatar Oct 13 '23 19:10 wangf1122

There is an issue with replacing the $url with a randome uuid $uid. It will end up making the logic no longer compatible with all existing schemas. And changing all existing schemas to use the new id would be a large task. I recommend adding new variables. I also personally don't like using a random uuid for an id as it does not have any linkage to the original metadata.

I have created a PR against this branch https://github.com/wangf1122/core-geonetwork/pull/15 which is based on this branch https://github.com/ianwallen/core-geonetwork/tree/main.prevent.delete.multiple.online.resources.iso.19139.schema_

The change reverts the logic for modifying the id field so that the code can be backwards compatible with existing schemas. I added 2 new field

  • HASH - which is better for comparing the field instead of using url or name as it can compare the full xml section.
  • IDX - index identifying the element position in the xml

With the idx it should be possible to go directly to the element however to ensure that we have the correct element, we can also compare the hash value.

Note that my PR also addressed thumbnails and it also addressed the updating the thumbnail and online data as it had the same issue as the delete. It was updating all records with the same url.

Please review and test the changes.

ianwallen avatar Oct 27 '23 19:10 ianwallen

Looks like the code in my PR https://github.com/wangf1122/core-geonetwork/pull/15 is failing with the following

Error at xsl:when on line 161 column 97 of onlinesrc-add.xsl:
  XPST0017: XPath syntax error at char 452 on line 161 in {...:md5Hex(exslt:node-set(.)) ...}:
    Cannot find a matching 1-argument function named {java:org.fao.geonet.util.XslUtil}md5Hex()

Which seems to be caused by the following namespace not loading the java correclty since it works correctly when executing the application locally.

xmlns:util="java:org.fao.geonet.util.XslUtil"

I'm not sure how to modify the unit test so that it loads the class path correctly to locate the XslUtil class. Any suggestions?

ianwallen avatar Oct 29 '23 10:10 ianwallen

That may be a sign it needs to be an integration test later in the build? How is the XslUtil setup when the application runs?

jodygarnett avatar Oct 29 '23 13:10 jodygarnett

Looks like I have to pass a test dependency to the iso19139 schema to include gn_core. I'm surprised that was not needed before?

However it does not totally work

If I add the following to the iso19139 schema pom file then I can run the test.

    <dependency>
      <groupId>org.geonetwork-opensource</groupId>
      <artifactId>gn-core</artifactId>
      <version>${project.version}</version>
      <scope>test</scope>
    </dependency>
    <dependency>

However if fails to do a maven clean or build from the main pom file with the following error.

[INFO] Scanning for projects...
[ERROR] [ERROR] The projects in the reactor contain a cyclic reference: Edge between 'Vertex{label='org.geonetwork-opensource.schemas:gn-schema-iso19139:4.4.0-SNAPSHOT'}' and 'Vertex{label='org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT'}' introduces to cycle in the graph org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT --> org.geonetwork-opensource.schemas:gn-schema-iso19139:4.4.0-SNAPSHOT --> org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT @ 
[ERROR] The projects in the reactor contain a cyclic reference: Edge between 'Vertex{label='org.geonetwork-opensource.schemas:gn-schema-iso19139:4.4.0-SNAPSHOT'}' and 'Vertex{label='org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT'}' introduces to cycle in the graph org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT --> org.geonetwork-opensource.schemas:gn-schema-iso19139:4.4.0-SNAPSHOT --> org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT -> [Help 1]

ianwallen avatar Oct 30 '23 10:10 ianwallen

@ianwallen the test are failing due to this error:

Error at xsl:when on line 164 column 99 of onlinesrc-add.xsl:
  XPST0017: XPath syntax error at char 462 on line 164 in {...:md5Hex(exslt:node-set(.)) ...}:
    Cannot find a matching 1-argument function named {java:org.fao.geonet.util.XslUtil}md5Hex()

I suspect the parameter value exslt:node-set(.) doesn't match the type of the method:

https://github.com/geonetwork/core-geonetwork/blob/66a8ca819192eed8754ce77697f8847e7a9d9d86/core/src/main/java/org/fao/geonet/util/XslUtil.java#L1308-L1310

josegar74 avatar Nov 08 '23 16:11 josegar74

@ianwallen found the problem, the iso19139 doesn't depend on gn-core so doesn't have access to XslUtil, but adding the dependency adds a cyclic dependency, as gn-core depends on gn-schema-iso19139.

We need to check about the reason for this dependency.

josegar74 avatar Nov 08 '23 17:11 josegar74

@wangf1122

The blocker at the moment is attempting to call the md5Hex function as we cannot add the dependency

public static String md5Hex(String str) {
    return org.apache.commons.codec.digest.DigestUtils.md5Hex(str);
}

I wonder if you could try changing

xmlns:util="java:org.fao.geonet.util.XslUtil"

to

xmlns:util="java:org.apache.commons.codec.digest.DigestUtils"

This way it would use the apache library directly and we would not need the dependency. Or if we do need a dependency then it should not be an issue to add the org.apache.commons.codec.digest.DigestUtils dependency in iso 19139

I think if this works then it could unblock this issue.

ianwallen avatar Nov 28 '23 20:11 ianwallen

@ianwallen @josegar74

ok, build is fixed

wangf1122 avatar Nov 28 '23 22:11 wangf1122

gmd:onLine gmd:CI_OnlineResource gmd:linkage gmd:URLhttp://mylink2.com</gmd:URL> </gmd:linkage> gmd:protocol gco:CharacterStringWWW:LINK-1.0-http--link</gco:CharacterString> </gmd:protocol> gmd:name gco:CharacterStringMy link 2</gco:CharacterString> </gmd:name> </gmd:CI_OnlineResource> </gmd:onLine>

I did some test by mocking the data itself.

image

deleted that google_12345_2 description resource, and it get delated and others remain

image

wangf1122 avatar Jan 23 '24 21:01 wangf1122

I think is probably related to the changes for the distributions added in main in #7468, it is not working fine with this branch.

Apologies for all the inconveniences. As #7468 is not going to be back ported, I would suggest create a PR with the current code for 4.2.x and update this one for main

josegar74 avatar Feb 09 '24 14:02 josegar74

I think is probably related to the changes for the distributions added in main in #7468, it is not working fine with this branch.

Apologies for all the inconveniences. As #7468 is not going to be back ported, I would suggest create a PR with the current code for 4.2.x and update this one for main

@josegar74

I created a PR https://github.com/geonetwork/core-geonetwork/pull/7740 for 4.2 branch and needs backporting to 3.12 as well. Will continue to investigate with this PR.

wangf1122 avatar Feb 09 '24 20:02 wangf1122

@josegar74 @ianwallen

I had fixed the issue which we also need to add hashing indexing mechanism to the search engine index.xsl because the new UI module is not pulling relation from the original extract-relations.xsl anymore. It's now pulling from the elastic search indexing directly.

I also need to remove/give up the thumbnail part because the thumbnail is deleting directly the backend store. It's not possible for us to remove single link at one time.

wangf1122 avatar Feb 19 '24 20:02 wangf1122

@wangf1122 the changes for the schema should be done also for iso19115-3.2018.

What would happen if a schema is not updated with these changes?


I tested with iso19115-3.2018 and seem working also, not using the position / hash, but should be fine. Also saw that the xslt process in iso19139 is backwards compatible, so it should be fine. No need to update iso19115-3.2018.

josegar74 avatar Mar 05 '24 09:03 josegar74

@josegar74

@wangf1122 the changes for the schema should be done also for iso19115-3.2018.

What would happen if a schema is not updated with these changes?

I tested with iso19115-3.2018 and seem working also, not using the position / hash, but should be fine. Also saw that the xslt process in iso19139 is backwards compatible, so it should be fine. No need to update iso19115-3.2018.

The logic should be backward compatible and should not affect any schema's that have not been updated. I suggest that we apply the changes to iso19115-3.2018 as a separate PR. As We also need to apply a similar change to HNAP as well.

ianwallen avatar Mar 05 '24 13:03 ianwallen