OGC API Records: 404 if UUID contains '/' or '%2F'

Open pvgenuchten opened this issue 2 years ago • 4 comments

Description

Some communities place a DOI in the metadata identifier, e.g. 10.5281/zenodo.4088113. If you navigate to this item using /collections/metadata:main/items/10.5281/zenodo.4088113, a 404 is returned. The same error occurs with the URL-encoded form, /collections/metadata:main/items/10.5281%2Fzenodo.4088113.

I expected option 2 (the URL-encoded form) to work, in which case we would have to make sure we always encode '/' as %2F. Alternatives could be to prevent '/' in identifiers by substituting '-', or to throw an error on insert.
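For illustration, a minimal Python sketch of the "always encode" option when building item links. The helper name `item_url` and the base URL are hypothetical and not part of pycsw; the only library call assumed is `urllib.parse.quote` from the standard library. Note that, per this issue, requesting such a URL still returned a 404, so this shows only how the link would be constructed.

```python
from urllib.parse import quote, unquote


def item_url(base: str, collection: str, identifier: str) -> str:
    """Build an items URL with the identifier fully percent-encoded.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() encode "/" as %2F; by default "/" is left as-is.
    encoded_id = quote(identifier, safe="")
    # Keep ":" readable in the collection name (e.g. metadata:main).
    encoded_collection = quote(collection, safe=":")
    return f"{base}/collections/{encoded_collection}/items/{encoded_id}"


url = item_url("https://example.org", "metadata:main", "10.5281/zenodo.4088113")
print(url)

# Decoding the last path segment recovers the original identifier.
print(unquote(url.rsplit("/", 1)[1]))
```

The key detail is `safe=""`: without it, `quote` leaves the slash untouched and the identifier is split across two path segments.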

pvgenuchten avatar Mar 06 '23 16:03 pvgenuchten

This problem also exists on the demo server: https://demo.pycsw.org/gisdata/collections/metadata:main/items/http%3A%2F%2Fcapita.wustl.edu%2FDataspaceMetadata_ISO%2FCIRA.VIEWS.BRf.xml

I was curious whether the problem also exists on pygeoapi, but there it seems to be covered. Maybe it is handled by the Flask API?

pvgenuchten avatar Feb 26 '24 12:02 pvgenuchten

It seems the problem still exists:

  • https://demo.pycsw.org/gisdata/collections/metadata:main/items/http://capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml
  • https://demo.pycsw.org/gisdata/collections/metadata:main/items/http://capita.wustl.edu/DataspaceMetadata_ISO/FZ%20Juelich.MACC.vmr_no.xml

pvgenuchten avatar May 13 '24 10:05 pvgenuchten

@pvgenuchten I cannot reproduce this issue locally. I've tried inspecting on demo.pycsw.org directly, and found the following.

Given a URL like:

https://demo.pycsw.org/gisdata/collections//metadata:main/items/http://capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml

And the below pycsw container logs on demo.pycsw.org:

[2024-08-03T19:16:39Z] {/home/pycsw/pycsw/pycsw/ogc/api/records.py:837} DEBUG - Querying repository for item http:/capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml

Here, we see that http://capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml is getting converted to http:/capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml.
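The transformation seen in the log can be reproduced with a small Python sketch. This is not nginx or pycsw code; the function `merge_slashes` below is a hypothetical stand-in that simply collapses runs of slashes, on the assumption that some layer in front of the application does the same to the request path.

```python
import re


def merge_slashes(path: str) -> str:
    """Collapse any run of consecutive slashes into a single slash.

    Hypothetical stand-in for a proxy or server normalizing the path.
    """
    return re.sub(r"/{2,}", "/", path)


path = (
    "/gisdata/collections/metadata:main/items/"
    "http://capita.wustl.edu/DataspaceMetadata_ISO/CIRA.VIEWS.MF.xml"
)

merged = merge_slashes(path)
# Everything after "/items/" is what the application would treat as the item id.
item_id = merged.split("/items/", 1)[1]
print(item_id)
```

Under that assumption, the identifier reaching the repository query no longer matches the stored one, which would explain the 404.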

I tried adjusting the nginx setup on demo.pycsw.org with merge_slashes off; but had no luck.

In this case I would say things are working at the application level as expected.

cc @kalxas @ricardogsilva

tomkralidis avatar Aug 03 '24 19:08 tomkralidis

Should we close it, or wait for re-configuration of the demo server? @kalxas

pvgenuchten avatar Aug 16 '24 10:08 pvgenuchten

see #998

pvgenuchten avatar Nov 08 '24 14:11 pvgenuchten