CSW client cannot handle outputSchema

Open nicholascar opened this issue 5 years ago • 1 comments

When I call getrecords2 like this:

csw.getrecords2(
    startposition=startposition,
    maxrecords=pagesize,
    outputschema=outputschema,
    esn='full',
    sortby=sortby
)

With this call, no value for outputschema other than http://www.isotc211.org/2005/gmd works. That is no good for us, since we are using ISO 19115-1 (http://standards.iso.org/iso/19115/-3/mdb/1.0).

When I POST a raw request to the CSW server (via Python requests), outputSchema='owl' works and gives me the server's own output schema, but OWSLib's CSW client, csw = CatalogueServiceWeb(url), can't split the returned XML into csw.records, so I can't iterate over csw.records.items().
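
For reference, a raw request of that kind can be put together with the standard library alone. This is only a minimal sketch, assuming a CSW 2.0.2 endpoint: build_getrecords_body is a hypothetical helper name (not part of OWSLib), the query has no filter or sort clause, and the output schema shown is the 19115-3 namespace mentioned above.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"  # CSW 2.0.2 namespace


def build_getrecords_body(output_schema, start_position=1, max_records=10):
    """Build a minimal CSW 2.0.2 GetRecords request as an XML string."""
    # Registering the prefix makes the serializer declare xmlns:csw on the root,
    # which the "csw:Record" value of typeNames relies on.
    ET.register_namespace("csw", CSW_NS)
    root = ET.Element(
        "{%s}GetRecords" % CSW_NS,
        {
            "service": "CSW",
            "version": "2.0.2",
            "resultType": "results",
            "startPosition": str(start_position),
            "maxRecords": str(max_records),
            "outputSchema": output_schema,
        },
    )
    query = ET.SubElement(root, "{%s}Query" % CSW_NS, {"typeNames": "csw:Record"})
    esn = ET.SubElement(query, "{%s}ElementSetName" % CSW_NS)
    esn.text = "full"
    return ET.tostring(root, encoding="unicode")


body = build_getrecords_body("http://standards.iso.org/iso/19115/-3/mdb/1.0")
# Sending it is then a plain POST, e.g. with the requests package:
#   requests.post(url, data=body.encode("utf-8"),
#                 headers={"Content-Type": "application/xml"})
```

The response to such a POST is plain XML, so nothing on the client side depends on OWSLib knowing the schema.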

I guess that internally the OWSLib client splits the XML based on the namespaces it knows, which come from ISO 19115:2005 and nothing else, such as ISO 19115-1:2014. Perhaps it can't find the location of the UUID in the record, since that changed in -1:2014.

A workaround is to split the 'raw' XML in csw.response like this:

from lxml import etree
namespaces = {'mdb': 'http://standards.iso.org/iso/19115/-3/mdb/1.0'}
root = etree.fromstring(csw.response)
records = root.findall('.//mdb:MD_Metadata', namespaces=namespaces)
  • Can the csw client fill csw.records for other outputSchemas in a way that I'm not using?
  • Can the client be configured to know about the outputSchema, and I just don't know how?
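
The workaround can be taken one step further to rebuild something like csw.records by hand. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml so that it runs anywhere. The mcc and gco namespace URIs and the identifier path are my assumptions about the 19115-3 encoding, split_records is a hypothetical helper name, and the sample response is hand-written for illustration.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "mdb": "http://standards.iso.org/iso/19115/-3/mdb/1.0",
    # Assumed 19115-3 namespaces for the identifier elements
    "mcc": "http://standards.iso.org/iso/19115/-3/mcc/1.0",
    "gco": "http://standards.iso.org/iso/19115/-3/gco/1.0",
}
# Assumed location of the record identifier (UUID) inside mdb:MD_Metadata
ID_PATH = "mdb:metadataIdentifier/mcc:MD_Identifier/mcc:code/gco:CharacterString"


def split_records(response_xml):
    """Return {identifier: element} for every mdb:MD_Metadata in the response."""
    root = ET.fromstring(response_xml)
    records = {}
    for position, record in enumerate(root.findall(".//mdb:MD_Metadata", NAMESPACES)):
        identifier = record.findtext(ID_PATH, default=None, namespaces=NAMESPACES)
        # Fall back to the position when a record carries no identifier
        key = identifier.strip() if identifier else "record-%d" % position
        records[key] = record
    return records


# Hand-written stand-in for csw.response, for illustration only
SAMPLE = """<csw:GetRecordsResponse
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:mdb="http://standards.iso.org/iso/19115/-3/mdb/1.0"
    xmlns:mcc="http://standards.iso.org/iso/19115/-3/mcc/1.0"
    xmlns:gco="http://standards.iso.org/iso/19115/-3/gco/1.0">
  <csw:SearchResults numberOfRecordsMatched="2" numberOfRecordsReturned="2">
    <mdb:MD_Metadata>
      <mdb:metadataIdentifier><mcc:MD_Identifier><mcc:code>
        <gco:CharacterString>abc-123</gco:CharacterString>
      </mcc:code></mcc:MD_Identifier></mdb:metadataIdentifier>
    </mdb:MD_Metadata>
    <mdb:MD_Metadata/>
  </csw:SearchResults>
</csw:GetRecordsResponse>"""

records = split_records(SAMPLE)
for identifier, element in records.items():
    print(identifier, element.tag)
```

The resulting dict holds raw XML elements keyed by identifier, not parsed OWSLib record objects, so any further field extraction still has to be done by hand.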

nicholascar, Sep 12 '19 03:09

The solution should be based on the getCapabilities response. If interested I can propose a PR on this quite easily...

To give an idea:

  1. I would remove, or at least never use, a statically defined namespace list; the target server may not implement some or most of them.
  2. I would query the target server and save the capabilities in the client's state (see self.capabilities below), or return them to the caller.
  3. I would provide a dynamically generated dict of the namespaces handled per operation, e.g. GetRecordById, GetRecords, etc.
  4. A client should be able to find out which output schemas (and formats, etc.) are available for a specific operation.
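    def _get_output_schemas(self, operation):
        """Return {prefix: namespace} for the outputSchema values advertised for `operation`."""
        # self.capabilities is expected to hold the parsed GetCapabilities response (lxml ElementTree)
        _cap_ns = self.capabilities.getroot().nsmap
        _ows_ns = _cap_ns.get('ows')
        if not _ows_ns:
            raise CswError('Bad getcapabilities response: OWS namespace not found ' + str(_cap_ns))
        _op = self.capabilities.find(".//{{{}}}Operation[@name='{}']".format(_ows_ns, operation))
        if _op is None:
            raise CswError('Operation not advertised by the server: ' + operation)
        _schemas = _op.find("{{{}}}Parameter[@name='outputSchema']".format(_ows_ns))
        if _schemas is None:
            return {}
        # Use a set: in Python 3, map() returns a one-shot iterator, so repeated "in" tests would fail
        _values = {v.text for v in _schemas.findall("{{{}}}Value".format(_ows_ns))}
        # Keep only the declared prefixes whose namespace URI is an advertised outputSchema
        return {key: value for key, value in _schemas.nsmap.items() if value in _values}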
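
The same idea can be tried outside the client class. The following is a rough standalone sketch, not a proposed OWSLib API: output_schemas_by_operation is a hypothetical name, it uses the standard library's ElementTree (which has no nsmap, so it returns the advertised URIs instead of a prefix map), it hard-codes the OWS namespace on the assumption of a CSW 2.0.2 server, and the capabilities document is a trimmed, hand-written sample.

```python
import xml.etree.ElementTree as ET

# Assumption: OWS 1.0 namespace, as used by CSW 2.0.2 capabilities documents
OWS = {"ows": "http://www.opengis.net/ows"}


def output_schemas_by_operation(capabilities_xml):
    """Map each advertised operation name to its list of outputSchema URIs."""
    root = ET.fromstring(capabilities_xml)
    result = {}
    for op in root.iterfind(".//ows:Operation", OWS):
        values = op.findall("ows:Parameter[@name='outputSchema']/ows:Value", OWS)
        result[op.get("name")] = [v.text.strip() for v in values if v.text]
    return result


# Trimmed, hand-written GetCapabilities response, for illustration only
CAPABILITIES = """<csw:Capabilities version="2.0.2"
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ows="http://www.opengis.net/ows">
  <ows:OperationsMetadata>
    <ows:Operation name="GetRecords">
      <ows:Parameter name="outputSchema">
        <ows:Value>http://www.opengis.net/cat/csw/2.0.2</ows:Value>
        <ows:Value>http://standards.iso.org/iso/19115/-3/mdb/1.0</ows:Value>
      </ows:Parameter>
    </ows:Operation>
    <ows:Operation name="GetCapabilities"/>
  </ows:OperationsMetadata>
</csw:Capabilities>"""

schemas = output_schemas_by_operation(CAPABILITIES)
wanted = "http://standards.iso.org/iso/19115/-3/mdb/1.0"
print(wanted in schemas.get("GetRecords", []))
```

A client could run a check like this before sending a GetRecords request and fail early when the server does not advertise the requested schema.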

ccancellieri, Oct 22 '21 09:10