geonode
geonode copied to clipboard
CSW search by keyword (`dc:subject`) is unsupported
Expected Behavior
You should be able to filter by dc:subject
(=keywords in pycsw and geonode) within a CSW query.
Actual Behavior
A simple query filtering by keyword returns "Invalid query syntax".
Steps to Reproduce the Problem
Make a GetRecords
request against the CSW endpoint:
curl -d "@payload.xml" -X POST "http://localhost/catalogue/csw?service=CSW&version=2.0.2"
With the following payload.xml
:
<csw:GetRecords
xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
xmlns:ogc="http://www.opengis.net/ogc"
xmlns:gml="http://www.opengis.net/gml"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dct="http://purl.org/dc/terms/"
xmlns:gmd="http://www.isotc211.org/2005/gmd"
xmlns:gco="http://www.isotc211.org/2005/gco"
xmlns:gmi="http://www.isotc211.org/2005/gmi"
xmlns:ows="http://www.opengis.net/ows" service="CSW" version="2.0.2" resultType="results" startPosition="1" maxRecords="4">
<csw:Query typeNames="csw:Record">
<csw:ElementSetName>full</csw:ElementSetName>
<csw:Constraint version="1.1.0">
<ogc:Filter>
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>dc:subject</ogc:PropertyName>
<ogc:Literal>something</ogc:Literal>
</ogc:PropertyIsEqualTo>
</ogc:Filter>
</csw:Constraint>
</csw:Query>
</csw:GetRecords>
Response:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- pycsw 2.6.1 -->
<ows:ExceptionReport xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gml="http://www.opengis.net/gml" xmlns:ows="http://www.opengis.net/ows" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2.0" language="en" xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
<ows:Exception exceptionCode="InvalidParameterValue" locator="constraint">
<ows:ExceptionText>Invalid query syntax</ows:ExceptionText>
</ows:Exception>
</ows:ExceptionReport>
But the syntax is valid, since replacing dc:subject
by dc:title
works as expected.
This is the exception reported by pycsw:
Invalid query syntax. Query: {'type': 'filter', 'where': 'keyword_csv = %s', 'values': ['asd'], '_dict': {'ogc:Filter': {'ogc:PropertyIsEqualTo': {'ogc:PropertyName': 'dc:subject', 'ogc:Literal': 'asd'}}}}
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedColumn: column "keyword_csv" does not exist
LINE 1: ...et') AND (uuid IS NOT NULL AND id IN (1, 2)) AND (keyword_cs...
^
The above exception was the direct cause of the following exception:
Invalid query syntax. Query: {'type': 'filter', 'where': 'keyword_csv = %s', 'values': ['asd'], '_dict': {'ogc:Filter': {'ogc:PropertyIsEqualTo': {'ogc:PropertyName': 'dc:subject', 'ogc:Literal': 'asd'}}}}
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedColumn: column "keyword_csv" does not exist
LINE 1: ...et') AND (uuid IS NOT NULL AND id IN (1, 2)) AND (keyword_cs...
^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pycsw/ogc/csw/csw2.py", line 821, in getrecords
matched, results = self.parent.repository.query(
File "/usr/src/geonode/./geonode/catalogue/backends/pycsw_plugin.py", line 165, in query
total = query.count()
File "/usr/local/lib/python3.10/dist-packages/django/db/models/query.py", line 412, in count
return self.query.get_count(using=self.db)
File "/usr/local/lib/python3.10/dist-packages/django/db/models/sql/query.py", line 528, in get_count
number = obj.get_aggregation(using, ['__count'])['__count']
File "/usr/local/lib/python3.10/dist-packages/django/db/models/sql/query.py", line 513, in get_aggregation
result = compiler.execute_sql(SINGLE)
File "/usr/local/lib/python3.10/dist-packages/django/db/models/sql/compiler.py", line 1175, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.10/dist-packages/django/db/backends/utils.py", line 66, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.10/dist-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.10/dist-packages/django/db/backends/utils.py", line 79, in _execute
with self.db.wrap_database_errors:
File "/usr/local/lib/python3.10/dist-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.10/dist-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: column "keyword_csv" does not exist
LINE 1: ...et') AND (uuid IS NOT NULL AND id IN (1, 2)) AND (keyword_cs...
Which makes sense since keyword_csv
is no table field but a model property:
https://github.com/GeoNode/geonode/blob/f3a490a2c2b38975351a887af720e7b7282b385e/geonode/base/models.py#L1339-L1348
Specifications
- GeoNode version: 4.1.0-dev
- Installation type (vanilla, geonode-project): vanilla
- Installation method (manual, docker): docker
- Platform:
- Additional details:
@fvicent as you already noticed the dc:subject
filter is not backed by a real keyword field, so this is a feature request. What is wrong is the returned error "invalid query syntax". "Unsupported query syntax" would probably be more appropriate.
Would you mind changing the title to "CSW search by keyword (dc:subject) is unsupported"?
@giohappy I am willing to work on this if you are ok with this addition. If I am right, this is somewhat similar to the csw:AnyText
case, whose corresponding field csw_anytext
gets populated whenever a dataset or document is saved, in order to support search via CSW. The same could be done for keywords. Or, maybe better, the repository could build a custom database query when dc:subject
is requested for search.
@fvicent that would be great! I would certainly go for the second option, i.e. manipulating the query executed by the PyCSW backend. I invite you to take a look at the CatalogueBackend since there are already parts related to keywords (and links).
I would move back the label from feature
to bug
since dc:subject
is one of the core CSW queriables.
Pls note that GeoNode declares dc:subject
as a queryable; having GeoNode/pyCSW not accepting such filter is indeed a bug.
Good point @etj, you're right.
I've managed to get this working with Postgres by tweaking the SQL filter (keyword_csv = %s
) generated by pycsw and executed in GeoNodeRepository.query()
:
index 7c951b235..d3eadf485 100644
--- a/geonode/catalogue/backends/pycsw_plugin.py
+++ b/geonode/catalogue/backends/pycsw_plugin.py
@@ -41,6 +41,15 @@ GEONODE_SERVICE_TYPES = {
"urn:x-esri:serviceType:ArcGIS:ImageServer": "ESRI:ArcGIS:ImageServer",
}
+SELECT_KEYWORD_CSV_QUERY = """\
+(SELECT STRING_AGG("base_hierarchicalkeyword"."slug", ',')
+FROM "base_taggedcontentitem"
+INNER JOIN "base_hierarchicalkeyword"
+ON ("base_taggedcontentitem"."tag_id" = "base_hierarchicalkeyword"."id")
+WHERE "base_taggedcontentitem"."content_object_id" = "base_resourcebase"."id"
+GROUP BY "base_resourcebase"."id")\
+"""
+
class GeoNodeRepository(Repository):
"""
@@ -156,6 +165,10 @@ class GeoNodeRepository(Repository):
pycsw_filters = settings.PYCSW.get("FILTER", {"resource_type__in": ["dataset"]})
if "where" in constraint: # GetRecords with constraint
+ constraint["where"] = (
+ constraint["where"]
+ .replace("keyword_csv", SELECT_KEYWORD_CSV_QUERY)
+ )
query = self._get_repo_filter(ResourceBase.objects.filter(**pycsw_filters)).extra(
where=[constraint["where"]], params=constraint["values"]
)
It would be much simpler to use Django's ORM instead of plain SQL, but having filters built and passed by pycsw as SQL code is a serious impediment. I was thinking in something like this, which unfortunately won't work:
index 7c951b235..c48eeb120 100644
--- a/geonode/catalogue/backends/pycsw_plugin.py
+++ b/geonode/catalogue/backends/pycsw_plugin.py
@@ -19,6 +19,7 @@
import logging
+from django.contrib.postgres.aggregates import StringAgg
from django.db import connection
from django.db.models import Max, Min, Count
from django.conf import settings
@@ -156,8 +157,19 @@ class GeoNodeRepository(Repository):
pycsw_filters = settings.PYCSW.get("FILTER", {"resource_type__in": ["dataset"]})
if "where" in constraint: # GetRecords with constraint
- query = self._get_repo_filter(ResourceBase.objects.filter(**pycsw_filters)).extra(
- where=[constraint["where"]], params=constraint["values"]
+ query = (
+ self._get_repo_filter(
+ ResourceBase.objects.filter(**pycsw_filters)
+ )
+ .annotate(
+ keyword_csv=StringAgg(
+ "keywords__slug",
+ delimiter=","
+ )
+ )
+ .extra(
+ where=[constraint["where"]], params=constraint["values"]
+ )
)
else: # GetRecords sans constraint
query = self._get_repo_filter(ResourceBase.objects.filter(**pycsw_filters))
It seems that annotated fields can't be used within .extra()
.
Another option would be to rewrite the query()
method and get rid of Django's ORM stuff, using plain SQL instead. Then building a query like this one would be much easier:
SELECT * FROM (
SELECT "base_resourcebase"."id",
"base_resourcebase"."title",
"base_resourcebase"."resource_type",
STRING_AGG("base_hierarchicalkeyword"."slug", ',') AS keyword_csv
FROM "base_resourcebase"
INNER JOIN "base_taggedcontentitem" ON ("base_resourcebase"."id" = "base_taggedcontentitem"."content_object_id")
INNER JOIN "base_hierarchicalkeyword" ON ("base_taggedcontentitem"."tag_id" = "base_hierarchicalkeyword"."id")
AND "base_resourcebase"."resource_type" = 'layer'
GROUP BY "base_resourcebase"."id"
) q
WHERE keyword_csv = 'something'; -- <--- Filters passed by pycsw
I couldn't manage to get Django's ORM generate an SQL alike.
@giohappy @etj Thoughts on this?