Add endpoint to get aggregate count for metadata fields
How did we make this PR?
This was all generated by Copilot Agent mode with Claude Sonnet 4 during a focused group session with the Newsroom AI. The initial prompt was:
Currently searching for specific metadata fields is handled by something called chips, which require the user to type something specific in the search box. I'd like to turn these into explicit filters that are visible in the UI.
I'd like these to have the Kibana functionality where they show a list of top values from the actual data, e.g. for "country" you might see "UK", "USA", etc, based on the number of hits from a group by.
As a first step, I'd like to add a backend endpoint that gets the top n values with counts for the "country" field.
Does it work?
Yes! Copilot was able to re-use some existing (but possibly unused?) functionality, and created a test that exercises the new size param it added. Note that the test does not exercise the new endpoint itself, only the function the endpoint uses.
You can also verify it yourself with:
https://api.media.local.dev-gutools.co.uk/images/aggregations/metadata/keywords?size=5
Is it perfect?
No! Flaws we know about:
- It will allow requests to non-existent metadata fields, and instead of erroring, just give a zero count for values. (Maybe this is desired behaviour? Not sure)
- It will not work for fields that do not have an associated keyword mapping type. This is an Elasticsearch limitation.
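Both flaws above come down to how a field is mapped in the index. As a rough sketch only (not something this PR implements), a check along these lines could tell an unmapped field apart from one that is mapped but not as keyword. The helper names and the example mapping are made up for illustration; the real Grid mapping may look different.

```python
from typing import Any, Dict, Optional


def field_type(mapping: Dict[str, Any], path: str) -> Optional[str]:
    """Return the mapped type for a dotted field path, or None if it is not mapped."""
    node: Optional[Dict[str, Any]] = mapping
    for part in path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None  # field does not exist in the mapping
    return node.get("type")


def can_aggregate(mapping: Dict[str, Any], path: str) -> bool:
    """True only when the field is mapped with the keyword type."""
    return field_type(mapping, path) == "keyword"


# Hypothetical mapping fragment, invented for this example.
example_mapping = {
    "properties": {
        "metadata": {
            "properties": {
                "keywords": {"type": "keyword"},
                "country": {"type": "text"},
            }
        }
    }
}

print(can_aggregate(example_mapping, "metadata.keywords"))
print(can_aggregate(example_mapping, "metadata.country"))
print(field_type(example_mapping, "metadata.nope"))
```

Whether the endpoint should reject such fields or keep returning empty results is still an open question, so treat this as an option rather than a recommendation.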
Copilot documentation
On our request, Copilot produced the following documentation.
It has a few mistakes, namely:
- localhost curl requests won't work because of auth (you need *.media.local.dev-gutools.co.uk)
- country will not work, because it doesn't have a keyword field type
Overview
This new endpoint provides the top N values with counts for metadata fields, enabling Kibana-style filtering functionality. It's the first step towards replacing the chips-based search with explicit UI filters.
Endpoint
GET /images/aggregations/metadata/{field}?size={size}&q={query}
Parameters
- field (required): The metadata field to aggregate on (e.g., "country", "city", "credit", etc.)
- size (optional, default: 10): Maximum number of top values to return
- q (optional): Additional query to filter the data before aggregation
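As an illustration of how the three parameters fit together, here is a minimal sketch of a client-side helper that builds the request URL. The function name is hypothetical, and the base URL is taken from the local dev host mentioned earlier in this PR; adjust it for your environment.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode

# Assumption: the local dev API host from this PR description.
BASE_URL = "https://api.media.local.dev-gutools.co.uk"


def aggregation_url(
    field: str,
    size: int = 10,
    q: Optional[str] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build the URL for GET /images/aggregations/metadata/{field}."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    params: Dict[str, str] = {"size": str(size)}
    if q:
        params["q"] = q  # optional query applied before aggregating
    path = "/images/aggregations/metadata/" + quote(field, safe="")
    return f"{base_url}{path}?{urlencode(params)}"


print(aggregation_url("keywords", size=5))
print(aggregation_url("keywords", size=5, q="london"))
```

The request itself still needs a valid auth session, which this sketch does not handle.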
Example Usage
Get top 5 countries from all images:
curl "http://localhost:9001/images/aggregations/metadata/keyword?size=5"
Get top countries from images with "london" in any field:
curl "http://localhost:9001/images/aggregations/metadata/keyword?q=london&size=5"
Example Response
{
  "data": [
    {
      "key": "UK",
      "count": 1250
    },
    {
      "key": "USA",
      "count": 890
    },
    {
      "key": "France",
      "count": 456
    },
    {
      "key": "Germany",
      "count": 234
    },
    {
      "key": "Spain",
      "count": 123
    }
  ],
  "offset": 0,
  "total": 5
}
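A consumer of this response might turn it into label and count pairs for a filter list. The following is a small sketch assuming the response shape shown above; the class and function names are hypothetical and not part of the Grid codebase.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ValueCount:
    key: str    # the metadata value, e.g. "UK"
    count: int  # number of images with that value


def parse_aggregation(body: str) -> List[ValueCount]:
    """Parse the JSON body and return entries ordered by descending count."""
    payload = json.loads(body)
    entries = [ValueCount(item["key"], item["count"]) for item in payload["data"]]
    return sorted(entries, key=lambda e: e.count, reverse=True)


# Shortened copy of the example response above.
sample = (
    '{"data": [{"key": "UK", "count": 1250}, {"key": "USA", "count": 890}],'
    ' "offset": 0, "total": 2}'
)

for entry in parse_aggregation(sample):
    print(f"{entry.key} ({entry.count})")
```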
Supported Fields
All metadata fields are supported, including:
- country, city, state, subLocation
- credit, source, supplier
- byline, photographer
- keywords, subjects
- title, description
- imageType
- And more...
Technical Implementation
- Uses Elasticsearch terms aggregation for efficient counting
- Leverages existing metadataSearch functionality in ElasticSearch class
- Supports structured query filtering via the q parameter
- Returns results in the standard Grid API format
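For readers less familiar with terms aggregations, the sketch below shows the general shape of such a request body and how its buckets could map onto the key/count pairs returned by the endpoint. It is not the query the Grid actually sends: the aggregation name, the helper names, and the metadata.keywords field path are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

AGG_NAME = "top_values"  # hypothetical aggregation name


def terms_aggregation_body(
    field: str,
    size: int = 10,
    query: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build an Elasticsearch search body that only asks for a terms aggregation."""
    return {
        "size": 0,  # return no hits, only the aggregation
        "query": query if query is not None else {"match_all": {}},
        "aggs": {AGG_NAME: {"terms": {"field": field, "size": size}}},
    }


def buckets_to_counts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map Elasticsearch buckets (key, doc_count) to key/count pairs."""
    buckets = response["aggregations"][AGG_NAME]["buckets"]
    return [{"key": b["key"], "count": b["doc_count"]} for b in buckets]


# Assumed field path; the real path in the Grid index may differ.
body = terms_aggregation_body("metadata.keywords", size=5)
print(body)

# Made-up response fragment in the shape Elasticsearch returns.
fake_response = {
    "aggregations": {AGG_NAME: {"buckets": [{"key": "sport", "doc_count": 42}]}}
}
print(buckets_to_counts(fake_response))
```

Elasticsearch returns terms buckets ordered by descending document count by default, so no extra sorting is needed for a "top N" list.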
Next Steps
This endpoint will be used to build explicit filter UI components that show:
- Available filter values with counts
- Dynamic filtering based on current search context
- Multi-select filtering capabilities
- Clear visual indication of applied filters
This will replace the current chips-based system where users need to know specific syntax like country:UK.