Add endpoint to get aggregate count for metadata fields

Open joelochlann opened this issue 5 months ago • 1 comments

How did we make this PR?

This was all generated by Copilot Agent mode with Claude Sonnet 4 during a focused group session with the Newsroom AI. Initial prompt was:

Currently searching for specific metadata fields is handled by something called chips, which require the user to type something specific in the search box. I'd like to turn these into explicit filters that are visible in the UI.

I'd like these to have the Kibana functionality where they show a list of top values from the actual data, e.g. for "country" you might see "UK", "USA", etc., based on the number of hits from a group by.

As a first step, I'd like to add a backend endpoint that gets the top n values with counts for the "country" field.

Does it work?

Yes! Copilot was actually able to re-use some existing (but possibly unused?) functionality, and created a test which exercises the new size param which it added. The test does not actually exercise the new endpoint itself, just the function used by the endpoint.

You can also verify yourself with

https://api.media.local.dev-gutools.co.uk/images/aggregations/metadata/keywords?size=5

Is it perfect?

No! Flaws we know about:

It will allow requests for non-existent metadata fields and, instead of erroring, just gives a zero count for values. (Maybe this is desired behaviour? Not sure.)
It will not work for fields that do not have an associated keyword mapping type. This is an Elasticsearch limitation.
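
If erroring turns out to be the preferred behaviour for the first flaw, one option is to check the requested field against an allow-list before querying. The following is only a language-neutral sketch written in Python, not code from this PR: `AGGREGATABLE_FIELDS` and `check_field` are hypothetical names, and the set contents are a placeholder (only "keywords" is included because it is the field used for manual verification above).

```python
# Hypothetical allow-list. In practice this would be derived from the
# Elasticsearch mappings; "keywords" is a placeholder taken from the
# verification URL in this PR.
AGGREGATABLE_FIELDS = frozenset({"keywords"})


def check_field(field, allowed=AGGREGATABLE_FIELDS):
    """Return (ok, message) for a requested aggregation field."""
    if field in allowed:
        return True, ""
    # Unknown or non-keyword fields get an explicit error instead of
    # silently producing zero counts.
    return False, f"Cannot aggregate on metadata field '{field}'"


print(check_field("keywords"))
print(check_field("country"))
```

The same check would also cover the second flaw, since fields without a keyword mapping would simply be left out of the set.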

Copilot documentation

On our request, Copilot produced the following documentation.

It has a few mistakes, namely:

localhost curl requests won't work because of auth (you need *.media.local.dev-gutools.co.uk)
country will not work, because it doesn't have a keyword field type

Overview

This new endpoint provides the top N values with counts for metadata fields, enabling Kibana-style filtering functionality. It's the first step towards replacing the chips-based search with explicit UI filters.

Endpoint

GET /images/aggregations/metadata/{field}?size={size}&q={query}

Parameters

field (required): The metadata field to aggregate on (e.g., "country", "city", "credit", etc.)
size (optional, default: 10): Maximum number of top values to return
q (optional): Additional query to filter the data before aggregation
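
As a rough illustration of how the three parameters combine into a request URL, here is a small Python sketch. The helper name `aggregation_url` is made up for this example, and the base URL is assumed to be the local dev host mentioned earlier in this PR.

```python
from urllib.parse import quote, urlencode

# Assumed base URL: the local dev host from the verification link above.
BASE_URL = "https://api.media.local.dev-gutools.co.uk"


def aggregation_url(field, size=10, q=None, base_url=BASE_URL):
    """Build the URL for the metadata aggregation endpoint."""
    params = {"size": size}
    if q:
        # q is optional; only include it when a filter query is given.
        params["q"] = q
    path = f"/images/aggregations/metadata/{quote(field, safe='')}"
    return f"{base_url}{path}?{urlencode(params)}"


print(aggregation_url("keywords", size=5))
print(aggregation_url("keywords", size=5, q="london"))
```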

Example Usage

Get top 5 keywords from all images:

curl "http://localhost:9001/images/aggregations/metadata/keywords?size=5"

Get top keywords from images with "london" in any field:

curl "http://localhost:9001/images/aggregations/metadata/keywords?q=london&size=5"

Example Response

{
  "data": [
    {
      "key": "UK",
      "count": 1250
    },
    {
      "key": "USA",
      "count": 890
    },
    {
      "key": "France",
      "count": 456
    },
    {
      "key": "Germany",
      "count": 234
    },
    {
      "key": "Spain",
      "count": 123
    }
  ],
  "offset": 0,
  "total": 5
}
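
A client consuming this response might reduce it to a list of (value, count) pairs for rendering filter options. The sketch below assumes only the response shape shown above; `top_values` is a hypothetical helper, and the sample body is a shortened copy of the example.

```python
import json


def top_values(response_body):
    """Extract (key, count) pairs from an aggregation response body."""
    payload = json.loads(response_body)
    # Each entry under "data" carries the field value and its hit count.
    return [(entry["key"], entry["count"]) for entry in payload["data"]]


sample = """
{
  "data": [
    {"key": "UK", "count": 1250},
    {"key": "USA", "count": 890}
  ],
  "offset": 0,
  "total": 2
}
"""

for key, count in top_values(sample):
    print(f"{key}: {count}")
```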

Supported Fields

All metadata fields are supported, including:

country, city, state, subLocation
credit, source, supplier
byline, photographer
keywords, subjects
title, description
imageType
And more...

Technical Implementation

Uses Elasticsearch terms aggregation for efficient counting
Leverages existing metadataSearch functionality in ElasticSearch class
Supports structured query filtering via the q parameter
Returns results in the standard Grid API format
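
For readers unfamiliar with terms aggregations, the request and response shapes look roughly like the sketch below, expressed as plain Python dictionaries in the Elasticsearch query DSL. This is not the code in the ElasticSearch class: the aggregation name `top_values`, the field path `metadata.keywords`, and both helper functions are assumptions for illustration, and how `q` is translated into an Elasticsearch query is not shown.

```python
def terms_aggregation_body(field, size=10, query=None):
    """Build a search body that returns only the top `size` terms for `field`."""
    body = {
        # "size": 0 suppresses the document hits; only buckets are needed.
        "size": 0,
        "aggs": {"top_values": {"terms": {"field": field, "size": size}}},
    }
    if query is not None:
        # An already-built Elasticsearch query restricting what is counted.
        body["query"] = query
    return body


def buckets_to_counts(es_response):
    """Map the aggregation buckets to the key/count entries the endpoint returns."""
    buckets = es_response["aggregations"]["top_values"]["buckets"]
    return [{"key": b["key"], "count": b["doc_count"]} for b in buckets]


# Field path is an assumption about how the metadata is indexed.
print(terms_aggregation_body("metadata.keywords", size=5))
```

A terms aggregation needs a field mapped as `keyword` (or a text field with fielddata enabled), which is why fields without a keyword mapping fail, as noted under the known flaws.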

Next Steps

This endpoint will be used to build explicit filter UI components that show:

Available filter values with counts
Dynamic filtering based on current search context
Multi-select filtering capabilities
Clear visual indication of applied filters

This will replace the current chips-based system where users need to know specific syntax like country:UK.

Aug 01 '25 15:08 joelochlann

Deploy build 13418 of `media-service::grid::all` to TEST

All deployment options

From guardian/actions-riff-raff.

Aug 01 '25 15:08 github-actions[bot]
