graylog2-server Improve handling of long running search queries on search/dashboards.

Why?

Searches/dashboards querying large amounts of data can put significant load on the OpenSearch cluster. The current way of retrieving search results for these in the frontend is an all-or-nothing approach, where all of the data for the current search/dashboard tab is retrieved and results are shown only when it has completed. This leads to:

- Users having to wait for the widget with the maximum query time until any result is displayed
- Users having the impression the search is "stuck", retriggering another one, making the situation worse

What?

For the reasons above, we should take care that:

- The wait time for a user is minimized, by:
    - Considering caching for long time ranges (with a cache lifetime proportional to the length of the time range queried, e.g. a relative time range of "last 7 days -> now" returns results from cache if queried again within five minutes)
    - Considering returning partial results (widgets finishing early will already be shown) through asynchronous search
- The amount of executed searches is minimized, by:
    - Making sure that searches are cancelled when the user triggers another search while the current one is running
    - Making sure that searches are cancelled when they are taking too long (`cancel_after_time_interval`)
- The amount of data being queried is minimized, by:
    - Only retrieving results and rendering widgets which are currently in the viewport, lazy-loading the rest
- The user is not tempted to refresh the page, by providing feedback when searches take longer, so they know the search is not stuck
- Any other ideas for the points above
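
To make the caching idea more concrete, here is a minimal sketch in Python of a cache whose lifetime scales with the queried time range. It is only an illustration, not Graylog code: `TTL_RATIO`, `cache_ttl_seconds` and `SearchCache` are hypothetical names, and the ratio is simply derived from the "7 days -> five minutes" example above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

# Assumption: the ratio is taken from the example in this issue,
# where a 7-day range may be served from cache for five minutes.
TTL_RATIO = (5 * 60) / (7 * 24 * 60 * 60)


def cache_ttl_seconds(range_seconds: float) -> float:
    """Return a cache lifetime proportional to the queried time range."""
    return range_seconds * TTL_RATIO


@dataclass
class SearchCache:
    """Hypothetical in-memory cache keyed by query string and range length."""

    clock: Callable[[], float]  # injected so the behaviour is easy to test
    _entries: Dict[Tuple[str, int], Tuple[float, Any]] = field(default_factory=dict)

    def get_or_run(self, query: str, range_seconds: int, run: Callable[[], Any]) -> Any:
        key = (query, range_seconds)
        now = self.clock()
        hit = self._entries.get(key)
        # Serve from cache while the entry is younger than the proportional TTL.
        if hit is not None and now - hit[0] <= cache_ttl_seconds(range_seconds):
            return hit[1]
        # Otherwise execute the search and remember when it was stored.
        result = run()
        self._entries[key] = (now, result)
        return result
```

With this ratio, a search over one hour would be cached for under two seconds, so short ranges effectively always hit the cluster, while long ranges are shielded from repeated identical queries.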

Your Environment

Graylog Version:
OpenSearch Version:
MongoDB Version:
Operating System:
Browser version:

Some further investigations and decisions can be found here: https://graylogdocumentation.atlassian.net/wiki/spaces/SEARCH/pages/3138191452/Long+running+search+handling

Feb 01 '24 09:02 dennisoelkers

Another option worth considering, found by Jan and already considered in data-node: #18590

Mar 13 '24 08:03 luk-kaminski

Let's have a talk on how we want to proceed with the remaining ideas.

May 03 '24 09:05 kmerz

We implemented:

- The amount of executed searches is minimized, by:
    - Making sure that searches are cancelled when the user triggers another search while the current one is running
    - Making sure that searches are cancelled when they are taking too long (`cancel_after_time_interval`)
- Preventing the user from refreshing the page by providing feedback when searches take longer, so they know it is not stuck
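
As an illustration of the two cancellation points in this list, the following Python sketch cancels a running search when a newer one is triggered and gives up on searches that exceed a time limit. It is not the actual Graylog implementation: `SearchRunner` and `Outcome` are hypothetical names, and the client-side `asyncio.wait_for` timeout only stands in for the server-side `cancel_after_time_interval` setting.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Outcome:
    status: str  # "done", "timed_out" or "superseded"
    result: Optional[str] = None


class SearchRunner:
    """Hypothetical helper that keeps at most one search running at a time."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self._current: Optional[asyncio.Task] = None

    async def run(self, search: Callable[[], Awaitable[str]]) -> Outcome:
        # A newer search supersedes the one that is still in flight.
        if self._current is not None and not self._current.done():
            self._current.cancel()
        task = asyncio.ensure_future(search())
        self._current = task
        try:
            # wait_for cancels the task itself once the timeout expires.
            result = await asyncio.wait_for(task, timeout=self.timeout_seconds)
            return Outcome("done", result)
        except asyncio.TimeoutError:
            return Outcome("timed_out")
        except asyncio.CancelledError:
            if self._current is not task:
                # Cancelled because a newer search replaced this one.
                return Outcome("superseded")
            raise  # the caller itself was cancelled, so propagate
```

Returning an explicit status rather than raising lets the caller decide what feedback to show, for example a "search took too long" hint for `timed_out` and nothing at all for `superseded`.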

We considered these the most rewarding points. From now on, we will wait for future feedback to see if we need to implement further measures. Therefore, and because we would need to reformulate future steps, I am closing this issue as completed.

May 06 '24 09:05 dennisoelkers
