
Improve performance of `TimelineFieldsResource` endpoint

Status: Open. jkppr opened this issue 4 months ago (0 comments).

In PR #3182 we introduced a new API endpoint, `TimelineFieldsResource`, that gathers the unique field names per timeline to improve the UX of the visualizations dialog.

https://github.com/google/timesketch/pull/3182/files#diff-bb4334f2502d8d96c5689e099dd1440470e425b43a01d8221ebfaabd18511862R538

Currently, the endpoint retrieves the unique fields by:

  1. Aggregating all data types within the timeline.
  2. For each data type, querying for a single event to get the fields.
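A minimal sketch of the two steps above follows, written in Python against an opensearch-py style client (`client.search(index=..., body=...)`). It is not the code from PR #3182: the function name, the field names `data_type` and `__ts_timeline_id`, and the aggregation size of 1000 are assumptions for illustration. It makes the cost visible: one aggregation plus one search per data type. Also note that because this sketch samples only one event per data type, fields absent from that particular event are not reported.

```python
from typing import Any, List, Set

# Assumed field names (hypothetical); the real mapping may differ, e.g. a
# terms aggregation on a text field would need a ".keyword" sub-field.
DATA_TYPE_FIELD = "data_type"
TIMELINE_FIELD = "__ts_timeline_id"


def fields_n_plus_one(client: Any, index: str, timeline_id: int) -> Set[str]:
    """Collect field names with one aggregation plus one search per data type."""
    timeline_filter = {"term": {TIMELINE_FIELD: timeline_id}}

    # Query 1: list the data types present in the timeline (capped at 1000).
    agg_body = {
        "size": 0,
        "query": {"bool": {"filter": [timeline_filter]}},
        "aggs": {"data_types": {"terms": {"field": DATA_TYPE_FIELD, "size": 1000}}},
    }
    response = client.search(index=index, body=agg_body)
    data_types: List[str] = [
        bucket["key"] for bucket in response["aggregations"]["data_types"]["buckets"]
    ]

    # Queries 2..N+1: fetch a single event per data type and read its keys.
    fields: Set[str] = set()
    for data_type in data_types:
        body = {
            "size": 1,
            "query": {"bool": {"filter": [
                timeline_filter,
                {"term": {DATA_TYPE_FIELD: data_type}},
            ]}},
        }
        hits = client.search(index=index, body=body)["hits"]["hits"]
        if hits:
            fields.update(hits[0]["_source"].keys())
    return fields
```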

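This approach can be inefficient, especially for timelines with many data types: a timeline with N data types costs N + 1 round trips to OpenSearch (one aggregation plus one search per data type).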

This issue tracks research into a more efficient way of gathering these fields. Possible approaches to investigate include:

  • Using OpenSearch field capabilities: Explore whether the OpenSearch field capabilities API (`_field_caps`) offers a more direct way to retrieve the unique fields for a given index and data type.
  • Caching field information: Investigate caching the field information to avoid redundant queries.
  • Optimizing the aggregation: Analyze whether the aggregation step can be optimized or avoided altogether.
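The three directions above could be prototyped roughly as follows. This is a hedged sketch, not a decided design: `fields_via_field_caps` calls the OpenSearch `_field_caps` API through opensearch-py's `client.field_caps()`; `fields_via_top_hits` collapses the N + 1 queries into a single terms aggregation with a size-1 `top_hits` sub-aggregation; and `FieldCache` is a hypothetical in-process TTL cache. All function and class names, and the field names `data_type` and `__ts_timeline_id`, are assumptions.

```python
import time
from typing import Any, Callable, Dict, Set, Tuple

# Assumed field names (hypothetical); adjust to the real Timesketch mapping.
DATA_TYPE_FIELD = "data_type"
TIMELINE_FIELD = "__ts_timeline_id"


def fields_via_field_caps(client: Any, index: str) -> Set[str]:
    """One request: ask OpenSearch for every field mapped in the index."""
    response = client.field_caps(index=index, fields="*")
    # Drop every name starting with an underscore: this removes metadata
    # fields such as _id and _index (and any internal field named that way).
    return {name for name in response["fields"] if not name.startswith("_")}


def fields_via_top_hits(client: Any, index: str, timeline_id: int) -> Set[str]:
    """One request: terms aggregation with a size-1 top_hits sub-aggregation."""
    body = {
        "size": 0,
        "query": {"bool": {"filter": [{"term": {TIMELINE_FIELD: timeline_id}}]}},
        "aggs": {
            "data_types": {
                "terms": {"field": DATA_TYPE_FIELD, "size": 1000},
                # One sample event per data type, returned inside each bucket.
                "aggs": {"sample": {"top_hits": {"size": 1}}},
            }
        },
    }
    response = client.search(index=index, body=body)
    fields: Set[str] = set()
    for bucket in response["aggregations"]["data_types"]["buckets"]:
        for hit in bucket["sample"]["hits"]["hits"]:
            fields.update(hit["_source"].keys())
    return fields


class FieldCache:
    """Hypothetical in-process cache of field names per timeline, with a TTL."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[int, Tuple[float, Set[str]]] = {}

    def get_or_compute(self, timeline_id: int, compute: Callable[[], Set[str]]) -> Set[str]:
        now = time.monotonic()
        entry = self._entries.get(timeline_id)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh cached value: no OpenSearch query
        fields = compute()  # miss or expired: run the expensive strategy
        self._entries[timeline_id] = (now, fields)
        return fields

    def invalidate(self, timeline_id: int) -> None:
        """Drop the entry, e.g. when events or attributes are added."""
        self._entries.pop(timeline_id, None)
```

Note that `_field_caps` reports the fields mapped in the whole index, not per timeline or data type, so it would over-report if several timelines share an index, and it also lists multi-field sub-fields (for example a `.keyword` variant) when the mapping defines them. An in-process cache is not shared between worker processes; a multi-process deployment would need a shared store instead.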

Tasks:

  • [ ] Research alternative approaches for retrieving unique fields.
  • [ ] Benchmark the performance of different approaches.
  • [ ] Implement the most performant solution.
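For the benchmarking task, a small timing harness along these lines could compare candidate strategies against the same timeline. It is a generic sketch: `benchmark` is a hypothetical helper, each value in `strategies` is a zero-argument callable wrapping one approach, and it reports the median of several runs after one warm-up call.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(strategies: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Return the median wall-clock time in seconds for each strategy."""
    results: Dict[str, float] = {}
    for name, run in strategies.items():
        run()  # warm-up call so connection setup does not skew the first sample
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results
```

Because the warm-up call and OpenSearch's shard request cache can both make repeated queries faster, cold-cache runs are worth measuring too, especially when judging the caching approach.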

Related PRs:

  • #3182

jkppr, Oct 09 '24 15:10