cms icon indicating copy to clipboard operation
cms copied to clipboard

Search hydrates all results before applying limit causing slow results on large sites + suggestions

Open jonathan-bird opened this issue 1 month ago • 4 comments

Bug description

When investigating why my default.json search index which is 1.6MB was taking 3-4 seconds to load even with a simple:

return \Statamic\Facades\Search::index($index)
  ->ensureExists()
  ->search($query)
  ->limit($limit)
  ->get();

I got digging and realised that getBaseItems in QueryBuilder hydrates all results and transforms them before applying the limit. This was crazy to me because the lookup() returns results already sorted by score (highest first). So when we do ->take(20), we get the top 20 most relevant results.

I came up with this (rough-ish) solution which got my search results down to roughly 200ms - not great but much better than 3-4s:

// Get the search index and perform raw lookup (fast - searches JSON file)
$searchIndex = StatamicSearchFacade::index($index);
$searchIndex->ensureExists();

$rawResults = $searchIndex->lookup($query);

if ($rawResults->isEmpty()) {
    return collect();
}

// Limit results BEFORE hydration to avoid loading hundreds of entries
$limitedResults = $rawResults->take($limit);

// Extract entry IDs from the limited results
$ids = $limitedResults
  ->map(fn ($result) => Str::after($result['reference'], 'entry::'))
  ->values()
  ->all();

// Hydrate only the entries we need
$entries = Entry::query()->whereIn('id', $ids)->get()->keyBy->id();

// Return entries in the same order as search results (by relevance score)
return collect($ids)
  ->map(fn ($id) => $entries->get($id))
  ->filter();

Then I got digging more and noticed in the IteratorBuilder the getFilteredItems always hydrates all results for everything too (entries etc). I understand due to being flat file, you have to do this in case it has orderBy or no limit, but you could potentially improve it to be smarter via batching if there were limit + where without order by. This would instantly fix a lot of search:

protected function getFilteredItems()
{
  // No limit or has orderBy - can't optimize, need all items
  if (!$this->limit || $this->orderBys) {
    $items = $this->getBaseItems();
    return $this->filterWheres($items);
  }

  // No wheres - just limit base items directly
  if (empty($this->wheres)) {
    return $this->getBaseItems()->slice($this->offset ?? 0, $this->limit);
  }

  // Has limit AND wheres - batch hydrate until we have enough
  return $this->getFilteredItemsInBatches();
}

This would be much faster on large sites on many queries while still supporting the current approach.

Let me know your thoughts - I'd be happy to give a PR a go, just wanted to get your thoughts on the issue before I went down a rabbit hole.

How to reproduce

return \Statamic\Facades\Search::index($index)
  ->ensureExists()
  ->search($query)
  ->limit($limit)
  ->get();

You'd need ~1000 basic entries with title + content to see that .

Environment

Environment
Application Name: xxxx
Laravel Version: 12.40.2
PHP Version: 8.4.6
Composer Version: 2.8.4
Environment: production
Debug Mode: OFF
URL: xxxx
Maintenance Mode: OFF
Timezone: Australia/Brisbane
Locale: en

Cache
Config: CACHED
Events: CACHED
Routes: CACHED
Views: CACHED

Drivers
Broadcasting: log
Cache: redis
Database: mysql
Logs: stack / daily
Mail: smtp
Queue: redis
Session: redis

Storage
public/storage: NOT LINKED

Livewire
Livewire: v3.7.0

Statamic
Addons: 7
Sites: 1
Stache Watcher: Disabled (auto)
Static Caching: full
Version: 5.69.0 PRO

Statamic Addons
aerni/advanced-seo: 2.13.2
aerni/livewire-forms: 9.10.1
digital-bird/statamic-toc: 1.0.2
marcorieser/statamic-livewire: 4.5.3
mitydigital/sitemapamic: 3.4.1
mitydigital/statamic-two-factor: 2.7.3

Installation

Fresh statamic/statamic site via CLI

Additional details

No response

jonathan-bird avatar Nov 29 '25 07:11 jonathan-bird

A PR would be great!

jasonvarga avatar Nov 29 '25 22:11 jasonvarga

@jasonvarga happy to, do you want me to implement it for search only, or across getFilteredItems affecting all queries in the QueryBuilder? I think we try all queries, just not sure if I'm breaking anything 😬

jonathan-bird avatar Nov 29 '25 23:11 jonathan-bird

@jasonvarga I have created a PR, great results in my benchmark tests between old vs new: https://github.com/statamic/cms/pull/13217

Please let me know if you'd like any changes, this is my first PR to the repo so trying to follow your standards and approaches as best I could.

jonathan-bird avatar Nov 30 '25 03:11 jonathan-bird

Great looking PR - we'll review it when we can. Thanks!

jasonvarga avatar Nov 30 '25 20:11 jasonvarga