Duplicated search results.
Hi,
Looks like something (I'm guessing maybe on the side of search.nixos.org elasticsearch) has changed, and now nix-search-cli is displaying each result twice:
> nix run nixpkgs#nix-search-cli -- hello
hello @ 2.12.1 : hello
hello @ 2.12.1 : hello
hello-go : hello-go
hello-go : hello-go
hello-cpp : hello-cpp
hello-cpp : hello-cpp
hello-unfree @ 1.0
hello-unfree @ 1.0
....
Darn, I'm seeing this as well. Thanks for reporting the issue. Want to send a PR that does the deduplication?
I've been digging into the source to find the root cause, and I believe I have found the problem. However, I'm a bit unsure how to fix it properly.
Here's how I investigated the issue. I added a couple of fmt.Println calls in esclient.go to print the Elasticsearch URL, the query JSON, and the response JSON. Then I searched for the same package, hello, at search.nixos.org and compared the URLs, requests, and responses. Here's what I found:
The query JSON looks fine. The one from the search.nixos.org frontend is the same except for some aggs used to group results, but we don't need those in the CLI.
One difference I noticed was in the request URLs:
CLI = https://nixos-search-7-1733963800.us-east-1.bonsaisearch.net:443/latest-*-nixos-unstable/_search
WEB = https://search.nixos.org/backend/latest-43-nixos-24.11/_search
Then, to query the very same endpoint, I changed the URL template to
ElasticSearchURLTemplate = `https://search.nixos.org/backend/%s/_search`
but still got duplicated results. I noticed that our URL has latest-*-nixos-unstable, so I tried --channel 24.11 on the CLI to hit the same URL as the web frontend.
It turns out the problem is actually the -*- wildcard in the index prefix here:
ElasticSearchIndexPrefix = "latest-*-"
I'm guessing there are two indexes that return results, and that's why we are getting duplicates.
So, for one thing, I believe we should be hitting https://search.nixos.org/backend/. However, I'm not really sure how to proceed regarding the latest-*- index prefix, since the -*- wildcard now matches two indexes and both return results.
Any suggestions on how to work around it?
The 43 value (latest-43-) used by the search.nixos.org frontend is read from the environment as elasticsearchMappingSchemaVersion,
https://github.com/search?q=repo%3ANixOS%2Fnixos-search%20elasticsearchMappingSchemaVersion&type=code
so I'm guessing they bump that version number whenever they add new indexed fields or something like that and the schema changes.
I believe we should also use a fixed schemaVersion integer in our index prefix instead of the -*- wildcard. Or maybe we could read it from a file in our repo that contains the schemaVersion and gets updated from time to time?
This is the file where 43 is defined for their frontend:
https://github.com/NixOS/nixos-search/blob/main/VERSION (changed to 43 two days ago, precisely when we started getting dups)
We could download that file's contents as part of our nix build, I guess. What do you think?
Pushed a minimal PR that fixes this issue: https://github.com/peterldowns/nix-search-cli/pull/20
It uses a fixed schemaVersion, and now results are back to normal.
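For anyone who doesn't want to open the PR, pinning the schema version could look roughly like the sketch below. This is only an illustration, not the PR's actual diff: the constant and function names are hypothetical, the URL template is the one quoted earlier in this thread, and 43 is the value in the upstream VERSION file at the time of writing.

```go
package main

import "fmt"

const (
	// URL template quoted earlier in this thread.
	elasticSearchURLTemplate = "https://search.nixos.org/backend/%s/_search"
	// Hypothetical constant: the schema version pinned at build time
	// (43 in the upstream VERSION file when this was written).
	schemaVersion = 43
)

// searchURL builds the search endpoint for one channel using the pinned
// schema version instead of the "latest-*-" wildcard.
func searchURL(channel string) string {
	index := fmt.Sprintf("latest-%d-nixos-%s", schemaVersion, channel)
	return fmt.Sprintf(elasticSearchURLTemplate, index)
}

func main() {
	fmt.Println(searchURL("unstable"))
	fmt.Println(searchURL("24.11"))
}
```

With the version pinned, the request targets exactly one alias, so only one index answers the query.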
If you prefer the schemaVersion not to be hardcoded (anyways we have user/pwd hardcoded in there), tell me so.
Grepping the aliases for latest- shows that latest-*-nixos-unstable matches twice: once for schemaVersion 42 and once for schemaVersion 43. So I guess whenever a new schema version is released, another latest- alias is created for it.
curl https://search.nixos.org/backend/_aliases -u "$esUser:$esPass" | jq | rg latest-
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 28119 0 28119 0 0 89414 0 --:--:-- --:--:-- --:--:-- 89550
"latest-43-nixos-unstable": {}
"latest-42-group-manual": {}
"latest-42-nixos-unstable": {}
"latest-42-nixos-24.11": {}
"latest-43-nixos-24.11": {}
"latest-43-group-manual": {}
ping @peterldowns
Update: Today (June 4, 2025) things started working again. There's a single -*-unstable alias now: latest-43-nixos-unstable. It looks like past aliases get removed after some time.
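To make the duplication concrete, here is a small sketch of how the wildcard pattern selects more than one alias from the listing above. It assumes Go's path.Match as a stand-in for Elasticsearch's own wildcard expansion (the two agree for these simple names, but they are not the same implementation), and matchingAliases is a hypothetical helper, not something in nix-search-cli.

```go
package main

import (
	"fmt"
	"path"
)

// matchingAliases returns the aliases selected by a wildcard index pattern.
// path.Match only approximates how Elasticsearch expands "*" in index names.
func matchingAliases(pattern string, aliases []string) []string {
	var matched []string
	for _, alias := range aliases {
		ok, err := path.Match(pattern, alias)
		if err == nil && ok {
			matched = append(matched, alias)
		}
	}
	return matched
}

func main() {
	// Alias names taken from the _aliases output earlier in this thread.
	aliases := []string{
		"latest-43-nixos-unstable",
		"latest-42-group-manual",
		"latest-42-nixos-unstable",
		"latest-42-nixos-24.11",
		"latest-43-nixos-24.11",
		"latest-43-group-manual",
	}
	// Two aliases match while both schema versions exist, so every
	// package is returned once per index.
	fmt.Println(matchingAliases("latest-*-nixos-unstable", aliases))
}
```

Once the 42 aliases were removed upstream, the same pattern matched a single alias again, which is consistent with the duplicates disappearing.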
@vic hey, yup, you figured things out: the latest-* alias prefix is used to always search over the latest available index, but upstream sometimes has multiple latest- indices, and then we get duplicate search results. This is documented in the code here https://github.com/peterldowns/nix-search-cli/blob/7d6b4c501ee448dc2e5c123aa4c6d9db44a6dd12/pkg/nixsearch/esclient.go#L21 but not anywhere else. Sorry that I didn't have time to point you to it.
> If you prefer the schemaVersion not to be hardcoded (anyways we have user/pwd hardcoded in there), tell me so.
Hardcoding the schemaVersion (or the search index) is not a viable option because it would require users of nix-search-cli to rebuild or download a new binary every time the upstream index updates.
The best two options:
- update the nix-search elasticsearch indexer script to consistently update a single versionless alias, like latest-unstable, to keep it pointing to whatever the actual latest index is.
- update this repo's code to deduplicate results by package name.
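The second option could be sketched roughly as follows. This is a minimal illustration under assumptions, not necessarily what the linked fix does: the Package type and its fields are hypothetical stand-ins for the CLI's real result type, and the first occurrence of each name is kept so the ranking returned by the search is preserved.

```go
package main

import "fmt"

// Package is a hypothetical, trimmed-down search result.
type Package struct {
	AttrName string
	Version  string
}

// dedupeByName keeps the first result seen for each attribute name and
// drops later ones, preserving the original result order.
func dedupeByName(results []Package) []Package {
	seen := make(map[string]struct{}, len(results))
	unique := make([]Package, 0, len(results))
	for _, p := range results {
		if _, dup := seen[p.AttrName]; dup {
			continue
		}
		seen[p.AttrName] = struct{}{}
		unique = append(unique, p)
	}
	return unique
}

func main() {
	// Doubled results, as in the report at the top of this issue.
	results := []Package{
		{AttrName: "hello", Version: "2.12.1"},
		{AttrName: "hello", Version: "2.12.1"},
		{AttrName: "hello-go"},
		{AttrName: "hello-go"},
	}
	for _, p := range dedupeByName(results) {
		fmt.Println(p.AttrName, p.Version)
	}
}
```

One trade-off of deduplicating on the client is that each query still fetches every result once per matching index, so a page of results may hold fewer unique packages than requested.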
Fixed by https://github.com/peterldowns/nix-search-cli/pull/21