manticoresearch
search_after feature - pagination
Hello guys, is there any `search_after` feature like the one in Elasticsearch? I want to paginate over millions of documents, but, as you know, SQL-style `LIMIT 10,10` is not a great idea for a big database. I don't know whether Manticore sorts rows differently than MySQL, but in MySQL `LIMIT` just doesn't work for this. In Elasticsearch I use `search_after` to avoid querying all the previous rows, and it works like a charm. This feature is the only thing I need to move away from Elasticsearch, because I really like what you guys did here. An alternative would be an `lt` (less-than) filter on the last identifier to reproduce `search_after`, but will it really work as expected, like in Elasticsearch, in terms of how the previous rows are treated? I didn't find anything related in the docs. Thanks.
No, we do not have such a feature.
Yeah, I already got it. I tried to paginate over a big database; the query was sorted by multiple fields, but unfortunately I wasn't able to make it work 100% as expected in Manticore. I tried a where on the last sorted values with the lt | gt operators to simulate search_after... but it didn't return the results it should when paginating back or too far.
In fact, Manticore was something like 2x to 10x faster for me than Elasticsearch, while using about 1 GB of RAM (Elasticsearch was using 14 GB). But this problem with multi-sort pagination keeps me stuck with Elasticsearch... For now, there doesn't seem to be a right way to paginate in Manticore, especially with multi-field sorts. I hope they will think about it in the future.
- https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html#search-after
@tzukav are you using just `search_after`, or PIT too, in Elasticsearch?
Can you also elaborate more on:

> tried a where on the last sorted values with the lt | gt operators to simulate search_after... but it didn't return the results it should when paginating back or too far

In theory, if I understand how `search_after` works, emulating it should work unless you are using PIT in Elasticsearch.
@sanikolaev only `search_after`. Elastic said that `search_after` is better than PIT.
I think the real problem was due to a float comparison.
I had something like `ORDER BY id DESC, price DESC`.
`price` is a float (e.g. 1.01) in many cases.
I was doing something like `id < lastid AND price <= last_price` to simulate `search_after`, but somehow it wasn't returning the same results as Elasticsearch, and the same problem occurred for reverse (prev) pagination and for next pages. Going next might return some results (not the right ones, but at least it returned something), but going to the next page again, or back to prev, with the same logic as in Elasticsearch, returned nothing.
maybe I encountered this error?
The project is already launched with Elasticsearch; unfortunately, there's no going back. I already built a good structure with an Elasticsearch OOP client and I just can't redo the entire project. You can close this topic if you want to.
*`id` was just an example. I don't use the default id from Elasticsearch; I use another unique integer index starting from 1, like the MySQL auto-increment, but I increment it myself when inserting.
*And the sort was `['price' => 'desc', 'nr' => 'desc']` (`nr` is a unique integer, incremented like a MySQL auto-increment, with no missing numbers), so the query should be `price <= last price AND nr < last nr`.
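A small in-memory sketch may help show why a plain conjunction such as `price <= last_price AND nr < last_nr` can come back empty. It is plain Python, not Manticore or Elasticsearch code, and the rows and function names (`naive_next`, `keyset_next`) are made up for illustration. It compares the conjunction with a lexicographic "strictly after the last row" condition, which is how keyset pagination over several sort fields is usually expressed.

```python
# Hypothetical rows; the sort is price DESC, nr DESC, and nr is unique.
rows = [
    {"nr": 7, "price": 3.0},
    {"nr": 2, "price": 3.0},
    {"nr": 9, "price": 2.0},
    {"nr": 4, "price": 2.0},
    {"nr": 8, "price": 1.0},
]


def ordered(rs):
    """Sort by price DESC, then nr DESC."""
    return sorted(rs, key=lambda r: (-r["price"], -r["nr"]))


def naive_next(rs, last, size):
    """Conjunction from the thread: price <= last_price AND nr < last_nr."""
    hits = [r for r in rs
            if r["price"] <= last["price"] and r["nr"] < last["nr"]]
    return ordered(hits)[:size]


def keyset_next(rs, last, size):
    """Lexicographic condition:
    price < last_price OR (price = last_price AND nr < last_nr)."""
    hits = [r for r in rs
            if r["price"] < last["price"]
            or (r["price"] == last["price"] and r["nr"] < last["nr"])]
    return ordered(hits)[:size]


page1 = ordered(rows)[:2]          # nr 7 and nr 2, both with price 3.0
last = page1[-1]                   # cursor: price 3.0, nr 2

naive_page2 = [r["nr"] for r in naive_next(rows, last, 2)]
keyset_page2 = [r["nr"] for r in keyset_next(rows, last, 2)]
print(naive_page2, keyset_page2)
```

In this data the conjunction finds nothing, because every cheaper row happens to have a larger `nr` than the cursor row, while the lexicographic condition continues with the next two rows in sort order. Whether this is what actually went wrong in the original setup is only a guess.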
> elastic said that search_after is better than PIT.

Unless you use both:

> Repeat this process by updating the `search_after` array every time you retrieve a new page of results. If a refresh occurs between these requests, the order of your results may change, causing inconsistent results across pages. To prevent this, you can create a point in time (PIT) to preserve the current index state over your searches.
> I think the real problem was due to a float comparison.

I believe the logic is just more sophisticated than `id < lastid AND price <= last_price`. If you do `order by id desc, price desc limit X`, then, given that `N` and `M` are the last doc's `id`/`price` values and the `id` is non-unique, the next query should be `where id <= N order by id desc, price desc limit Y`. The rows with `id = N` whose price is not below `M` should then be excluded from the top, since you want `< M`, but only for `N`. If `Y` was too small and you need more, you may need another query like `where id < N`.
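The two-step idea above can be sketched in Python against an in-memory list. This is a variant rather than a literal transcription: instead of fetching `id <= N` and trimming the top on the client, it first asks for the rest of the `id = N` group (`price < M`) and then, only if the page is still short, for `id < N`. The `run_query` helper is a hypothetical stand-in for a real `SELECT ... ORDER BY id DESC, price DESC LIMIT ...` call, and the documents are invented. It also assumes no two docs share the same `id` and `price`; with such ties, rows would be skipped.

```python
# Hypothetical documents; id is deliberately non-unique.
docs = [
    {"id": 5, "price": 9.0},
    {"id": 5, "price": 7.0},
    {"id": 5, "price": 4.0},
    {"id": 3, "price": 8.0},
    {"id": 3, "price": 1.0},
    {"id": 2, "price": 6.0},
]


def run_query(predicate, limit):
    """Stand-in for: SELECT ... WHERE <predicate>
    ORDER BY id DESC, price DESC LIMIT <limit>."""
    hits = sorted((d for d in docs if predicate(d)),
                  key=lambda d: (-d["id"], -d["price"]))
    return hits[:limit]


def next_page(n, m, size):
    """Return the page that follows the doc with id = n, price = m."""
    # Step 1: the remainder of the id = N group, i.e. id = N AND price < M.
    page = run_query(lambda d: d["id"] == n and d["price"] < m, size)
    # Step 2: if that was not enough, continue with id < N.
    if len(page) < size:
        page += run_query(lambda d: d["id"] < n, size - len(page))
    return page


def walk(size):
    """Collect all pages as lists of (id, price) tuples."""
    pages = []
    page = run_query(lambda d: True, size)
    while page:
        pages.append([(d["id"], d["price"]) for d in page])
        last = page[-1]
        page = next_page(last["id"], last["price"], size)
    return pages


pages = walk(2)
print(pages)
```

Each step uses only simple range filters combined with AND, at the cost of sometimes needing two queries per page.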
Another question is what to do when you sort by `A`, `B` and you have multiple docs with the same `A`, `B` pairs in the end and have to stop in the middle of the set due to the `size`/`limit`. I'm not sure how Elasticsearch deals with it, but to me it seems there are only 2 options:
- disregard `size` and return all the results until all the docs with the last doc's `A`, `B` end. But then the user has to handle this situation.
- return not just `A`, `B` in `sort`, but the position it stopped at `A`, `B`, then this position can be used in `search_after`. This is not the case according to the docs.
So I'm not sure how it's done, probably I'm missing smth.
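One common way around the tie problem, and as far as I know the one the Elasticsearch `search_after` documentation recommends, is to append a unique tiebreaker field to the sort so that no two docs share the same full sort key. The sketch below shows the idea in plain Python on invented data; the field name `uid` and the helpers `page_after` and `walk` are hypothetical, and the sort is `a DESC, b DESC, uid DESC`.

```python
# Hypothetical docs with many identical (a, b) pairs; uid is unique.
docs = [
    {"uid": 1, "a": 2, "b": 10},
    {"uid": 2, "a": 2, "b": 10},
    {"uid": 3, "a": 2, "b": 10},
    {"uid": 4, "a": 1, "b": 10},
    {"uid": 5, "a": 1, "b": 10},
]


def key(d):
    """Full sort key: (a, b) plus the unique tiebreaker."""
    return (d["a"], d["b"], d["uid"])


def page_after(cursor, size):
    """Docs strictly after `cursor` in descending key order.
    Tuple comparison is lexicographic, which is the behaviour needed here."""
    hits = [d for d in docs if cursor is None or key(d) < cursor]
    hits.sort(key=key, reverse=True)
    return hits[:size]


def walk(size):
    """Page through everything and return the uids in the order seen."""
    seen, cursor = [], None
    while True:
        page = page_after(cursor, size)
        if not page:
            return seen
        seen.extend(d["uid"] for d in page)
        cursor = key(page[-1])   # the cursor is the last doc's full sort key


uids = walk(2)
print(uids)
```

With a page size of 2 the first page ends in the middle of the `(2, 10)` group, yet every doc is returned exactly once, because the cursor identifies a single position rather than a group of equal rows.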
> The project is already launched with Elasticsearch; unfortunately, there's no going back. I already built a good structure with an Elasticsearch OOP client and I just can't redo the entire project.

Sure, no problem at all.
> You can close this topic if you want to.

The feature request still makes perfect sense.