chef-server Sorting not working for search API

Sorting not working for search API

Open alexpop opened this issue 9 years ago • 21 comments

The "sort" parameter in the search API is not working. Also reported in CHEF-2121.

This is causing Chef Manage(opscode/chef#2279) and 'knife search' commands to display unsorted lists. Here are some "knife search" results to show the issue:

[apop@mymac chef-repo]$ knife search role "*:*" --id-only --sort asc -VV
...
DEBUG: Initiating GET to https://api.opscode.com/organizations/ap-org1/search/role?q=*%253A*&sort=asc&start=0&rows=1000
...
3 items found

windows_web
linux_base
windows_base

Same unsorted list returned with these commands:

knife search role "*:*" --id-only --sort name
knife search role "*:*" --id-only --sort name+desc
knife search role "*:*" --id-only --sort "name desc"
knife search role "*:*" --id-only --sort ascending
knife search role "*:*" --id-only --sort asc
knife search role "*:*" --sort asc
knife search role "*:*" --sort description

Oct 28 '14 14:10 alexpop

Earlier version of this https://tickets.corp.opscode.com/browse/OC-11238 Still appears as late as Manage 1.7.1

Dec 11 '14 23:12 sean-horn

@sean-horn the version of Manage used should not be an issue. Manage will not implement sorting on the client side until it's working on the server. Ideally we would implement these together at the same time.

Dec 12 '14 05:12 smith

In combination with https://github.com/chef/chef-manage-issues/issues/12 this is the number one customer Feature/Fix request.

Jan 30 '15 20:01 sean-horn

At this point, we're going to have to classify this as a feature and not a bug. The server-side sorting feature has been non-functional since sometime in 2010.

When this feature was working, the Chef Server was storing the flattened-and-expanded node object data as individual fields within Solr. The number of fields in Solr quickly expanded to over a milliion based on all the key combinations of all the nodes on the server, and search times suffered drastically. Solr behaves best when the number of fields is kept low (at the time benchmark comparisons were going up to 32 fields).

To improve (really, unbreak) the search performance on large Chef Server installs, we combined all of the flatten-and-expanded node keys into a content field in Solr. We do special search query transformation to convert a key:val Lucene query to something along the lines of content:__key__%SEP%__val__ before sending queries to Solr.

Since all of the node-related keys are merged into the content field in Solr, we can't actually tell Solr what fields to sort on when returning query results.

Without significant changes and / or new APIs in the Chef Server, we have two options (neither of them particularly great) for obtaining sorted search results: server-side and client-side:

server-side: With the architecture described above, to obtain sorted results from the server, we'd have to first query Solr for all of the results matching a query, obtain the gzipped node JSON from PostgreSQL for all the results, parse the JSON, and sort based on the arbitrary key passed in by the user.

This would be fine for small result sets, but for large installs this quickly balloons the memory usage of the Chef Server.

client-side: All of the same memory usage issues apply here, but you also are hindered by the fact that you'd have to stream all of the data to the client in order to sort, meaning it would be infeasible to combine sorting with limits.

Apr 30 '15 23:04 sdelano

Perhaps we should open an issue with the client to remove the sort parameter in the DSL methods and the options in the CLI tool, so people don't get confused.

Apr 30 '15 23:04 stevendanna

:+1: to that, @stevendanna. Also paging @smith to this thread.

Apr 30 '15 23:04 charlesjohnson

Without significant changes and / or new APIs in the Chef Server

I'd be interested to hear what kind of things we're talking about here. Since we're talking about the way the data is stored and queried, "significant" sounds pretty significant.

server-side:
With the architecture described above, to obtain sorted results from the server, we'd have to first query Solr for all of the results matching a query, obtain the gzipped node JSON from PostgreSQL for all the results, parse the JSON, and sort based on the arbitrary key passed in by the user.

This would be fine for small result sets, but for large installs this quickly balloons the memory usage of the Chef Server.

Would we be able to mitigate or make feasible any of this by exploring some kind of SAX-like JSON streaming parser (to keep the memory usage low) or some of the postgres functions for manipulating JSON on the way out of the server?

client-side:
All of the same memory usage issues apply here, but you also are hindered by the fact that you'd have to stream all of the data to the client in order to sort, meaning it would be infeasible to combine sorting with limits.

We could figure out ways to allow sorting only after all rows had been loaded in memory, but that wouldn't survive a page refresh and would present tricky UI design challenges.

Perhaps we should open an issue with the client to remove the sort parameter in the DSL methods and the options in the CLI tool, so people don't get confused.

I'm ok with that, but as long as there are search results displayed in a table with headings, users will click the headings and want it to sort when the headings are clicked, because that's what people expect. They will not stop asking for this feature so long as things are in a table with headings.

Would it be possible to have some kind of preconfigured white-list of key/paths where we make sorting possible and only allow those (configured in chef-server.rb)? Just throwing out ideas at this point.

I'm open to not supporting sorting, but at this point I think the user who is caused the most pain by this might be support, since they have to keep fielding requests about it.

May 01 '15 02:05 smith

In 2015, unsorted lists are not acceptable in my opinion. Look at the products we use on a daily basics, how many give us unsorted lists? It's one of those hard right things we need to prioritize and implement.

May 01 '15 10:05 alexpop

If we limited what you could sort on, could we potentially use PostgreSQL cursors (http://www.postgresql.org/docs/9.2/static/plpgsql-cursors.html) to do sorted, paginated results without blowing up memory?

May 01 '15 20:05 stevendanna

I don't really know how much relevance it has to a realistic solution to this problem, but I'd like to note that https://github.com/jcreedcmu/psql-gunzip-test is a proof-of-concept that does allow ordering-by and postgresql-indexing on the json fields that are stored gzipped in serialized_object columns. e.g. I can get SELECT name FROM nodes ORDER BY (gunzip(serialized_object)::json->'normal'->>'foo'); and CREATE INDEX ON nodes ((gunzip(serialized_object)::json->'normal'->>'foo')); to work on a ubuntu 12.04 vm after doing a chef server omnibus build with postgresql upgraded from 9.2 to 9.3 so that the JSON access functions exist.

May 05 '15 17:05 jcreedcmu

Storing the json data gzipped is kind of silly anyway, because PostgreSQL will already compress column values automatically.

May 05 '15 19:05 petere

@petere I think I was just reading about this yesterday. Does this page describe the compression you are talking about?

http://www.postgresql.org/docs/9.2/static/storage-toast.html

May 05 '15 19:05 jeremiahsnapp

@petere Very much agree.

The compression is an artifact from when we supported mysql as well. We wanted to maintain a sql model containing the least common denominator between the two, and it was easier at the time to own the compression ourselves than to model the differences between how postgres and mysql managed things. It doesn't make much sense now.

Going forward we should stop compressing things on the erlang side; it doesn't add any value and closes off cool things like JSONB.

May 05 '15 19:05 markan

I opened up #225 to separately address erlang-side compression, since that can be done independently of whatever solution we arrive at here.

May 05 '15 19:05 marcparadise

Related question: Putting search results aside, what's the difficulty of sorting results on the /nodes, /cookbooks, /roles, /environments, etc. endpoints?

May 11 '15 16:05 charlesjohnson

@charlesjohnson since they only return ids (aside from cookbooks, which have the most recent version I think), I'm guessing you would just need to add an ORDER BY and possibly an index along with it. That's just my guess though.

May 11 '15 17:05 smith

+1 for this bug. As a Chef user this unsorted lists issue is my one gripe with the product. Would really love a solution for this guys!

May 14 '15 13:05 wps-carl

What does it even mean to sort the results server-side on, e.g. /nodes, if what's being returned is unconditionally a JSON map of all the nodes? I thought the issue of sorting was meaningful for the search endpoint because it's paginated.

Jun 23 '15 12:06 jcreedcmu

is there any way I can help to move this forward?

Sep 23 '15 09:09 Tyrael

Any work around for this? Its very painful to search for every role you want to add.

Mar 03 '17 00:03 jquick

Please make the chef server web UI lists sortable

Oct 19 '17 14:10 spotlesscoder

chef-server chef-server copied to clipboard

Sorting not working for search API

chef-server
chef-server copied to clipboard