dynamoid icon indicating copy to clipboard operation
dynamoid copied to clipboard

indexing and ordering/sorting results?

Open rickybrown opened this issue 7 years ago • 15 comments

Hey there, forgive me for not totally getting the docs or if i've overlooked, but how would I go about indexing an attribute and ordering/sorting results by that attribute? Lets say I have a simple model for posts, with only a couple attributes:

class Post
  include Dynamoid::Document

  field :content
  field :votes, :integer, { default: 0 }
end

And I want to index the :votes attribute for sorting/ordering by number of votes, or the :content attribute alphabetically, or by the created_at timestamp. Ex: @posts.order('created_at DESC')

Could you give me some clarity? Also, do we need support for "Local secondary index's" as well as "Global Secondary Index's?"

rickybrown avatar Sep 20 '17 18:09 rickybrown

DynamoDB doesn't do general sorting without using what is known as the Range key since the general structure is a Key Value Store.

Quick Overview of Sorting

Thus, your sorting is based on items with the same HashKey. So for example we could have this as a Topics table:

Forum Topic PostCount
General Hello World 10
General For Your Information 25
General Vote for New Moderator 13
Health and Science The New Cure for Allergies 10
Health and Science Gluten, is it good for you? 3
Technology The New iPhone X 142

If we have Forum as our HashKey and PostCount as our Range key for the entire table then we can get the "topics" sorted in each forum by post count:

Topics.where(forum: "General").order('post_count DESC')

This will return your topics in descending order in the General Forum. And in this case we don't need a Secondary index because our base table has the Range key.


Post Example

So now for your post example, this is a bit harder, I'm not sure what your HashKey would be, but in order to sort, you need a HashKey of the same value across your various rows.

The general Post example would likely be:

Id Content Votes
1 Top 10 Ruby Gems You Should Use 10
2 The Best Tacos in Town 3
3 Follow These Users For Maximum Success 13

In this case, our HashKey might be Id and the Votes is our Range Key, however, Range is sorted PER HashKey which means, you actually can't query this table and get your posts sorted by votes with this schema.

Instead you would have to add maybe "Site" to your schema and so each Post corresponds to a particular Site.

Id Content Votes Site
1 Top 10 Ruby Gems You Should Use 10 My Blog
2 The Best Tacos in Town 3 My Blog
3 Follow These Users For Maximum Success 13 My Blog
4 What Happens Next Will Surprise You 10 Jerry's Blog
5 Top 10 Conspiracy Theories With Rick 3 Jerry's Blog

In this schema, we now can see we can get our posts sorted by "votes" by having the HashKey be Site and we'd do:

Post.where(site: "My Blog").order("votes DESC")

Again, this HashKey part is important!

Posts Multiple Sorts

Building off this example, since our HashKey is Site we can't have multiple range keys on the table. You can create Local Secondary Indices (which is basically adding more range keys to the table :P) so our schema could be:

Type Name HashKey RangeKey
Table posts Site Votes
Local Secondary Index content_index Content
Local Secondary Index id_index Id
Local Secondary Index created_at_index CreatedAt

By doing this we can then do queries like:

Post.where(site: "My Blog").order("content DESC")
Post.where(site: "My Blog").order("id DESC")
Post.where(site: "My Blog").order("created_at DESC")

Again, I must repeat, you MUST have same HashKey in order to sort, thus we cannot ask "Give me ALL POSTS sorted by Votes" in this scheme without introducing a new attribute or saying having all Site values be the exact same across all rows (Hope that makes sense)

Local Secondary Indices vs Global Secondary Indices

So the difference is that Local Secondary Indices:

  1. Must be created when table is created
  2. Must share the same hash key that your table has
  3. Must identify the range key to use

Whereas a Global Secondary Index may have a different Hash Key and a different range key.

However, be careful because no partition (determined by HashKey) can have more than 10GB of data (see http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html#LSI.ItemCollections.SizeLimit) thus in the example here, all rows with Site as "My Blog" will end up same partition due to same HashKey and all these rows must be contained within 10 GB of space.


Hope this has some clarification for you, let me know if you have any follow-up questions or need clarification on the above examples.

richardhsu avatar Sep 20 '17 20:09 richardhsu

Thanks for writing that up @richardhsu ! Very insightful. I have made this into a basic intro wiki.

pboling avatar Sep 20 '17 20:09 pboling

Awesome @richardhsu, much appreciated!

rickybrown avatar Sep 20 '17 21:09 rickybrown

Why we don't have order method anymore?

rasouza avatar Jul 04 '18 18:07 rasouza

@rasouza Can you post a code snippet? Are you referring to current master HEAD? @andrykonchin Did this change?

pboling avatar Jul 04 '18 22:07 pboling

Not sure if there was any method like order, at least since 2016.

There is only scan_index_forward method (that means ScanIndexForward Query option)

andrykonchin avatar Jul 05 '18 08:07 andrykonchin

@andrykonchin The documentation for how to do sorting is based heavily on an order method.

Perhaps it was never right?

https://github.com/Dynamoid/Dynamoid/issues/193#issuecomment-330973868

and

https://github.com/Dynamoid/Dynamoid/wiki

@richardhsu can you shed any light on this?

pboling avatar Jul 06 '18 11:07 pboling

Perhaps it was never right?

@pboling Looks like yes

andrykonchin avatar Jul 09 '18 07:07 andrykonchin

@rasouza @pboling @andrykonchin Hmm this is very strange, I must have had a mind blip because I can't seem to find order method either and going back through my DynamoDB projects, I never used order, but was doing things through scan_index_forward. My apologies for inaccurate documentation though I think the usage of how order should be is correct though and the requirements on secondary indexes still holds true.

We can implement this at least based on such but only works for Query operations and not Scan as mentioned by @andrykonchin so that instead of doing something like Post.where(forum: 'General').scan_index_forward we'd do Post.where(forum: 'General').order('post_count DESC') or even Post.where(forum: 'General').order(post_count: :desc).

Again apologies for incorrect documentation back then. Thoughts on the documentation and getting a method implemented to support the similar ActiveRecord interface?

richardhsu avatar Jul 09 '18 19:07 richardhsu

I like the proposed DSL Post.where(forum: 'General').order(post_count: :desc) and I am for removing existing public method scan_index_forward.

But the sorting isn't useful in DynamoDB because user has to specify partition/hash key. That's why actual code will be following:

Post.where(id: 'General').order(post_count: :desc)

Query without partition key of a table/index doesn't work. In this case I guess an exception should be raised. The name of sorted column can be used to find the index which will be used...

@pboling @richardhsu It's a breaking change so I would like to include it in the next major release in August. If nobody has objections I will work on it.

andrykonchin avatar Jul 10 '18 09:07 andrykonchin

@andrykonchin Yes agreed on proposal. However, since there can be secondary indexes, it isn't required that it be id which would be for the primary index. So long as the query has the hash key of one of the indices and the ordering is on the range key of that matched index. And yes an exception raised if no valid index is found is proper. Thanks!

richardhsu avatar Jul 10 '18 22:07 richardhsu

I've got

NoMethodError (undefined method 'order' for #<Dynamoid::Criteria::Chain:0x00007ff0de0a0fd8>)

Any suggestion?

NSLog0 avatar Sep 12 '18 05:09 NSLog0

@avatarr There is no order method still. Though you can control order of the result with scan_index_forward method.

andrykonchin avatar Sep 12 '18 08:09 andrykonchin

Hi, I have a problem trying to order the results of a dynamodb query. My table has a partition key (a string 'imei') and a sort_key(also a string 'gps_timestamp') When I execute this query: GpsPoint.where('imei.begins_with': "3", 'gps_timestamp.begins_with': Time.now.strftime("%Y-%m-%d")).scan_index_forward(true).record_limit(20) and this query: GpsPoint.where('imei.begins_with': "3", 'gps_timestamp.begins_with': Time.now.strftime("%Y-%m-%d")).scan_index_forward(false).record_limit(20) (changing scan_index_forward to false) I get the same 20 results, how is it so? (I always get the first 20 points of the day) To be clear, each day we receive hundreds of points, so it's not due to a lack of data in the DB. It seems like scan_index_forward has no effect, since the behavior of the query is the same with or without the call (I tried also not calling it looking for any difference in the results). Also, removing record_limit(20) has no effect on ordering either. Am I not understanding something or should I get different results?

engibengi avatar Aug 10 '22 10:08 engibengi

In this particular case the Scan operation will be used (as far as the primary key value isn't specified). scan_index_forward method takes any effect only in the Query operation and ignored in the Scan.

Probably the correct behavior would be to raise error when the Scan is chosen and scan_index_forward was used.

andrykonchin avatar Aug 11 '22 18:08 andrykonchin