
Finder: find_all_by_secondary_index does not return all the item values

Open mgonzaleza opened this issue 8 years ago • 3 comments

Hey guys,

I have a table with around 40k records, and the regular finders cannot find my item:

Zip.where(:zipcode => '12155').first returns nil

I am, however, able to get my missing item via a global secondary index (minus most of its field values):

Zip.find_all_by_secondary_index({:zipcode => '12155'}).first which returns:

#<Zip:0x007fef080eb1f0 @new_record=false, @attributes={:created_at=>nil, :updated_at=>nil, :object_id=>"85973ca6-c69e-45ab-b22e-00b3f2809f4c", :city=>nil, :decommisioned=>nil, :location=>nil, :location_type=>nil, :region=>nil, :state=>nil, :zipcode=>"12155", :zipcode_type=>nil, :lat=>nil, :lon=>nil}, @associations={}, @changed_attributes={}>

I was able to work around it by doing:

Zip.find(Zip.find_all_by_secondary_index({:zipcode => '12155'}).first.object_id) which returns:

#<Zip:0x007fef080094a8 @new_record=false, @attributes={:created_at=>Fri, 10 Feb 2017 20:29:48 -0600, :updated_at=>Fri, 10 Feb 2017 20:29:48 -0600, :object_id=>"85973ca6-c69e-45ab-b22e-00b3f2809f4c", :city=>"SCHENEVUS", :decommisioned=>false, :location=>"NA-US-NY-SCHENEVUS", :location_type=>"PRIMARY", :region=>"3", :state=>"NY", :zipcode=>"12155", :zipcode_type=>nil, :lat=>"42.54", :lon=>"-74.82"}, @associations={}, @changed_attributes={}>

Any thoughts? Perhaps I am missing something? Thanks!

mgonzaleza avatar Feb 11 '17 03:02 mgonzaleza

I actually only use the secondary index for what I do with Dynamoid. I am not familiar with the Active Record-ish use cases. Perhaps another user can comment.

An example of my use case is:

find_all_by_secondary_index(
    {
        dynamo_primary_key_column_name => dynamo_primary_key_value
    }, # The signature of find_all_by_secondary_index is ugly, so must be an explicit hash here
    :range => {
        "#{range_column}.#{range_modifier}" => range_value
    },
    # false is the same as DESC in SQL (newest timestamp first)
    # true is the same as ASC in SQL (oldest timestamp first)
    :scan_index_forward => false # or true
)

where the range modifier is one of Dynamoid::Finders::RANGE_MAP.keys, where the RANGE_MAP is:

RANGE_MAP = {
  'gt'            => :range_greater_than,
  'lt'            => :range_less_than,
  'gte'           => :range_gte,
  'lte'           => :range_lte,
  'begins_with'   => :range_begins_with,
  'between'       => :range_between,
  'eq'            => :range_eq
}

Some range searches, like eq, will take a single value, and others, like between, will take an array with two values.
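To make the single-value versus two-value distinction concrete, here is a rough sketch of a helper that builds the :range hash and checks the value shape. The helper name build_range_condition is hypothetical and not part of Dynamoid, and it assumes that between is the only operator that takes a two-element array:

```ruby
# Operator suffixes copied from the RANGE_MAP keys quoted above.
RANGE_OPERATORS = %w[gt lt gte lte begins_with between eq].freeze

# Hypothetical helper (not part of Dynamoid): builds the hash passed as
# :range. Assumes "between" is the only operator taking two values.
def build_range_condition(column, operator, value)
  operator = operator.to_s
  unless RANGE_OPERATORS.include?(operator)
    raise ArgumentError, "unknown range operator: #{operator}"
  end

  if operator == "between"
    unless value.is_a?(Array) && value.size == 2
      raise ArgumentError, "between expects an array of two values"
    end
  elsif value.is_a?(Array)
    raise ArgumentError, "#{operator} expects a single value"
  end

  # Key format matches the "#{range_column}.#{range_modifier}" pattern above.
  { "#{column}.#{operator}" => value }
end

p build_range_condition(:observed_at, :between, [1_486_000_000, 1_487_000_000])
p build_range_condition(:observed_at, :gte, 1_486_000_000)
```

The returned hash is what would go in the :range option of the call shown earlier; failing early on a wrong value shape is just a convenience of this sketch.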

pboling avatar Feb 17 '17 15:02 pboling

From my use of Dynamoid so far, this is the case. How you look at it depends on the use case, because you are charged based on queries rather than storage. If the index does not project all fields but you want all of them back, as in the where-style case, then you're ALWAYS going to need 2 queries: the 1st against the Global Secondary Index to get the ID, and the 2nd against the actual primary table.

It almost seems to make more sense to just project all fields, so that a single query to the global secondary index returns what you want.
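The trade-off can be illustrated with a small plain-Ruby simulation. It makes no DynamoDB or Dynamoid calls; build_index and two_step_lookup are hypothetical helpers, and the sample record reuses a few values from the original post. It only models the idea that an index holds the keys plus whatever attributes were projected into it:

```ruby
# Plain-Ruby simulation; TABLE stands in for the base table,
# keyed by the primary key (object_id).
TABLE = {
  "85973ca6-c69e-45ab-b22e-00b3f2809f4c" => {
    object_id: "85973ca6-c69e-45ab-b22e-00b3f2809f4c",
    zipcode: "12155",
    city: "SCHENEVUS",
    state: "NY"
  }
}.freeze

# Hypothetical helper: builds an index on index_key that copies only the
# table key, the index key, and any explicitly projected attributes.
def build_index(table, index_key, projected = [])
  kept = [:object_id, index_key] + projected
  table.values
       .group_by { |item| item[index_key] }
       .transform_values { |items| items.map { |item| item.slice(*kept) } }
end

# Hypothetical helper: one index read for the ID, then one table read.
def two_step_lookup(table, index, value)
  partial = index.fetch(value, []).first
  partial && table.fetch(partial[:object_id])
end

keys_only_index = build_index(TABLE, :zipcode)
wide_index = build_index(TABLE, :zipcode, [:city, :state])

# Keys-only index: the index entry alone has no :city.
p keys_only_index.fetch("12155").first
# Second read against the table fills in the rest.
p two_step_lookup(TABLE, keys_only_index, "12155")
# Wider projection: a single index read already carries :city and :state.
p wide_index.fetch("12155").first
```

In the simulation, the keys-only entry lacks every non-key attribute, which mirrors the nil fields in the original report, while the wider projection answers the same question in one read.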

The only time (I believe) you really wouldn't want to do this is if you have a partitioning scheme that goes beyond the allowed 10GB limit per partition key. (I think you may be able to work around it with the secondary sort keys, but I have yet to try, and I couldn't say definitively after reviewing the docs about these limits.)

I can see the use case for wanting to use where instead of find_all_by_secondary_index, but I do like the explicitness of the latter. With where, the person writing it needs to be aware that if the attribute has a secondary index then we're great (easy query), but if it doesn't, the where becomes a painful scan, which isn't going to have the same performance as SQL unless you have a small table. (Also, we're limited to 5 global secondary indexes, so being explicit feels better for the sanity of understanding what happens "under the hood" rather than wondering "why is my where taking so long?")

richardhsu avatar Jun 29 '17 17:06 richardhsu

@mgonzaleza you can have a global secondary index query return other attributes that are not part of the global secondary index. These are the macros I use:

    # Macro that invokes the dynamic parts of the setup
    def dynamo_primary_key_column(name)
      @dynamo_primary_key_column_name = name
      dynamic_field_and_validation
      define_global_secondary_index
    end

    def dynamic_field_and_validation
      field dynamo_primary_key_column_name, :string
      validates_presence_of dynamo_primary_key_column_name
    end

    def define_global_secondary_index
      global_secondary_index hash_key: dynamo_primary_key_column_name,
                             range_key: :observed_at,
                             projected_attributes: [:vendor_id, :service_id, :analyzed_at]
    end

That last bit, projected_attributes, is what makes the secondary index query return other attributes.

BUT, adding projected attributes to the model will not make them suddenly appear.

You have to re-index.

See: https://github.com/Dynamoid/Dynamoid/issues/145#issuecomment-312030212

pboling avatar Jun 29 '17 21:06 pboling