dynamoid
Finder: find_all_by_secondary_index does not return all the item values
Hey guys,
So I have a table with around 40k records, regular finders cannot find my item:
Zip.where(:zipcode => '12155').first
returns nil
I am, however, able to get my missing item via a global secondary index (minus most of its field values):
Zip.find_all_by_secondary_index({:zipcode => '12155'}).first
which returns:
#<Zip:0x007fef080eb1f0 @new_record=false, @attributes={:created_at=>nil, :updated_at=>nil, :object_id=>"85973ca6-c69e-45ab-b22e-00b3f2809f4c", :city=>nil, :decommisioned=>nil, :location=>nil, :location_type=>nil, :region=>nil, :state=>nil, :zipcode=>"12155", :zipcode_type=>nil, :lat=>nil, :lon=>nil}, @associations={}, @changed_attributes={}>
I was able to work around it by doing:
Zip.find(Zip.find_all_by_secondary_index({:zipcode => '12155'}).first.object_id)
which returns:
#<Zip:0x007fef080094a8 @new_record=false, @attributes={:created_at=>Fri, 10 Feb 2017 20:29:48 -0600, :updated_at=>Fri, 10 Feb 2017 20:29:48 -0600, :object_id=>"85973ca6-c69e-45ab-b22e-00b3f2809f4c", :city=>"SCHENEVUS", :decommisioned=>false, :location=>"NA-US-NY-SCHENEVUS", :location_type=>"PRIMARY", :region=>"3", :state=>"NY", :zipcode=>"12155", :zipcode_type=>nil, :lat=>"42.54", :lon=>"-74.82"}, @associations={}, @changed_attributes={}>
Any thoughts? Perhaps I am missing something? Thanks!
I actually only use the secondary index for what I do with Dynamoid. I am not familiar with the Active Record-ish use cases. Perhaps another user can comment.
An example of my use case is:
find_all_by_secondary_index(
  {
    dynamo_primary_key_column_name => dynamo_primary_key_value
  }, # The signature of find_all_by_secondary_index is ugly, so this must be an explicit hash
  :range => {
    "#{range_column}.#{range_modifier}" => range_value
  },
  # false is the same as DESC in SQL (newest timestamp first)
  # true is the same as ASC in SQL (oldest timestamp first)
  :scan_index_forward => false # or true
)
Here the range modifier is one of Dynamoid::Finders::RANGE_MAP.keys, and RANGE_MAP is:
RANGE_MAP = {
  'gt' => :range_greater_than,
  'lt' => :range_less_than,
  'gte' => :range_gte,
  'lte' => :range_lte,
  'begins_with' => :range_begins_with,
  'between' => :range_between,
  'eq' => :range_eq
}
Some range searches, like eq, will take a single value, and others, like between, will take an array with two values.
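To make the single-value versus two-value rule concrete, here is a minimal sketch of a helper that builds the :range hash and rejects malformed input. The names build_range_condition, RANGE_MODIFIERS and TWO_VALUE_MODIFIERS are hypothetical and not part of Dynamoid. The modifier list is copied from the RANGE_MAP keys above, and I am assuming between is the only modifier that takes two values.

```ruby
# Hypothetical helper, not part of Dynamoid. The modifier list mirrors the
# keys of the RANGE_MAP quoted in this thread.
RANGE_MODIFIERS = %w[gt lt gte lte begins_with between eq].freeze

# Assumption: 'between' is the only modifier that needs [low, high].
TWO_VALUE_MODIFIERS = %w[between].freeze

# Builds the hash passed as :range to find_all_by_secondary_index,
# e.g. { "observed_at.gt" => some_time }.
def build_range_condition(column, modifier, value)
  modifier = modifier.to_s
  unless RANGE_MODIFIERS.include?(modifier)
    raise ArgumentError, "unknown range modifier: #{modifier.inspect}"
  end

  if TWO_VALUE_MODIFIERS.include?(modifier)
    unless value.is_a?(Array) && value.size == 2
      raise ArgumentError, "#{modifier} expects an array of two values"
    end
  elsif value.is_a?(Array)
    raise ArgumentError, "#{modifier} expects a single value"
  end

  { "#{column}.#{modifier}" => value }
end
```

The result can then be passed as the :range option in the call shown earlier, so a typo in the modifier fails early instead of producing a confusing query.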
From my use of Dynamoid so far, this is the case. It depends on how you want to look at the use case, because you are charged based on queries rather than storage. If you do a where-type lookup against an index that does not project all fields, but you want all fields back, then you're ALWAYS going to have 2 queries: the 1st against the global secondary index to get the ID, and the 2nd against the actual primary table.
It almost seems to make more sense to just have all fields projected; that way you need only 1 query to the global secondary index to get what you want.
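The trade-off can be illustrated with a small in-memory stand-in. This is only a sketch: FakeTable and its methods are hypothetical, it does not talk to DynamoDB, and it is not how Dynamoid is implemented. It just counts reads to show why a keys-only projection needs two lookups while a full projection needs one. The sample item reuses values from the question above.

```ruby
# In-memory stand-in for a table plus one global secondary index.
# Hypothetical illustration only; no Dynamoid or AWS calls are made.
class FakeTable
  attr_reader :reads

  def initialize(items, primary_key:, index_key:, projected:)
    @items = items
    @primary_key = primary_key
    @index_key = index_key
    @projected = projected # :keys_only or :all
    @reads = 0
  end

  # Simulates an index query: only projected attributes come back.
  def query_index(value)
    @reads += 1
    @items.select { |item| item[@index_key] == value }.map do |item|
      @projected == :all ? item.dup : item.slice(@primary_key, @index_key)
    end
  end

  # Simulates a primary-key lookup on the base table: all attributes come back.
  def get(id)
    @reads += 1
    @items.find { |item| item[@primary_key] == id }&.dup
  end
end

zip = { object_id: '85973ca6-c69e-45ab-b22e-00b3f2809f4c',
        zipcode: '12155', city: 'SCHENEVUS', state: 'NY' }

keys_only = FakeTable.new([zip], primary_key: :object_id,
                                 index_key: :zipcode, projected: :keys_only)
partial = keys_only.query_index('12155').first # city and state are missing
full = keys_only.get(partial[:object_id])      # second read fills them in

everything = FakeTable.new([zip], primary_key: :object_id,
                                  index_key: :zipcode, projected: :all)
direct = everything.query_index('12155').first # one read, all attributes
```

In this sketch keys_only.reads ends at 2 and everything.reads at 1, which mirrors the workaround in the original question versus an index that projects every attribute.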
The only time (I believe) you really wouldn't want to do this is if you have a partitioning scheme that goes beyond the allowed 10GB limit per partition key. (I think you may be able to work around it with secondary sort keys, but I have yet to try, and I wasn't able to say definitively after reviewing the docs about these limits.)
I can see the use case for wanting to use where instead of find_all_by_secondary_index, but I do like the explicitness of the latter. With where, the person writing it needs to be aware that if the condition uses an attribute with a secondary index then we're great (easy query), but if it doesn't, the where becomes a painful scan, which isn't going to have the same performance as SQL unless you have a small table. (Also, we're limited to 5 global secondary indexes, so being explicit feels better for understanding what happens "under the hood" rather than wondering "why is my where taking so long?")
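As a rough illustration of that decision, the sketch below classifies a set of conditions as a query or a scan depending on whether any condition touches an indexed attribute. The function access_path is hypothetical and deliberately simplified. It is not Dynamoid's actual planning logic, and the list of indexed attributes is something you would have to supply yourself.

```ruby
# Hypothetical, simplified classifier; not Dynamoid's real logic.
# conditions:         hash of attribute => value, as you would pass to where
# indexed_attributes: attributes that are a table key or an index hash key
def access_path(conditions, indexed_attributes)
  usable = conditions.keys.map(&:to_sym) & indexed_attributes.map(&:to_sym)
  usable.empty? ? :scan : :query
end

access_path({ zipcode: '12155' }, [:zipcode])  # indexed attribute
access_path({ city: 'SCHENEVUS' }, [:zipcode]) # no index on city
```

The first call is classified as :query and the second as :scan. A check like this in a test suite is one way to catch a where that would silently fall back to a scan.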
@mgonzaleza you can have a global secondary index query return other attributes that are not part of the global secondary index. These are the macros I use:
# Macro that invokes the dynamic parts of the setup
def dynamo_primary_key_column(name)
  @dynamo_primary_key_column_name = name
  dynamic_field_and_validation
  define_global_secondary_index
end

def dynamic_field_and_validation
  field dynamo_primary_key_column_name, :string
  validates_presence_of dynamo_primary_key_column_name
end

def define_global_secondary_index
  global_secondary_index hash_key: dynamo_primary_key_column_name,
                         range_key: :observed_at,
                         projected_attributes: [:vendor_id, :service_id, :analyzed_at]
end
That last bit, the projected_attributes option, is what makes the secondary index query return other attributes.
BUT, adding projected attributes to the model will not make them suddenly appear.
You have to re-index.
See: https://github.com/Dynamoid/Dynamoid/issues/145#issuecomment-312030212