dynamoid

Few items are not indexed into GSI

Open madhusudhan518 opened this issue 7 years ago • 3 comments

I have two Global Secondary Indexes (GSIs). One has only a hash key; the other has both a hash key and a range key. Both were backfilled, but a few items are missing from the first GSI and a few are missing from the second. I suspected an index key violation, so I tried detecting and correcting index key violations with the Violation Detector, following http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.OnlineOps.ViolationDetection.html

However, it did not find any violating items. The result was: " 2017-07-13 12:58:33,141 INFO [com.amazonaws.services.dynamodbv2.online.index.PrintHelper] - Violation detection finished: Records scanned: 1252976, ### Violations found: 0, Violations deleted: 0, see results at: ./gsi_violation_check.csv "

Is there any way to find out why these items were not indexed (backfilled) into the global secondary indexes, and to add the missing items to the indexes?

Thank you.

madhusudhan518 avatar Jul 13 '17 10:07 madhusudhan518

@madhusudhan518 What do you mean by backfilled? I am not familiar with that idea in the context of global secondary indexes on DynamoDB.

The only thing I can think of that may be related is that if you add fields to an index, you have to create a new index; the old index won't be updated with the new field for old entries.

pboling avatar Jul 24 '17 12:07 pboling

Hi @pboling, thank you for your reply. Backfilling means the following: for each item in the table, DynamoDB determines which set of attributes to write to the index based on its projection (KEYS_ONLY, INCLUDE, or ALL). It then writes these attributes to the index. During the backfill phase, DynamoDB keeps track of items that are being added, deleted, or updated in the table. The attributes from these items are also added, deleted, or updated in the index as appropriate.
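To make the projection step concrete, here is a minimal plain-Ruby sketch of how one table item could map to an index entry. It only models the documented behavior and is not DynamoDB or Dynamoid code. The method name `index_entry_for`, the `id` key, and the sample students are hypothetical. The sketch also assumes the documented rule that an item without the index's key attribute does not appear in the index at all, which may be worth checking for the missing items.

```ruby
# Plain-Ruby model of the projection step; hypothetical helper, not a real API.
#
# item:          Hash of attribute name => value (one table item)
# table_keys:    attribute names of the table's primary key
# index_keys:    attribute names of the index's hash key (and range key, if any)
# projection:    :keys_only, :include, or :all
# include_attrs: extra attribute names, used only when projection is :include
def index_entry_for(item, table_keys:, index_keys:, projection:, include_attrs: [])
  # Assumption: an item lacking any index key attribute gets no index entry.
  return nil unless index_keys.all? { |k| !item[k].nil? }

  case projection
  when :all
    item.dup
  when :keys_only
    item.slice(*(table_keys + index_keys))
  when :include
    item.slice(*(table_keys + index_keys + include_attrs))
  else
    raise ArgumentError, "unknown projection: #{projection.inspect}"
  end
end

# Hypothetical sample data.
students = [
  { id: "1", name: "Asha", school_name: "primary School", standard: 3 },
  { id: "2", name: "Ravi", standard: 4 } # no school_name attribute
]

entries = students.map do |student|
  index_entry_for(student, table_keys: [:id], index_keys: [:school_name],
                           projection: :keys_only)
end
```

In this model the second student produces no entry, so a scan of the table would return more items than a query against the index.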

I have created a new index, and all items appear to be indexed. However, when retrieving, I get only a few items; I am unable to fetch all the items from the index. Example: I have a table called students with the attributes name, school_name, and standard, and a GSI with school_name as the partition key. When I try to retrieve students by school_name using Student.find_all_by_secondary_index({school_name: "primary School"}), it does not return all the students in the primary school. I scanned the students table for primary school students and got more results than from the GSI.
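The scan-versus-index comparison described above can be sketched in plain Ruby. This is only an illustration: `compare_scan_to_index`, the `id` key, and the sample rows are hypothetical, and the two input arrays stand in for whatever the table scan and the GSI query actually return. Besides listing items that are absent from the index result, it flags values that match only after trimming and lowercasing, on the assumption that a scan filter might match more loosely than the exact-match index lookup.

```ruby
require "set"

# Hypothetical helper: compare scanned table items with GSI query results.
# Both inputs are arrays of hashes (attribute name => value).
def compare_scan_to_index(scanned, indexed, primary_key:, index_key:, value:)
  indexed_ids = indexed.map { |item| item[primary_key] }.to_set
  normalized = value.strip.downcase

  # Split loosely matching items into exact matches and near matches.
  exact, near = scanned
    .select { |item| item[index_key].to_s.strip.downcase == normalized }
    .partition { |item| item[index_key] == value }

  {
    # Exact matches in the table that the index query did not return.
    missing: exact.reject { |item| indexed_ids.include?(item[primary_key]) },
    # Items whose value differs only by case or surrounding whitespace.
    near_matches: near
  }
end

# Hypothetical sample data.
scanned = [
  { id: "1", school_name: "primary School" },
  { id: "2", school_name: "primary School" },
  { id: "3", school_name: "Primary school " },
  { id: "4", school_name: "high School" }
]
indexed = [{ id: "1", school_name: "primary School" }]

report = compare_scan_to_index(scanned, indexed,
                               primary_key: :id, index_key: :school_name,
                               value: "primary School")
```

Here `report[:missing]` holds the items that genuinely should have come back from the index, while `report[:near_matches]` holds items that an exact-match query would never return.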

Is there anything I missed?

madhusudhan518 avatar Jul 25 '17 07:07 madhusudhan518

@madhusudhan518 How is your GSI defined?

I have a GSI defined like this:

global_secondary_index hash_key: dynamo_primary_key_column_name,
                       range_key: :observed_at,
                       projected_attributes: [:vendor_id, :service_id, :analyzed_at]

And it is working when queried like this:

range_modifier = 'between'
find_all_by_secondary_index(
  {
    dynamo_primary_key_column_name => dynamo_primary_key_value
  }, # The signature of find_all_by_secondary_index is ugly, so this must be an explicit hash
  :range => {
    "observed_at.between" => [1.week.ago.to_f, Time.current.to_f] # array of 2 timestamps converted to float
  },
  # false is the same as DESC in SQL (newest timestamp first)
  # true is the same as ASC in SQL (oldest timestamp first)
  :scan_index_forward => false
)
```
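For readers unfamiliar with these options, the following plain-Ruby sketch models what such a query is expected to return: items with a matching hash key, a range key inside an inclusive interval, ordered by the range key. It is only an in-memory illustration, not how Dynamoid or DynamoDB implement it; `query_index`, `device_id`, and the sample timestamps are hypothetical.

```ruby
# In-memory model of a hash-key lookup with a "between" range condition.
# Hypothetical helper; items are hashes of attribute name => value.
def query_index(items, hash_key:, hash_value:, range_key:, between:,
                scan_index_forward: true)
  low, high = between

  # Keep items with the requested hash key whose range key is in [low, high].
  matches = items.select do |item|
    item[hash_key] == hash_value && item[range_key].between?(low, high)
  end

  # Ascending by range key; reversed when scan_index_forward is false.
  sorted = matches.sort_by { |item| item[range_key] }
  scan_index_forward ? sorted : sorted.reverse
end

# Hypothetical sample data: timestamps as floats, as in the query above.
now  = 1_000_000.0
week = 7 * 24 * 60 * 60

observations = [
  { device_id: "a", observed_at: now - 10 },
  { device_id: "a", observed_at: now - 2 * week }, # older than one week
  { device_id: "a", observed_at: now - 100 },
  { device_id: "b", observed_at: now - 5 }         # different hash key
]

recent = query_index(observations,
                     hash_key: :device_id, hash_value: "a",
                     range_key: :observed_at, between: [now - week, now],
                     scan_index_forward: false)
```

With `scan_index_forward: false`, `recent` lists the newest matching observation first.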

pboling avatar Jul 26 '17 05:07 pboling