PynamoDB icon indicating copy to clipboard operation
PynamoDB copied to clipboard

Question: Is their a way to run a query and then get the count?

Open shadycuz opened this issue 7 years ago • 3 comments

I have an operation I want to perform if I only get 1 result. I don't want to run a count and then get the item because I feel that would make more API calls. Instead I want to run a query like this...

results = Image.query(args['app'], (Image.latest==True) & (Image.branch == args['branch']))
print(len(results))

Results in...

print(len(results))
TypeError: object of type 'ResultIterator' has no len()

I took a peek in the code and it seems this has more to do with AWS.

class ResultIterator(object):
    """
    ResultIterator handles Query and Scan item pagination.
    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html#Query.Pagination
    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.Pagination

Since you could possibly be returning 1000's of items it's not really possible to get the count before pulling all items down.

So it seems like I should make a list? add each item to the list? and then count the list to see if its one or not?

results = []
for item in results Image.query(args['app'], (Image.latest==True) & (Image.branch == args['branch'])):
    results.append(item)

if len(results) == 1:
   # Do this ....

shadycuz avatar Mar 29 '18 12:03 shadycuz

Your best option is to construct the query and try to advance twice. Performing a COUNT Query against DynamoDB could return 1 item which is then deleted before your query runs, throwing an exception.

Instead just advance the query twice. If it works once and fails on the second you have exactly one result (probably). If it works twice there's more than one result. In either case you don't need to create the full list to know if you have one result:

def one(some_query):
    result = next(some_query, None)
    extra = next(some_query, None)
    if extra is not None:  # too many
        return None
    # first or None
    return result


query = Image.query(
    args['app'],
    (Image.latest==True) & (Image.branch == args['branch'])
)

result = one(query)
if result is not None:  # found exactly one
    ...

In bloop this is exactly how Query.one() is implemented but it raises an exception instead of returning None (source)

numberoverzero avatar Apr 07 '18 23:04 numberoverzero

I found a pretty helpful method that seems to work provided you check the count after looping through the results.

items = Item.scan()
for item in items:
    print(item.to_json())
print(f'Matching records: {items.total_count}')

peoplespete avatar Sep 21 '22 19:09 peoplespete

I found a pretty helpful method that seems to work provided you check the count after looping through the results.

items = Item.scan()
for item in items:
    print(item.to_json())
print(f'Matching records: {items.total_count}')

While this works, be aware you may consume all your read units in the process. This is because you're loading all attributes over the wire, then throwing them away locally (even if you replace the print statement with pass - the cost is incurred before local deserialization).

DynamoDb provides the "COUNT" option for the Select parameter of Query and Scan, documented here and here. This will consume[0] at most 1 read unit. The select= argument to Table.query() (forwarded to Connection.query) can be set to "COUNT" to prevent excessive read consumption. However, the same argument isn't exposed for Table.scan or the underlying `Connection.scan. This parameter could be added to pynamodb's scan method, according to the AWS docs here (and I use it in bloop).

If you need to use a scan, at present you should use @peoplespete's sample above. If Table.scan exposes the select= attribute in the future, you should use that to save on bandwidth. If you can get away with a query for now instead of a scan, you should be able to use the following (untested):

some_query = Item.query(hash_key=something, ..., select="COUNT")
next(some_query)  # advance the iterator so the iterator makes a call to AWS
print(some_query.total_count)

(note that total_count is the number of items returned after applying the filter, not before.)


[0] This is not true if you use a filter statement. If you include a filter, your read unit consumption is proportional to the evaluated items, not the post-filter items. See here for details, specifically: "Therefore, a Query consumes the same amount of read capacity, regardless of whether a filter expression is present." This is true even when using the COUNT option.

numberoverzero avatar Sep 21 '22 20:09 numberoverzero