PynamoDB: set _first_iteration only once, after the first successful operation

Jul 04 '22 11:07 ilanjb

I ran into this issue while handling some manual retries. I was getting the error below because self._index had not been set even though self._first_iteration was already False:

  File "/pynamodb/pagination.py", line 209, in next
    return self.__next__()
  File "/pynamodb/pagination.py", line 197, in __next__
    while self._index == self._count:
AttributeError: 'ResultIterator' object has no attribute '_index'

Jul 04 '22 11:07 ilanjb

@garrettheel thanks for the feedback! With this change, ResultIterator._first_iteration is set to False only after the first query completes successfully, right after

page = self._operation(*self._args, settings=self._settings, **self._kwargs)

That guarantees that when the code gets to

while self._index == self._count:

self._index is already set.
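To make the failure mode concrete, here is a minimal sketch using a hypothetical stand-in class (FlakyPager, make_operation and retry_once are invented for illustration and are not PynamoDB code). It only mimics the ordering issue described above: when the flag is cleared before the operation runs, a retry after a failed first call skips the page fetch and hits the missing _index attribute.

```python
class FlakyPager:
    """Hypothetical single-page iterator; NOT PynamoDB's actual implementation."""

    def __init__(self, operation, set_flag_after_success):
        self._operation = operation
        self._set_flag_after_success = set_flag_after_success
        self._first_iteration = True

    def _get_next_page(self):
        if not self._set_flag_after_success:
            # Too early: the call below may still raise.
            self._first_iteration = False
        page = self._operation()
        if self._set_flag_after_success:
            # Only cleared once a page has actually arrived.
            self._first_iteration = False
        self._items = page
        self._count = len(page)
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._first_iteration:
            self._get_next_page()
        if self._index == self._count:
            raise StopIteration  # this sketch never fetches a second page
        item = self._items[self._index]
        self._index += 1
        return item


def make_operation(failures):
    """Return a callable that raises `failures` times, then returns one page."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("throttled")
        return ["a", "b"]

    return operation


def retry_once(pager):
    """Manual retry: call next() again after the first failure."""
    try:
        return next(pager)
    except RuntimeError:
        return next(pager)
```

With set_flag_after_success=False, retry_once raises AttributeError on the second call; with True, the retry fetches the page and returns the first item.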

I was trying to add the ability to retry failed calls while iterating over the ResultIterator. You can see the code I came up with below:

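import logging
from random import randint
from time import sleep

from pynamodb.exceptions import QueryError
from pynamodb.pagination import ResultIterator

logger = logging.getLogger(__name__)


class PynamodbResultIteratorWithExponentialBackup:
    """Result iterators may encounter throttling errors when requesting the next page.
    PynamoDB allows setting the rate_limit, but does not allow custom control of how exceptions are handled.
    This class wraps a pynamodb.pagination.ResultIterator and adds exponential backoff.
    """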

    def __init__(
        self,
        result_iterator: ResultIterator,
        max_num_retries_per_page=7,
        min_sleep_time_sec=1,
    ) -> None:
        self.result_iterator = result_iterator
        self.max_num_retries_per_page = max_num_retries_per_page
        self.min_sleep_time_sec = min_sleep_time_sec
        self.is_first_iteration = True  # see notes below

    def __iter__(self):
        return self

    def __next__(self):
        for i in range(self.max_num_retries_per_page):

            try:
                ans = self.result_iterator.next()
                self.is_first_iteration = False  # see note below
                return ans

            except QueryError as e:

                is_last_try = i == (self.max_num_retries_per_page - 1)
                if is_last_try:
                    logger.error("got throttling error while trying to get next page")
                    raise e

                if self.is_first_iteration:
                    # ResultIterator mistakenly sets self._first_iteration too early,
                    # which leads to attribute errors on retries.
                    # If this class has never had a successful iteration,
                    # we reset ResultIterator._first_iteration here.
                    self.result_iterator._first_iteration = (  # pylint: disable=W0212
                        True
                    )

                msg = f"encountered error {e} on last_evaluated_key on attempt {i + 1}"
                logger.info(msg)
                sleep_time = self.min_sleep_time_sec * randint(
                    1, 2**i
                )  # exponential back-off
                msg = f"sleeping for {sleep_time} seconds"
                logger.info(msg)
                sleep(sleep_time)
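For a sense of how long the wrapper above can block, here is a small sketch that reproduces only its sleep formula, min_sleep_time_sec * randint(1, 2**i). The helper names sleep_bounds and sample_sleeps are made up for this illustration. Note that the last attempt re-raises instead of sleeping, so with the defaults there are at most six sleeps per page.

```python
from random import Random


def sleep_bounds(min_sleep_time_sec=1, max_num_retries_per_page=7):
    """(shortest, longest) possible sleep in seconds before each retry."""
    return [
        (min_sleep_time_sec, min_sleep_time_sec * 2**i)
        for i in range(max_num_retries_per_page - 1)  # final attempt re-raises
    ]


def sample_sleeps(rng, min_sleep_time_sec=1, max_num_retries_per_page=7):
    """Draw one sleep per retry using the same formula as the wrapper."""
    return [
        min_sleep_time_sec * rng.randint(1, 2**i)
        for i in range(max_num_retries_per_page - 1)
    ]


bounds = sleep_bounds()
worst_case_total = sum(longest for _, longest in bounds)
samples = sample_sleeps(Random(0))
```

With the default arguments the upper bounds are 1, 2, 4, 8, 16 and 32 seconds, so a single page can block for up to 63 seconds before the error is re-raised.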

Aug 07 '22 16:08 ilanjb

What is the status of this? I'm facing the same problem, but I can't reproduce it and I'm a bit lost about how to deal with it. In case it helps, here is a simplified version of my code:

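import random
import time

from pynamodb.exceptions import QueryError

# Both values are in milliseconds (they are divided by 1000 before time.sleep).
BACKOFF_BASE = 50
BACKOFF_CAP = 2000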

class ExpoBackoffDecorr:
    """Used for retry in case of ThrottlingException"""
    def __init__(self):
        self.base = BACKOFF_BASE
        self.cap = BACKOFF_CAP
        self.sleep = self.base
        self.total_sleep = 0

    def backoff(self, n):
        self.sleep = min(self.cap, random.uniform(self.base, self.sleep * 3))
        self.total_sleep += self.sleep
        return self.sleep

def backoff_query(expand=[], *args, **kwargs):
    objs = []

    algorithm = ExpoBackoffDecorr()

    try:
        for obj in MyObject.query(hash_key='test'):
            # Process my objects here and do some work on them...
            objs.append(obj)
    except QueryError as exception:
        if exception.cause_response_code == 'ThrottlingException':
            if (algorithm.total_sleep / 1000) > 15:
                raise Exception("Request didn't work. Try again later.")
            time.sleep(algorithm.backoff(0) / 1000)
        else:
            raise

To write this I followed this post: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
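As posted, the snippet sleeps once after a throttling error but never re-runs the query. One possible way to close that gap is a retry loop around the whole operation, sketched below with the same "decorrelated jitter" formula. Everything here is an assumption for illustration: Throttled is a stand-in exception rather than PynamoDB's QueryError, and DecorrelatedJitter and call_with_backoff are invented names.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical stand-in for a throttling error."""


class DecorrelatedJitter:
    """sleep = min(cap, uniform(base, previous_sleep * 3)), in milliseconds."""

    def __init__(self, base_ms=50, cap_ms=2000, rng=None):
        self.base_ms = base_ms
        self.cap_ms = cap_ms
        self.sleep_ms = base_ms
        self.total_ms = 0
        self.rng = rng or random.Random()

    def next_sleep_ms(self):
        self.sleep_ms = min(
            self.cap_ms, self.rng.uniform(self.base_ms, self.sleep_ms * 3)
        )
        self.total_ms += self.sleep_ms
        return self.sleep_ms


def call_with_backoff(operation, budget_ms=15000, sleep=time.sleep):
    """Re-run `operation` until it succeeds or the sleep budget is spent."""
    backoff = DecorrelatedJitter()
    while True:
        try:
            return operation()
        except Throttled:
            if backoff.total_ms > budget_ms:
                raise
            sleep(backoff.next_sleep_ms() / 1000)
```

Because the whole operation is re-run, anything collected before the failure has to be discarded or deduplicated by the caller; retrying only the failed page needs a wrapper around the iterator instead.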

Nov 03 '22 01:11 idromigny

@Yaronn44 not sure about changing the codebase, but I've been using the class I pasted in https://github.com/pynamodb/PynamoDB/pull/1059#issuecomment-1207445066 for a few months without problems.

Nov 03 '22 09:11 ilanjb

@ilanjb If I understand correctly, you run your query, get the newly created ResultIterator, and pass it to the class you defined above? It's a bit of a hacky solution, but I think I'll go with it if it works well! Thank you for the help.

Nov 05 '22 00:11 idromigny

Thanks, implemented "in spirit" in #1101.

Nov 11 '22 04:11 ikonst