micropython-async icon indicating copy to clipboard operation
micropython-async copied to clipboard

'IndexError: pop from empty list' Exception in Queue (V3) due to uncleared _evput state

Open dbadrian opened this issue 5 years ago • 10 comments

This is using the temporary V3 Queue. Not sure if Im wasting my breath if there is already a new version somewhere else (at least I dont know of it...)? Anyway, I used this one and ran into a tad bit of an annoyance :)

Due to some "unfortunate" scheduling of routines, one can get greeted by a lovely "IndexError: pop from empty list" in the get() method of queue, where one would expect it to wait if the queue is empty - sadly doesn't always work...

I believe I found the reason (and fix) but first a minimal example and my explanation.

import uasyncio as asyncio
import Queue


async def putter(q, sleep_ms=50):
    # put some item, then sleep
    while True:
        await q.put(1)
        await asyncio.sleep_ms(sleep_ms)


async def getter(q):
   # checks for new items, and relies on the "blocking" of the get method
    while True:
        print(await q.get())


async def task():
    q = Queue()

    await asyncio.gather(
        putter(q),
        getter(q)
    )

asyncio.run(task())

The error appears only under some (scheduling) circumstances. One instance is:

  • if the putter routine is called first, meaning some item is placed in the queue, making it not empty and the _evput will be set (https://github.com/peterhinch/micropython-async/blob/master/v3/primitives/queue.py#L47).
  • if now the getter method is executed, it will return the item placed in queue as the 'if self.empty():' check (https://github.com/peterhinch/micropython-async/blob/master/v3/primitives/queue.py#L34) will evaluate to False.
  • if the getter methods now goes for the next iteration, BEFORE another item is placed (for example due to the putter method sleeping or some scheduling happenstance), the L34 check will now be True as the queue is empty, however, the L36 call 'await self._evput.wait()' will immediately return as the _evput is still set, since we never cleared it. Clearing only happens after an empty queue was awaited and a new item was put.

So, a proposal solution is:

    async def get(self):  #  Usage: item = await queue.get()
        if self.empty():
            # Queue is empty, put the calling Task on the waiting queue
            await self._evput.wait()
        self._evput.clear()
        return self._get()

Not sure if causes other problems, that I didnt consider due to 1.30 am tiredness, but ye. But I guessed, regardles of whether self.empty() was True or False, if the 'self._evput.clear()' is reached, some item was placed by now (and it will be set, so I guess we dont need any check about it...) I can also make a PR tomorrow after work if you agree that is a bug after all.

PS: In https://github.com/peterhinch/micropython-async/blob/master/v3/primitives/queue.py#L51 and https://github.com/peterhinch/micropython-async/blob/master/v3/primitives/queue.py#L59 it should be preferable to have

if self.maxsize and self.qsize() >= self.maxsize:

instead of the current

if self.qsize() >= self.maxsize and self.maxsize:

?

I believe Python evaluates boolean conditions lazily, meaning 'if self.maxsize' evaluates to false, we save ourself the second evaluation. Albeit we should probably check for 'if self.maxsize > 0', as we do not check or otherwise validate negative inputs. In the cpython/official version, negative numbers are treated like 0 that way and imply infinite size.

dbadrian avatar Aug 04 '20 23:08 dbadrian

Thank you for that well presented bug report.

As yet there are no official replacements for my primitives. I don't know when Damien expects to produce them, so I am keen to fix any problems. I am tied up today (Wednesday) and have only briefly examined this but I agree there is a problem.

Should _evput be cleared here?

    def _get(self):
        self._evget.set()
        self._evput.clear()
        return self._queue.pop(0)

    async def get(self):  #  Usage: item = await queue.get()
        if self.empty():
            # Queue is empty, put the calling Task on the waiting queue
            await self._evput.wait()
        return self._get()

This would ensure the event is cleared on any successful get including get_nowait.

I will look at this properly on Thursday; one thing to consider is whether there is an analogous issue on ._evget.

In the mean time by all means raise one or more PR's with any changes or optimisations you see fit.

peterhinch avatar Aug 05 '20 06:08 peterhinch

I'll think about it. My original placement was as yours, but for some reason I moved it. Probably your variation might be preferable. And I agree about ._evget, since its used in a pretty symmetric fashion, Ill see if it can break in some way.

I'll open the PR later and then we can iterate over it there. Thanks for the quick response and all the work you put in this library already!

dbadrian avatar Aug 05 '20 08:08 dbadrian

Got home later than expected, but putting up a PR in a minute or two. We can shift potential discussion there then, but for completeness some stuff here:

I did draft up a quick example, that breaks queue with the analogous issue on _evget. Although it won't cause an exception like the above, the queue will end up with more elements than what is allowed by maxsize.

async def putter(q):
    q.put_nowait(1)
    q.put_nowait(1)

    await asyncio.sleep_ms(20)
    while True:
        await q.put(1)
        print("l", q.qsize())

async def getter(q):
    await asyncio.sleep_ms(10)
    while True:
        print("ne", await q.get())
        await asyncio.sleep_ms(500)

async def task():
    q = QueueT(maxsize=2)

    await asyncio.gather(
        putter(q),
        getter(q),
    )

asyncio.run(task())

With regards to where to put the "fix". Something bugs me about putting 'self._evput.clear()' and 'self._evget.clear()' in the respective _get and _put methods. After all the _get and the _put method have (and should?) have nothing to do with it. After all, getting something is semantically unrelated to the put event. So, we don't really need to include the get_nowait in the clearing either See the PR for my solution, we can discuss there further

dbadrian avatar Aug 05 '20 20:08 dbadrian

Closing this for now, assuming it is properly fixed now :)

dbadrian avatar Aug 06 '20 10:08 dbadrian

I've also pushed your optimisation of put_nowait().

peterhinch avatar Aug 06 '20 10:08 peterhinch

Ah, cheers. I didn't have time for that PR yet. However, I guess the same change should be made in https://github.com/peterhinch/micropython-async/blob/c7f07e1925bd6495e193cfa7b4a3ff38565359de/v3/primitives/queue.py#L51 as well?

Nevertheless, I'd suggest "bigger" change. Currently, this is does not deal properly with negative maxsizes, which in the regular asyncio (afaik) are treated as infinite as well. Negatives numbers will return True in this statement if I am not mistaken.

We already have this function self.full(), which does same the check, except it does also account correctly for negative inputs. Why don't we just replace both lines with a call to self.full() (I reckon for compiled code, the compiler will optimize this properly)? While we are at it, we can also rewrite it a bit shorter with ternary syntax - but overall the difference in instructions is negligible, see comparison https://godbolt.org/z/YrYKPa

def full(self):  # Return True if there are maxsize items in the queue.
    # Note: if the Queue was initialized with maxsize=0 (the default),
    # then full() is never True.
    return self.maxsize > 0 and self.qsize() >= self.maxsize

dbadrian avatar Aug 06 '20 11:08 dbadrian

@kevinkk525 has demonstrated that this problem is not fully fixed by PR https://github.com/peterhinch/micropython-async/pull/42. It still fails if multiple coros compete to get from a Queue which becomes empty. A similar problem occurs on put, where a Queue could grow beyond its specified size.

I have pushed an update which fixes this. I've also enhanced primitives/tests/asyntest.py to test for these cases including the one in the OP.

peterhinch avatar Aug 30 '20 17:08 peterhinch

What is currently left to do here? You mentioned a fix was provided by you, but I see the issues is reopened for now?

dbadrian avatar Sep 30 '20 13:09 dbadrian

I reopened it to draw your attention to the fact that I'd modified this further. As far as I'm concerned this is now fixed. If you're happy we can close this.

peterhinch avatar Oct 01 '20 06:10 peterhinch

Understood, thanks. I hope to get back to my project soon and then I'll give it a review/testing and close it after.

dbadrian avatar Oct 03 '20 16:10 dbadrian

Closing as I believe this is fixed.

peterhinch avatar Nov 25 '22 10:11 peterhinch