micropython-lib icon indicating copy to clipboard operation
micropython-lib copied to clipboard

uasyncio: KeyError in `remove_reader` (MacOS)

Open belyalov opened this issue 8 years ago • 17 comments

uasyncio version 1.2.4 I'm using it in a pretty common way:

        loop = asyncio.get_event_loop()
        print("* Starting Web Server at {}:{}".format(host, port))
        loop.create_task(asyncio.start_server(self._handler, host, port, backlog=backlog))
        if loop_forever:
            loop.run_forever()
            loop.close()

Sometimes (probably when connection is closed from client side) I see exception:

Traceback (most recent call last):
  File "./esp_restapi.py", line 53, in <module>
  File "./esp_restapi.py", line 49, in run
  File "/Users/belyalov/.micropython/lib/tinyweb/server.py", line 320, in run
  File "/Users/belyalov/.micropython/lib/uasyncio/core.py", line 136, in run_forever
  File "/Users/belyalov/.micropython/lib/uasyncio/core.py", line 113, in run_forever
  File "/Users/belyalov/.micropython/lib/uasyncio/__init__.py", line 39, in remove_reader
KeyError: 4339811008

belyalov avatar Jan 05 '18 15:01 belyalov

The backtrace you post don't seem to correlate with the code snippet you posted. You would need to review what callers of uasyncio do which causes such behavior. Alternatively, feel free to provide simple and self-contained example which reproduces this problem (directly against uasyncio or against other uasyncio-based reference software, like picoweb).

pfalcon avatar Jan 05 '18 15:01 pfalcon

Provided snippet is just for example of usage. However, original code is close to snippet. Ok, I'll try to reproduce the problem in a pretty small program and let you know.

belyalov avatar Jan 06 '18 02:01 belyalov

So I've got stable repro of this problem. Here is pretty simple example:

#!/usr/bin/env micropython

import uasyncio as asyncio


def _handler(reader, writer):
    print('New connection')
    line = yield from reader.readline()
    print(line)
    yield from writer.awrite('Gotcha!')
    yield from writer.aclose()


def run(host="127.0.0.1", port=8081, loop_forever=True, backlog=16):
    loop = asyncio.get_event_loop()
    print("* Starting Server at {}:{}".format(host, port))
    loop.create_task(asyncio.start_server(_handler, host, port, backlog=backlog))
    if loop_forever:
        loop.run_forever()
        loop.close()


if __name__ == '__main__':
    run()

This program is simply listen for TCP connection and then reads line. So to reproduce the problem:

  1. Run attached snippet
  2. Open TCP connection e.g. by nc to localhost port 8081 and then close it without sending data (Ctrl+C)
$ nc -4 localhost 8081
^C
  1. Check output of snippet (something like that):
./z.py
* Starting Server at 127.0.0.1:8081
New connection
Traceback (most recent call last):
  File "./z.py", line 24, in <module>
  File "./z.py", line 19, in run
  File "/Users/belyalov/.micropython/lib/uasyncio/core.py", line 136, in run_forever
  File "/Users/belyalov/.micropython/lib/uasyncio/core.py", line 113, in run_forever
  File "/Users/belyalov/.micropython/lib/uasyncio/__init__.py", line 39, in remove_reader
KeyError: 4399426432

uasyncio is from commit d19253a which is last working version before breaking changes mentioned in #244

belyalov avatar Jan 06 '18 02:01 belyalov

Forgot to mention, I'm using micropython for Mac (unix?).

belyalov avatar Jan 06 '18 06:01 belyalov

More investigations: This is because reader actually gets deleted twice:

  • When socket error happened: https://github.com/micropython/micropython-lib/blob/c59c5c6ef8c6940a45a9e25cd26decbfe217430a/uasyncio/uasyncio/init.py#L76-L82
  • And one more time when IOReadDone: https://github.com/micropython/micropython-lib/blob/c59c5c6ef8c6940a45a9e25cd26decbfe217430a/uasyncio.core/uasyncio/core.py#L114-L115

So fix could be pretty simple like:

if id(sock) not in self.objmap:
    return

belyalov avatar Jan 06 '18 14:01 belyalov

Thanks for preparing a testcase. But I can't reproduce the behavior you described, instead I get after pressing ^C on running nc:

$ micropython test-remove_reader-crash.py 
* Starting Server at 127.0.0.1:8081
New connection
b''

The script works pretty as expected. There definitely should be b'' printed, because breaking nc closes its side of the connection, and "peer closed connection" condition on a socket is recognized as an EOF (i.e. empty string).

So, the first question you'd need to answer is why you don't get an EOF.

What's a bit worrying is that writing to closed socket should lead to an error, and it doesn't happen. But "should lead" actually applies only to blocking socket and under some additional conditions. It's all more complicated with non-blocking sockets. Well, perhaps that matter worth being investigated (not directly related to this ticket).

pfalcon avatar Jan 06 '18 20:01 pfalcon

This is because reader actually gets deleted twice: When socket error happened: And one more time when IOReadDone:

Yeah, good reasoning. Now you just need to explain how IOReadDone gets yielded. Nothing in the code you posted directly does that (e.g. writer.aclose() yields IOWriteDone). readline() may yield IOReadDone, but then why you don't get b''?

So, please keep investigating. You may need to go as deep as printing the values poll() returns on your system. I'm using Linux and the latest uasyncio for reference (and I recommend using the latter too).

pfalcon avatar Jan 06 '18 20:01 pfalcon

You may need to go as deep as printing the values poll() returns on your system.

If you do that, please don't forget to post patch with print()s, I'll need to compare it with the values I get on Linux.

pfalcon avatar Jan 06 '18 20:01 pfalcon

how IOReadDone gets yielded

Whenever you yield reader.aclose() The different question is should you call it? If yes - how to know if reader is still alive?

P.S. Test case is slightly wrong - looks like I've attached first version of it and forgot to update. So to get it reproduced simply replace writer.close() with reader.close()

belyalov avatar Jan 07 '18 02:01 belyalov

Hello, I can confirm this problem does exist. The culprit is in the following: https://github.com/micropython/micropython-lib/blob/f20d89c6aad9443a696561ca2a01f7ef0c8fb302/uasyncio/uasyncio/init.py#L100-L111

If, for some reasons, a call to ios.read() doesn't return anything(this happens to me on a low memory situation), then you effectively have two calls to IOReadDone() -- one in .read(), the other in .aclose(). The second call will cause a key fault.

An obvious fix is already in @belyalov's reply.

shawwwn avatar Sep 26 '18 08:09 shawwwn

a call to ios.read() doesn't return anything

In Python, a function always returns something.

then you effectively have two calls to IOReadDone() -- one in .read(), the other in .aclose().

Ok, that seems plausible.

An obvious fix is already in @belyalov's reply.

That's obviously not consistent with how it's implemented for remove_writer().

pfalcon avatar Sep 26 '18 10:09 pfalcon

In Python, a function always returns something.

Did you understand my post or did you have to go word-to-word?

I said this happened in "a low memory situation." Socket read returns None because system is out-of-memory. Under such condition, there will be two calls to IOReadDone() in which the latter will cause a key fault. @belyalov's solution is the quickest and it worked. If you happen to have your own implementation in mind, why don't you fix it?

Next time, maybe try to pay attention to the actual code instead of people's words.

shawwwn avatar Sep 26 '18 12:09 shawwwn

Did you understand my post or did you have to go word-to-word?

There's such notion as "fail fast": https://en.wikipedia.org/wiki/Fail-fast . If you make mistake with such simple things as Python function returns, just imagine how many mistakes can be done with less obvious things?

Socket read returns None because system is out-of-memory. Under such condition, there will be two calls to IOReadDone() in which the latter will cause a key fault.

Still not a correct analysis, see the code above.

@belyalov's solution is the quickest and it worked.

As a maintainer of many (too many!) projects, I'm not interested in random quick-and-dirty "fixes", but in understanding the problem and finding the best solution, which is sometimes not obvious. I welcome anyone to apply the same approach and put the uasyncio code under scrutiny (but then the entire code, not just 2 random lines, it's a few screenfuls anyway). And if the description of a problem starts with "foo() doesn't return anything", please kindly let me be skeptical and call for more thoroughness on it.

pfalcon avatar Sep 26 '18 13:09 pfalcon

Please don't introduce irrelevant topic into this discussion. Thank you!

Conclusion: The current uasyncio implementation is uncared of socket error and (judging from the conversation) the issue is not likely to get fixed. So if you are reading this, please be prepared to write up your own select/poll mechanism when dealing with socket.

shawwwn avatar Sep 26 '18 14:09 shawwwn

@shawwwn I'd suggest you not to expect any patches accepted for micropython-lib. Here is good reading of what's going on.

Personally I decided not to spend any more time to talk to "maintainer" - simply because it is just time spending instead of problem solving.

Fork it (or use existing one), apply patch and use it like I did.

P.S. here is the fix: https://github.com/belyalov/micropython-lib/commit/68abd69b726448da06a682e1ec1216fd7575dce1

belyalov avatar Sep 26 '18 19:09 belyalov

All fixes and further development happens in the upstream repo, https://github.com/pfalcon/micropython-lib . This issue will be fixed too, when someone prepares a proper patch for it.

pfalcon avatar Sep 27 '18 09:09 pfalcon

This remains the upstream library for the MicroPython project.

That said, is this still an issue on recent versions on MicroPython and asyncio versions?

jonnor avatar Aug 25 '24 12:08 jonnor