pyasic
Network scan of some subnets times out on Ubuntu 22.04.2
Describe the bug
When making a network from a subnet and then calling network.scan() on certain subnets, it will time out and fail because of a single IP being present in the subnet. When that IP is removed from the subnet (by using MinerNetwork.from_list() and supplying all of the IPs except the specific one), it works fine. The odd thing is that on macOS 13.6.6 it works fine even with that IP included.
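For reference, a minimal sketch of how I'm invoking the scan (the subnet and excluded IP are placeholders, and the from_subnet() constructor and import path are my assumptions, not necessarily the exact API):

```python
import asyncio

from pyasic.network import MinerNetwork  # import path assumed


async def main():
    # Scanning the full subnet hangs for ~100 s on Ubuntu when the bad IP is included
    network = MinerNetwork.from_subnet("10.0.0.0/24")
    miners = await network.scan()
    print(f"subnet scan found {len(miners)} miners")

    # Excluding the one problematic IP (here 10.0.0.42) lets the scan complete normally
    ips = [f"10.0.0.{i}" for i in range(1, 255) if i != 42]
    network = MinerNetwork.from_list(ips)
    miners = await network.scan()
    print(f"list scan found {len(miners)} miners")


asyncio.run(main())
```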
I've traced the bug to the _get_miner_web() call, and more specifically the _web_ping() call with the URL f"http://{ip}/". Examining the logs from the httpx.AsyncClient.get() call shows the following:
[DEBUG][04/26/24 14:06:01](httpcore.connection) - connect_tcp.started host=<IP> port=80 local_address=None timeout=5.0 socket_options=[(1, 13, b'\x01\x00\x00\x00\xe8\x03\x00\x00')]
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.connection) - connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x104cf6480>
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - send_request_headers.started request=<Request [b'GET']>
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - send_request_headers.complete
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - send_request_body.started request=<Request [b'GET']>
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - send_request_body.complete
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:01](httpcore.http11) - receive_response_headers.started request=<Request [b'GET']>
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:06](httpcore.http11) - receive_response_headers.failed exception=ReadTimeout(TimeoutError())
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:06](httpcore.http11) - response_closed.started
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/httpcore/_trace.py:85 in atrace
[DEBUG][04/26/24 14:06:06](httpcore.http11) - response_closed.complete
It hangs on the response_closed.started line for around 100 seconds before failing. On macOS, the same GET request closes without the scan timing out. The only difference I see is that on the Mac, the same logs show socket_options=[(65535, 128, b'\x01\x00\x00\x00\xe8\x03\x00\x00')].
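Decoding those socket_options (an aside that may or may not be relevant): the constant pairs look like SOL_SOCKET/SO_LINGER on each platform (1/13 on Linux, 65535/128 on macOS), and the packed bytes appear to be the same linger struct on both:

```python
import struct

# The 8 bytes from both logs unpack as (l_onoff, l_linger) on a little-endian machine
l_onoff, l_linger = struct.unpack("ii", b"\x01\x00\x00\x00\xe8\x03\x00\x00")
print(l_onoff, l_linger)  # -> 1 1000
```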
Expected behavior
The HTTP GET request should fail without disrupting the scan.
Desktop (please complete the following information):
- OS: macOS 13.6.6 (working)
- OS: Ubuntu 22.04.2 LTS (not working)
This is a very detailed report, thank you. This will likely take some digging. @jameshilliard, any initial thoughts?
@martyfay1 would you mind sharing what that IP is?
I've made some changes on the dev_scan_fail branch; do you want to see if that fixes anything?
I've tried with that branch and it still fails. I'll get back to you on what that IP is.
This has happened on 2 subnets. In both cases, the IP corresponds to a miner in a bad state; nothing on it is responsive. The miners are S19j Pros running Bitmain firmware dated Mon Dec 26 17:19:30 CST 2022.
Ok, interesting. That makes a lot more sense; I had thought maybe somehow the IP address itself was killing the request, but if it's an unresponsive miner, that's a lot different.
It's likely just not handling a read timeout properly.
Do you have any suggestions for how I can keep this from delaying the whole scan when run on Linux? I don't want one potentially bad miner to harm scan performance...
I don't have any ideas yet; we're testing some things. I'm sure it can be fixed here in the library, but I just have to figure out what is actually freezing for that long and why.
Seems like it's hanging up somewhere in this function in httpcore:
async def _response_closed(self) -> None:
    async with self._state_lock:
        if (
            self._h11_state.our_state is h11.DONE
            and self._h11_state.their_state is h11.DONE
        ):
            self._state = HTTPConnectionState.IDLE
            self._h11_state.start_next_cycle()
            if self._keepalive_expiry is not None:
                now = time.monotonic()
                self._expire_at = now + self._keepalive_expiry
        else:
            await self.aclose()
Any chance you can run a Python debugger and see if you can find where it gets stuck?
Github Permalink - https://github.com/encode/httpcore/blob/39b756f07d40fab4eccb7c0dd3d7d369f1d2b4e8/httpcore/_async/http11.py#L245-L257
I agree that it is getting stuck in that function. I am having a hard time figuring out how to debug it since it is on a server. I ran it with python -m trace --trace scan.py and I think it is getting stuck on the last line there. Sorry if that's not much help. If you have any links on debugging Python from the command line on a server, let me know.
Maybe something like this?
https://stackoverflow.com/questions/543196/how-do-i-attach-a-remote-debugger-to-a-python-process
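Another option that doesn't need a debugger attached (just a thought, untested on my end): the stdlib faulthandler can periodically dump every thread's stack while the scan is hanging, which should show the same stuck frame in consecutive dumps.

```python
import faulthandler
import sys

# Dump all thread stacks to stderr every 30 seconds until the process exits
faulthandler.dump_traceback_later(timeout=30, repeat=True, file=sys.stderr)

# ... run the scan as usual here ...
```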
Hey, so I spent a little more time working with pdb. It hangs after the await sleep(0) call at site-packages/anyio/_backends/_asyncio.py(1192), because Executing <Handle _SelectorSocketTransport._call_connection_lost(None) created at /usr/lib/python3.12/asyncio/selector_events.py:875> took 94.301 seconds. That's a call to this, which calls this. I am suspicious of the call to self._sock.close(), as I've seen that the code was cancelled there before, after I cancelled it while it was hanging.
Can you try this at the start of the program? Curious if killing the connections with SO_LINGER will help.
pyasic.settings.update("socket_linger_time", 0)
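For context, a linger time of 0 corresponds to roughly this at the raw-socket level (a generic sketch of the SO_LINGER pattern, not pyasic's exact code); it should make close() send an RST immediately instead of waiting around:

```python
import socket
import struct

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# l_onoff=1, l_linger=0: close() resets the connection instead of lingering
sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
```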
Set the linger to 0, same problem still.
I'm stumped on this one. You might just want to play around with the _web_ping method and see if you can get it to stop happening?
Added an asyncio.wait_for() in the web ping method; can you see if the latest version of the dev_scan_fail branch fixes the issue?
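Roughly along these lines (a sketch only; the real change is on the dev_scan_fail branch, and the actual _web_ping signature may differ from the stand-in shown here):

```python
import asyncio

import httpx


async def _web_ping(ip: str) -> bool:
    # Stand-in for the library's web ping: a bare GET against the miner's web UI
    async with httpx.AsyncClient(timeout=5.0) as client:
        await client.get(f"http://{ip}/")
    return True


async def _web_ping_with_deadline(ip: str, timeout: float = 5.0) -> bool:
    # Hard deadline around the whole ping, including connection teardown,
    # so one unresponsive host can't stall the scan for ~100 s
    try:
        return await asyncio.wait_for(_web_ping(ip), timeout=timeout)
    except (asyncio.TimeoutError, httpx.HTTPError):
        return False
```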
The scan still takes ~100 seconds longer than it should. I've tried many things, like setting timeouts on the GET request, and what seems to happen is that the GET request knows it needs to time out after a certain time, say 5 seconds, and starts to close the request then, but the actual closing of the socket takes ~100 seconds. When making a GET request from the command line, i.e. curl -m 10 -v <ip>, the timeout works great and outputs:
curl -m 10 -v <ip>
* Trying <ip>...
* Connected to <ip> (<ip>) port 80 (#0)
> GET / HTTP/1.1
> Host: <ip>
> User-Agent: curl/7.81.0
> Accept: */*
>
* Operation timed out after 10001 milliseconds with 0 bytes received
* Closing connection 0
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
Without the -m 10 argument, it also takes ~100 seconds to time out when the connection is reset by peer. Not sure if this helps at all, but I couldn't figure it out from here.
curl -m 10 -v <ip>
* Trying <ip>...
* Connected to <ip> (<ip>) port 80 (#0)
> GET / HTTP/1.1
> Host: <ip>
> User-Agent: curl/7.81.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection 0
curl: (56) Recv failure: Connection reset by peer
The last thing I can think of here, before I take this to httpx, is that it's possibly an issue with an older version of Python. Have you tried on the Mac with the same Python version you have on the Ubuntu device?
I just made sure: version 3.12.3 on the Mac, and I upgraded from 3.12.2 to 3.12.3 on Ubuntu and am still experiencing the same issue.
Started a discussion in httpx, we shall see.
https://github.com/encode/httpx/discussions/3202
I had a similar problem. I solved it by editing the config (https://github.com/encode/httpx/blob/master/httpx/_config.py#L370): DEFAULT_TIMEOUT_CONFIG, DEFAULT_LIMITS, and DEFAULT_MAX_REDIRECTS.
The bug is observed before version 0.40.5.
I am using the nmap scanner, bypassing the built-in scanner.
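For anyone wanting to avoid patching httpx itself, the same defaults can be overridden per client instead (the values below are illustrative only, not the ones used above):

```python
import httpx

client = httpx.AsyncClient(
    timeout=httpx.Timeout(5.0),  # instead of editing DEFAULT_TIMEOUT_CONFIG
    limits=httpx.Limits(
        max_connections=100,
        max_keepalive_connections=20,
    ),  # instead of editing DEFAULT_LIMITS
    max_redirects=5,  # instead of editing DEFAULT_MAX_REDIRECTS
)
```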