100% CPU while waiting for the first HTTP response

Open panicfarm opened this issue 4 years ago • 5 comments

Hi. When I initialize aiocurl and make the first request, top shows 100% CPU and the process status is R (running). This does not happen on subsequent requests if I reuse the open cURL handle. To reproduce, run the script below: it makes a request that is guaranteed to stall for 10 seconds. If you watch top in another terminal, you will see this Python process at 100% CPU, while the program should be idle, waiting on the socket for 10 seconds.

import asyncio
import aiocurl

async def perform():
    c = aiocurl.Curl()
    c.setopt(aiocurl.URL, 'http://httpbin.org/delay/10')
    print("first request, CPU 100%")
    await c.perform()
    print("second request, CPU <1%")
    await c.perform()
    c.close()
    c = aiocurl.Curl()
    c.setopt(aiocurl.URL, 'http://httpbin.org/delay/10')
    print("Reinited cURL, first request, CPU 100%")
    await c.perform()


asyncio.run(perform())
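
To check this without watching top, one option is to compare the CPU time the process consumes with the wall-clock time of a single await. The sketch below is only an illustration: `cpu_fraction` is a hypothetical helper, not part of aiocurl, and `asyncio.sleep` stands in for `c.perform()` so the snippet runs without network access.

```python
import asyncio
import time


async def cpu_fraction(awaitable) -> float:
    """Return CPU seconds used by this process per wall-clock second of the await."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    await awaitable
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall


if __name__ == "__main__":
    # asyncio.sleep is a placeholder for an idle wait; swap in c.perform() to test aiocurl.
    fraction = asyncio.run(cpu_fraction(asyncio.sleep(0.2)))
    print(f"CPU fraction while idle: {fraction:.2f}")
```

If the behaviour described above holds, awaiting the first `c.perform()` through this helper should report a value near 1.0, while an idle wait should stay close to 0.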

panicfarm, Mar 24 '22 16:03

I ran this script on Cygwin on Windows 10 and reproduced the problem there too. Strangely, CPU usage is normal when a proxy server is set: c.setopt(aiocurl.PROXY, 'socks5h://127.0.0.1:1080')

GeekDuanLian, Mar 25 '22 10:03

[Image: profile] Here is my test. It looks like curl is invoking a lot of callbacks for no reason.

synodriver, Jul 09 '22 13:07

I found out that this is not the case with tornado's curl: https://www.tornadoweb.org/en/stable/httpclient.html

import asyncio
from tornado.curl_httpclient import CurlAsyncHTTPClient
async def do(): return await CurlAsyncHTTPClient().fetch('http://httpbin.org/delay/10')
asyncio.run(do())

The author doesn't seem to care about this project, so I plan to write my own implementation based on tornado's source code.

The reason this project pegs the CPU is these lines: https://github.com/fsbs/aiocurl/blob/5818e98d79cc2b8f3ccd0aa3c8a0b64fc353d6a4/aiocurl.py#L97-L98 The socket is always writable, because nothing large enough is written to make it block, so the loop keeps calling the callback all the time.

A temporary workaround is to stop listening for writability after one callback, but I don't know whether this has any side effects.

 if ev_bitmask & POLL_OUT: 
     loop.add_writer(sock_fd, lambda: (self._socket_action(sock_fd, CSELECT_OUT), loop.remove_writer(sock_fd)))
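
The effect can be seen without aiocurl at all. The following is a minimal sketch, assuming a selector-based asyncio loop (the default on Linux and macOS; the Windows proactor loop does not support `add_writer`). `count_writer_callbacks` is a hypothetical helper, and a `socket.socketpair()` stands in for the HTTP connection: it counts how often the loop fires a writer callback on an idle, writable socket, with and without the one-shot removal.

```python
import asyncio
import socket


async def count_writer_callbacks(one_shot: bool, duration: float = 0.05) -> int:
    """Count how often the loop invokes a writer callback on an idle socket."""
    loop = asyncio.get_running_loop()
    # A connected pair with an empty send buffer: `a` stays writable the whole time.
    a, b = socket.socketpair()
    a.setblocking(False)
    calls = 0

    def on_writable() -> None:
        nonlocal calls
        calls += 1
        if one_shot:
            # The workaround: stop listening for writability after one callback.
            loop.remove_writer(a.fileno())

    loop.add_writer(a.fileno(), on_writable)
    try:
        await asyncio.sleep(duration)
    finally:
        loop.remove_writer(a.fileno())  # harmless if already removed
        a.close()
        b.close()
    return calls


if __name__ == "__main__":
    print("persistent writer:", asyncio.run(count_writer_callbacks(one_shot=False)))
    print("one-shot writer:  ", asyncio.run(count_writer_callbacks(one_shot=True)))
```

The persistent writer is called on every loop iteration, which is what keeps the process at 100% CPU, while the one-shot variant fires exactly once.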

GeekDuanLian, Nov 17 '22 03:11

I found the real cause. It's a known issue, and tornado already has a fix: https://github.com/tornadoweb/tornado/blob/master/tornado/curl_httpclient.py#L121-L130

As a fix for this library:

--- aiocurl.py
+++ aiocurl-fix.py
@@ -82,6 +82,9 @@
         # pycurl.Curl handles mapped to aiocurl.Curl handles
         self._handles = {}

+        # fix #1
+        self._fds = set()
+
     def setopt(self, option, value):
         if option in (M_SOCKETFUNCTION, M_TIMERFUNCTION):
             raise error('callback option reserved for the event loop')
@@ -91,15 +94,20 @@
         "libcurl socket callback: add/remove actions for socket events."
         loop = _asyncio.get_running_loop()

+        if sock_fd in self._fds:
+            loop.remove_reader(sock_fd)
+            loop.remove_writer(sock_fd)
+
         if ev_bitmask & POLL_IN:
             loop.add_reader(sock_fd, self._socket_action, sock_fd, CSELECT_IN)
+            self._fds.add(sock_fd)

         if ev_bitmask & POLL_OUT:
             loop.add_writer(sock_fd, self._socket_action, sock_fd, CSELECT_OUT)
+            self._fds.add(sock_fd)

         if ev_bitmask & POLL_REMOVE:
-            loop.remove_reader(sock_fd)
-            loop.remove_writer(sock_fd)
+            self._fds.remove(sock_fd)

     def _timer_callback(self, timeout_ms):
         "libcurl timer callback: schedule/cancel a timeout action."

The tornado library contains more fixes for pycurl, so I simply wrote this patch following that library's source code.
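
To illustrate what the remove-then-re-add step changes, here is a minimal sketch that runs without pycurl. It assumes libcurl first asks for POLL_OUT and later for POLL_IN on the same fd with no POLL_REMOVE in between, which is my reading of the situation rather than something verified against libcurl. `SocketWatcher` and `writer_calls_after_switch` are hypothetical names, the `POLL_*` values are copied from libcurl's `CURL_POLL_*` constants, and a selector-based asyncio loop is assumed.

```python
import asyncio
import socket

# Values of libcurl's CURL_POLL_* constants, redefined so the sketch needs no pycurl.
POLL_IN, POLL_OUT, POLL_REMOVE = 1, 2, 4


class SocketWatcher:
    """Hypothetical stand-in for the multi handle's socket bookkeeping."""

    def __init__(self, patched: bool) -> None:
        self.patched = patched
        self._fds = set()
        self.events = []  # "in" or "out" for every callback the loop fires

    def socket_callback(self, ev_bitmask: int, sock_fd: int) -> None:
        loop = asyncio.get_running_loop()
        if self.patched and sock_fd in self._fds:
            # Drop whatever was registered before applying the new interest set.
            loop.remove_reader(sock_fd)
            loop.remove_writer(sock_fd)
        if ev_bitmask & POLL_IN:
            loop.add_reader(sock_fd, self.events.append, "in")
            self._fds.add(sock_fd)
        if ev_bitmask & POLL_OUT:
            loop.add_writer(sock_fd, self.events.append, "out")
            self._fds.add(sock_fd)
        if ev_bitmask & POLL_REMOVE:
            if not self.patched:
                loop.remove_reader(sock_fd)
                loop.remove_writer(sock_fd)
            # discard() instead of the patch's remove(), so an unknown fd cannot raise.
            self._fds.discard(sock_fd)


async def writer_calls_after_switch(patched: bool) -> int:
    """Request POLL_OUT, then POLL_IN, and count writer callbacks fired afterwards."""
    loop = asyncio.get_running_loop()
    a, b = socket.socketpair()
    a.setblocking(False)
    watcher = SocketWatcher(patched)
    try:
        watcher.socket_callback(POLL_OUT, a.fileno())
        watcher.socket_callback(POLL_IN, a.fileno())  # no POLL_REMOVE in between
        watcher.events.clear()
        await asyncio.sleep(0.05)  # nothing is sent, so the reader never fires
        return watcher.events.count("out")
    finally:
        loop.remove_reader(a.fileno())
        loop.remove_writer(a.fileno())
        a.close()
        b.close()
```

With `patched=False` the stale writer keeps firing for as long as the socket is writable; with `patched=True` it is removed as soon as the interest set changes, so the loop can sleep until data arrives.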

GeekDuanLian, Nov 18 '22 06:11

Thank you @GeekDuanLian

bratao, Dec 14 '22 18:12