wemake-python-styleguide
wemake-python-styleguide copied to clipboard
Forbid `await` in `for` loop
https://github.com/Instagram/Fixit/blob/master/fixit/rules/gather_sequential_await.py
What if you're using aiohttp or similar to fetch web pages? Any considerate person writing a bot will space their requests. How would you implement that without just waiting in the for loop? I suppose you could spin off to an actual thread and do it entirely synchronously, but I feel like that's a lot of setup just to avoid an await in a loop.
* Note, we're making the assumption that the code is also doing something other than fetch pages.
Hi @sobolevn I want to get involve in this project. It's my first time to open source project, hoping you could help me...
@shivang970 you would need to use visit_For somewhere in https://github.com/wemake-services/wemake-python-styleguide/blob/master/wemake_python_styleguide/visitors/ast/loops.py
And ast.walk on this node to make sure that there are no await nodes.
You can have a look at this implementation, it is pretty similar: https://github.com/wemake-services/wemake-python-styleguide/blob/master/wemake_python_styleguide/visitors/ast/loops.py#L177-L179
Thanks a lot for your help!
@Sxderp there are better ways to do it. https://stackoverflow.com/questions/42231161/asyncio-gather-vs-asyncio-wait
I understand how gather works, but there are workflows in which we don't want to "start off" 100 connection requests. Using gather would cause everything to initialize in fairly quick succession. I can see something like gathering in increments of four, but you are still waiting in a loop.
Anyway, I think rate limiting is definitely an appropriate use.
I can get behind a Rule that forbids await in list comprehensions or a single line for loop. I think in most circumstances that would be a mistake. The rate limiting could potentially still apply, but if you need rate limiting you're generally doing other things with the data (more than one line).
Can you please provide a real-ish code sample with rate limiting in a loop?
Here is a stripped sample of something similar to what I've used. I can also think of examples in IPC in which state is encoded in the response and you need to send part of the response back in order to perform your next request. Something like a Redis "SCAN" operation. I have no actual implements doing that with asyncio, but just a thought.
import aiohttp
import asyncio
import random
from bs4 import BeautifulSoup
# Galleries would normally be fetched too. But for simplicity sake
# hardcode them.
GALLERIES = [
'https://example.com/gallery/1',
'https://example.com/gallery/2',
'https://example.com/gallery/3',
'https://example.com/gallery/4',
]
async def extract_image_srcs(session, gallery_url):
async with session.get(gallery_url) as gallery_response:
gallery_soup = BeautifulSoup(await gallery_response.text(), features='lxml')
return [img.attrs['src'] for img in gallery_soup.findAll('img')]
async def get_image_data(session, image_src):
async with session.get(image_url) as image_response:
return await image_response.read()
async fetch_gallery(gallery_url):
session = aiohttp.ClientSession()
image_sources = await extract_image_srcs(session, gallery_url)
for image_src in image_sources:
# This could technically be "gathered" but await is used to artifically
# reduced the number of potential requests. We can't guarantee any
# specific number, but it'll definitely be less than doing everything
# at once.
image_data = await get_image_data(session, image_src)
with open(str(random.randint(1, 255)), 'wb') as fd:
fd.write(image_data)
# Optionally include additional waiting if we want to be extra nice.
await asyncio.sleep(random.randint(1, 10))
await session.close()
async def bootstrap():
await asyncio.gather(*[fetch_gallery(gallery_url) for gallery_url in GALLERIES])
asyncio.run(bootstrap())
You should use asyncio.Semaphore if you need to ratelimit your request.
But asyncio.sleep(0) may be useful if you have async context but doing CPU-intensive task (crunching image data in script) to allow another coroutines advance.
If you're rate-limiting with more than one I can agree. If it's one I'm not sure I see the point.