
Forbid `await` in `for` loop

Open sobolevn opened this issue 5 years ago • 10 comments

https://github.com/Instagram/Fixit/blob/master/fixit/rules/gather_sequential_await.py

sobolevn avatar Sep 13 '20 22:09 sobolevn

What if you're using aiohttp or similar to fetch web pages? Any considerate person writing a bot will space their requests. How would you implement that without just waiting in the for loop? I suppose you could spin off to an actual thread and do it entirely synchronously, but I feel like that's a lot of setup just to avoid an await in a loop.

* Note, we're making the assumption that the code is also doing something other than fetch pages.

Sxderp avatar Sep 16 '20 15:09 Sxderp

Hi @sobolevn, I want to get involved in this project. It's my first time contributing to open source, and I'm hoping you could help me...

shivang970 avatar Sep 23 '20 13:09 shivang970

@shivang970 you would need to use visit_For somewhere in https://github.com/wemake-services/wemake-python-styleguide/blob/master/wemake_python_styleguide/visitors/ast/loops.py

Then use ast.walk on this node to make sure that there are no await nodes inside. You can have a look at this implementation, which is pretty similar: https://github.com/wemake-services/wemake-python-styleguide/blob/master/wemake_python_styleguide/visitors/ast/loops.py#L177-L179
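As a rough sketch of what that check could look like (a standalone function with a hypothetical name, not the styleguide's actual visitor-class implementation):

```python
import ast


def has_await_in_loop(source: str) -> bool:
    """Return True if any for-loop in the source contains an await."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.AsyncFor)):
            # Walk the loop's subtree looking for Await nodes.
            if any(isinstance(sub, ast.Await) for sub in ast.walk(node)):
                return True
    return False
```

A real visitor would instead subclass the project's base visitor, implement `visit_For`, and report a violation with a code and message.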

Thanks a lot for your help!

sobolevn avatar Sep 23 '20 14:09 sobolevn

@Sxderp there are better ways to do it. https://stackoverflow.com/questions/42231161/asyncio-gather-vs-asyncio-wait
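To illustrate the difference the linked answer discusses (using a hypothetical `fetch` coroutine as a stand-in for a real network call):

```python
import asyncio


async def fetch(n):
    # Stand-in for an aiohttp request.
    await asyncio.sleep(0)
    return n * 2


async def sequential(items):
    # Awaiting inside the loop: each call starts only after
    # the previous one finishes.
    results = []
    for n in items:
        results.append(await fetch(n))
    return results


async def concurrent(items):
    # gather schedules all coroutines at once and awaits them together.
    return await asyncio.gather(*(fetch(n) for n in items))


print(asyncio.run(concurrent([1, 2, 3])))  # [2, 4, 6]
```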

sobolevn avatar Sep 23 '20 14:09 sobolevn

I understand how gather works, but there are workflows in which we don't want to "start off" 100 connection requests. Using gather would cause everything to initialize in fairly quick succession. I can see something like gathering in increments of four, but you are still waiting in a loop.

Anyway, I think rate limiting is definitely an appropriate use.

Sxderp avatar Sep 23 '20 15:09 Sxderp

I can get behind a rule that forbids await in list comprehensions or a single-line for loop. I think in most circumstances that would be a mistake. The rate-limiting case could still apply in theory, but if you need rate limiting you're generally doing other things with the data (more than one line).
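For concreteness, a sketch of the comprehension pattern being discussed, next to the gather-based form the linked Fixit rule points toward (hypothetical `double` coroutine):

```python
import asyncio


async def double(x):
    return x * 2


async def main(xs):
    # The single-expression pattern the proposed rule would flag:
    # each await finishes before the next one starts.
    flagged = [await double(x) for x in xs]
    # The concurrent alternative: schedule everything, await once.
    suggested = await asyncio.gather(*(double(x) for x in xs))
    return flagged, suggested


flagged, suggested = asyncio.run(main([1, 2, 3]))
```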

Sxderp avatar Sep 23 '20 15:09 Sxderp

Can you please provide a real-ish code sample with rate limiting in a loop?

sobolevn avatar Sep 23 '20 15:09 sobolevn

Here is a stripped-down sample of something similar to what I've used. I can also think of examples in IPC where state is encoded in the response and you need to send part of the response back in order to perform your next request. Something like a Redis "SCAN" operation. I have no actual implementations doing that with asyncio, but it's just a thought.

import aiohttp
import asyncio
import random
from bs4 import BeautifulSoup

# Galleries would normally be fetched too, but for simplicity's sake
# hardcode them.
GALLERIES = [
    'https://example.com/gallery/1',
    'https://example.com/gallery/2',
    'https://example.com/gallery/3',
    'https://example.com/gallery/4',
]

async def extract_image_srcs(session, gallery_url):
    async with session.get(gallery_url) as gallery_response:
        gallery_soup = BeautifulSoup(await gallery_response.text(), features='lxml')
        return [img.attrs['src'] for img in gallery_soup.findAll('img')]

async def get_image_data(session, image_src):
    async with session.get(image_src) as image_response:
        return await image_response.read()

async def fetch_gallery(gallery_url):
    session = aiohttp.ClientSession()
    image_sources = await extract_image_srcs(session, gallery_url)
    for image_src in image_sources:
        # This could technically be "gathered", but await is used to
        # artificially reduce the number of potential requests. We can't
        # guarantee any specific number, but it'll definitely be less than
        # doing everything at once.
        image_data = await get_image_data(session, image_src)
        with open(str(random.randint(1, 255)), 'wb') as fd:
            fd.write(image_data)
        # Optionally include additional waiting if we want to be extra nice.
        await asyncio.sleep(random.randint(1, 10))
    await session.close()

async def bootstrap():
    await asyncio.gather(*[fetch_gallery(gallery_url) for gallery_url in GALLERIES])

asyncio.run(bootstrap())

Sxderp avatar Sep 23 '20 16:09 Sxderp

You should use asyncio.Semaphore if you need to rate-limit your requests. But asyncio.sleep(0) may be useful if you're in an async context doing a CPU-intensive task (such as crunching image data in a script), to allow other coroutines to advance.
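A minimal sketch of that suggestion, bounding concurrency with a Semaphore while still using gather (the `fetch` coroutine and `limit` value are illustrative):

```python
import asyncio


async def fetch(n, sem):
    # At most `limit` coroutines may hold the semaphore at once,
    # so only that many requests are in flight simultaneously.
    async with sem:
        await asyncio.sleep(0)  # stand-in for the actual request
        return n


async def main(items, limit=4):
    sem = asyncio.Semaphore(limit)
    # gather still schedules everything up front; the semaphore
    # throttles how many actually run concurrently.
    return await asyncio.gather(*(fetch(n, sem) for n in items))


print(asyncio.run(main(range(10))))
```

This keeps the results in order (gather preserves argument order) while avoiding both the fully sequential loop and an unbounded burst of connections.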

strmwalker avatar Aug 02 '22 17:08 strmwalker

If you're rate-limiting with a concurrency greater than one, I can agree. If the limit is one, I'm not sure I see the point over a plain loop.

Sxderp avatar Aug 03 '22 10:08 Sxderp