requests-html

Help me understand the return order of asession.run

Open z3ch5 opened this issue 3 years ago • 1 comments

from requests_html import AsyncHTMLSession
import functools

async def get_link(link):
    r = await asession.get(link)
    f = str(r) + link
    return f

asession = AsyncHTMLSession()

links = [
    'https://google.com',
    'https://yahoo.com',
    'https://python.org'
]

links = [
    functools.partial(get_link, link) for link in links
]

print(links)

results = asession.run(*links)

print(results)

What I get is:

[functools.partial(<function get_link at 0x7fa59c438040>, 'https://google.com'), functools.partial(<function get_link at 0x7fa59c438040>, 'https://yahoo.com'), functools.partial(<function get_link at 0x7fa59c438040>, 'https://python.org')]
['<Response [200]>https://python.org', '<Response [200]>https://google.com', '<Response [200]>https://yahoo.com']

So why does asession.run return the list in the wrong order? Is there a way to get the results in the same order the requests were sent?

z3ch5, Sep 10 '21 11:09

According to line 838 of requests_html.py, the return order should indeed match the call order. I don't think that is possible the way it is written. The whole point of asyncio is that a longer operation should not block a faster one, so the order of the results depends on which site replies first.
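For reference, here is a minimal sketch of how ordering differs between two standard asyncio calls. It assumes, without having checked line 838, that the results are collected from a `done` set such as the one `asyncio.wait` returns. `fake_fetch` and its delays are hypothetical stand-ins for `asession.get`, so the snippet runs without network access.

```python
import asyncio

async def fake_fetch(name, delay):
    # Hypothetical stand-in for asession.get(): sleeps instead of doing network I/O.
    await asyncio.sleep(delay)
    return name

async def main():
    # The first job is the slowest, so it finishes last.
    jobs = [("google", 0.03), ("yahoo", 0.02), ("python", 0.01)]

    # asyncio.wait() returns (done, pending) as sets, so the call order is lost.
    tasks = [asyncio.ensure_future(fake_fetch(n, d)) for n, d in jobs]
    done, _pending = await asyncio.wait(tasks)
    from_wait = [t.result() for t in done]

    # asyncio.gather() returns results in the order the awaitables were passed in.
    from_gather = await asyncio.gather(*(fake_fetch(n, d) for n, d in jobs))
    return type(done), from_wait, from_gather

done_type, from_wait, from_gather = asyncio.run(main())
print(done_type.__name__)  # set
print(from_wait)           # same three items, in no guaranteed order
print(from_gather)         # ['google', 'yahoo', 'python']
```

If that assumption holds, a list built by iterating over `done` has no guaranteed order at all, whereas `asyncio.gather` keeps the call order even when the tasks finish in a different order.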

I was using an example to debug a problem of my own, where only one call would ever work and the second would always hang forever. It turned out to be a problem with an old install of Python 3.8 on Windows. Deleting everything and installing it all again fixed it.

Anyway, for completeness, I took your code and updated it to work around the issue you point out. I still think this is a bug, though, as the code should behave as its comments say.

import functools
from requests_html import AsyncHTMLSession

async def get_link(link, results):
    r = await asession.get(link)
    # Store the response under its link so it can be looked up
    # by link, whatever order the requests complete in.
    results[link] = (r.url, r)

asession = AsyncHTMLSession()

links = [
    'https://google.com',
    'https://yahoo.com',
    'https://python.org'
]

results = {}

links = [
    functools.partial(get_link, link, results) for link in links
]

print(links)
print(results)

asession.run(*links)

print(results)
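A variant of the same idea, sketched under stated assumptions: a dict remembers insertion order, so a dict keyed by link still prints in completion order, and duplicate links would overwrite each other. Writing each result into a preallocated list slot by index keeps the call order instead. `fake_get` and `run_all` are hypothetical stand-ins for `asession.get` and `asession.run`, so the snippet runs without requests-html or network access.

```python
import asyncio
import functools

async def fake_get(link):
    # Hypothetical stand-in for `await asession.get(link)`.
    # The delays are made up so that later links answer sooner.
    delays = {
        'https://google.com': 0.03,
        'https://yahoo.com': 0.02,
        'https://python.org': 0.01,
    }
    await asyncio.sleep(delays[link])
    return '<Response [200]>' + link

async def get_link(index, link, results):
    # Each job writes only to its own slot, so completion order is irrelevant.
    results[index] = await fake_get(link)

links = [
    'https://google.com',
    'https://yahoo.com',
    'https://python.org'
]

results = [None] * len(links)

jobs = [
    functools.partial(get_link, i, link, results) for i, link in enumerate(links)
]

async def run_all(*coros):
    # Hypothetical stand-in for asession.run(*coros): call each partial and wait for all.
    await asyncio.wait([asyncio.ensure_future(coro()) for coro in coros])

asyncio.run(run_all(*jobs))

print(results)
```

With requests-html itself, the same partials would presumably be passed to `asession.run(*jobs)` in place of `run_all`, with `fake_get(link)` replaced by `asession.get(link)`.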

the-moog, Apr 09 '22 22:04