pyppeteer

Ability to use without asyncio

Open • caffeinatedMike opened this issue 6 years ago • 13 comments

I would love to see the ability to use this tool without needing to worry about async/await logic that IMO makes code less readable. I understand there are reasons for others to want to make use of the concurrent connections, but for many tasks it's simply unneeded.

caffeinatedMike, Feb 21 '20

I somewhat agree that it would be simpler for short scripts, but async behaviour is pretty vital for the project, as things would get really slow without it.

It might be worth looking at some ways to wrap basic functionality for such scripting, but I'm not sure that saving two lines of event-loop initialization and a few keyword prefixes is a good enough reason.

Granitosaurus, Feb 22 '20
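The idea of wrapping basic functionality for scripting can be sketched as a small synchronous facade around one private event loop. This is a minimal sketch, not pyppeteer API: `SyncRunner`, `FakeBrowser` and `FakePage` are hypothetical names, and the fake classes only simulate awaits so the snippet runs without Chromium. The design choice worth noting is reusing a single loop for every call: async objects such as a browser connection are typically tied to the loop that created them, whereas `asyncio.run()` creates and closes a new loop on each call.

```python
import asyncio


class SyncRunner:
    """Hypothetical helper: drives coroutines on one private event loop."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()

    def run(self, coro):
        # Block until the coroutine finishes and return its result.
        return self._loop.run_until_complete(coro)

    def close(self) -> None:
        self._loop.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


# Stand-ins for pyppeteer objects (hypothetical; they only simulate awaits).
class FakePage:
    def __init__(self, url: str) -> None:
        self.url = url

    async def title(self) -> str:
        await asyncio.sleep(0.01)  # pretend to wait on a CDP response
        return f"Title of {self.url}"


class FakeBrowser:
    async def new_page(self, url: str) -> FakePage:
        await asyncio.sleep(0.01)
        return FakePage(url)


with SyncRunner() as runner:
    browser = FakeBrowser()
    page = runner.run(browser.new_page("https://example.com"))
    print(runner.run(page.title()))  # Title of https://example.com
```

With real pyppeteer objects the calls would be awaited the same way through `runner.run(...)`, with the browser launched and closed on that same runner.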

@Granitosaurus thanks for considering my request and use-case.

One thing I've found to be a big issue with the required async functionality is that the module occasionally gets "stuck" in an await statement indefinitely.

I'm not well-versed in how asyncio works, but I assumed it was responsible for the hang-up, and that the call would simply error out as expected if it weren't running inside the asyncio event loop. That's another reason for the request, if that makes sense.

caffeinatedMike, Feb 22 '20

If you experience a hang while awaiting something in your code, moving the awaits upstream into pyppeteer2's code will probably not fix the issue; ideally, you'd get to the root of the problem instead.

Interestingly, the author of the now apparently defunct version of this very library has what looks like a solution to this: syncer.

I could probably cook up a PR if this has enough interest; however, my intuition tells me that this could result in a ton of nasty bugs.

Mattwmaster58, Feb 23 '20

module occasionally gets "stuck" in an await statement indefinitely

That is the outdated project's fault. There are a few patches that resolve this in the dev branch already, but you'll have to wait for the pup2.2.1 (puppeteer 2.2.1) branch to be finished and merged in.

@Mattwmaster58 thanks for a great proposal, but this should probably wait for 2.2.1. syncer is already used in our tests, though, and it has been doing its job! It does seem like a fair bit of work dev/maintenance-wise, so maybe interest should be gauged first as well.

Reactions to the original post (thumbs up/down) are welcome to help evaluate whether this is worth pursuing!

Granitosaurus, Feb 23 '20

Interesting. A sync implementation of pyppeteer2 could make tests a bit simpler if we decide to implement it; for now, however, it's much easier to write the tests as async functions wrapped in a @sync decorator.

Mattwmaster58, Feb 23 '20
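The "async test plus @sync decorator" approach can be sketched with the standard library alone. This `sync` is a hypothetical re-implementation for illustration, similar in spirit to syncer's decorator but not its code, and `fetch_height` is a stand-in for a pyppeteer call such as `page.evaluate(...)`.

```python
import asyncio
import functools


def sync(coro_fn):
    """Hypothetical decorator: call an async function as a blocking one."""

    @functools.wraps(coro_fn)
    def wrapper(*args, **kwargs):
        # asyncio.run() creates a fresh loop, runs the coroutine, then closes it.
        return asyncio.run(coro_fn(*args, **kwargs))

    return wrapper


async def fetch_height() -> int:
    # Stand-in for something like page.evaluate("document.body.scrollHeight").
    await asyncio.sleep(0.01)
    return 1080


@sync
async def test_height_is_positive():
    # The test body is async, but test runners see a plain sync function.
    height = await fetch_height()
    assert height > 0


test_height_is_positive()  # runs synchronously, e.g. when collected by pytest
```

Because `asyncio.run()` uses a new loop per call, each decorated test has to launch and close its own browser inside the coroutine. It also refuses to run while another event loop is already running in the same thread, so this pattern suits plain synchronous callers such as test functions.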

I'm not sure if this question was answered in this thread, but why does pyppeteer only work alongside the asyncio library? Can it work without it? What is its purpose here?

jshernandez017, Jun 10 '20

Since it's a port of puppeteer, it is asynchronous by nature. Async allows for non-blocking network I/O, which is a huge boon for a library that is essentially a wrapper around sending CDP (Chrome DevTools Protocol) commands over the network.

Mattwmaster58, Jun 10 '20
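To illustrate why non-blocking I/O matters for a CDP wrapper, the sketch below simulates three command round trips with `asyncio.sleep()`. `send_command` is hypothetical and sends nothing over a real connection; the method names are only labels. Awaiting the commands one by one costs the sum of their latencies, while `asyncio.gather()` keeps them all in flight on a single thread, so the concurrent run should take roughly one round trip instead of three.

```python
import asyncio
import time


async def send_command(name: str, latency: float = 0.2) -> str:
    """Hypothetical stand-in for one CDP request/response round trip."""
    await asyncio.sleep(latency)  # waiting on the network, not the CPU
    return f"{name}: ok"


async def sequential(names):
    # Each command waits for the previous response before being sent.
    return [await send_command(n) for n in names]


async def concurrent(names):
    # gather() keeps all requests in flight at once on a single thread.
    return await asyncio.gather(*(send_command(n) for n in names))


names = ["Page.navigate", "Runtime.evaluate", "DOM.getDocument"]
for fn in (sequential, concurrent):
    start = time.perf_counter()
    results = asyncio.run(fn(names))
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```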

As far as I understand, asyncio has implications beyond aesthetics and readability. Similar to this issue, I encountered a case where a pyppeteer Browser object needs to be initialised by an application that has its own asyncio loop.

The problem is that the function that starts a Chrome process and returns a Browser object (launch()) is an async function.

It seems that this is a case where it would be "necessary to invoke the asyncio event loop from code that was itself invoked within (although usually not directly by) the event loop", a rather fundamental issue with asyncio itself.

I think it would be great to be able to use core functionality such as initialising a Browser object from external applications, irrespective of whether the external application uses asyncio or not.

ingo-m, Sep 02 '20
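If the host application is itself async, the most direct route is to `await launch(...)` on the host's own loop. For synchronous code that might be called from inside a running loop, one stdlib-only workaround is sketched below. `run_coro_blocking`, `render_page`, `sync_api` and `host_application` are hypothetical names, and `render_page` only simulates the pyppeteer calls. Launching pyppeteer in the worker thread would also need `handleSIGINT`, `handleSIGTERM` and `handleSIGHUP` set to `False`, per the "signal only works in main thread" error reported in this thread.

```python
import asyncio
import concurrent.futures


def run_coro_blocking(coro):
    """Hypothetical helper: run `coro` to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run() is enough.
        return asyncio.run(coro)
    # A loop is already running here and asyncio.run() would raise
    # RuntimeError. Run the coroutine on a fresh loop in a worker thread and
    # block until it finishes (this also blocks the caller's loop meanwhile).
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def render_page(url: str) -> str:
    # Stand-in for launch() + newPage() + goto() + content() + close().
    # Everything must happen inside this coroutine: the worker's loop is
    # closed when asyncio.run() returns, so pyppeteer objects can't outlive it.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


def sync_api(url: str) -> str:
    # Synchronous entry point that knows nothing about the caller's loop.
    return run_coro_blocking(render_page(url))


async def host_application() -> str:
    # An application with its own running loop calling the sync entry point.
    return sync_api("https://example.com")


print(sync_api("https://example.com"))
print(asyncio.run(host_application()))
```

Blocking the host's loop this way is a trade-off: it works, but the host can do nothing else until the page is rendered.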

gets "stuck" in an await statement indefinitely.

I'm getting this issue too with page.evaluate(). What's the workaround for now? EDIT: tried with both the dev and master branches.

result = await page.evaluate("""
        () => {
            return {
                height: document.body.scrollHeight,
            }
        }
""")

Given enough iterations, this statement always gets stuck for me. It would be nice to have an asyncio-free option...

AndreiRegiani, Dec 04 '20

@AndreiRegiani

asyncio is not the problem here; this is likely a bug in pyppeteer. The design of the Chrome DevTools Protocol necessitates an asynchronous library design. Any synchronous interface provided by pyppeteer would necessarily wrap asynchronous code, which wouldn't help in your situation.

I would try installing from the pup2.1.1 branch and running your code again to see if that fixes things.

Mattwmaster58, Dec 04 '20
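While the underlying bug is tracked down, an indefinite hang can at least be turned into an error that can be logged or retried. The sketch below uses `asyncio.wait_for()`; `with_timeout` and `stuck_evaluate` are hypothetical names, and `stuck_evaluate` simulates a call that never resolves. With real pyppeteer the call would look like `await asyncio.wait_for(page.evaluate(js), timeout=10)`. This does not fix the root cause, and whether the page is still usable after the cancelled call is not something this sketch can promise.

```python
import asyncio


async def with_timeout(awaitable, seconds: float):
    """Await `awaitable`, but raise asyncio.TimeoutError instead of hanging."""
    # wait_for cancels the inner awaitable when the timeout expires.
    return await asyncio.wait_for(awaitable, timeout=seconds)


async def stuck_evaluate():
    # Hypothetical stand-in for a page.evaluate() call that never resolves.
    await asyncio.Event().wait()


async def main() -> str:
    try:
        await with_timeout(stuck_evaluate(), seconds=0.1)
    except asyncio.TimeoutError:
        return "timed out"
    return "finished"


print(asyncio.run(main()))  # timed out
```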

Perhaps we could take a page out of playwright-python's book, as they offer exactly this (a synchronous API).

Mattwmaster58, Dec 31 '20

I needed to load a page and run its JavaScript from a data pipeline task, and pyppeteer hit the spot.

Could a sync example along these lines be added to the docs?

I was getting "signal only works in main thread" errors until I explicitly disabled pyppeteer's signal handling in the browser launch options. My sample code is below.

import asyncio
from pyppeteer import launch

...

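async def run_js_page(url: str) -> str:
    # pyppeteer installs SIGINT/SIGTERM/SIGHUP handlers by default, but Python
    # only allows signal handlers in the main thread, so disable them here.
    browser = await launch(
        headless=True,
        handleSIGINT=False,
        handleSIGTERM=False,
        handleSIGHUP=False,
    )
    try:
        page = await browser.newPage()
        await page.goto(url)
        return await page.content()  # HTML after the page's JavaScript ran
    finally:
        # Always shut Chromium down, even if navigation fails.
        await browser.close()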

...

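@task  # decorator from the poster's pipeline framework (not shown)
def my_synchronous_task():
    # Non-main threads have no default event loop, so create one for this task.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        # reload_path is defined in the elided code above
        result = loop.run_until_complete(run_js_page(reload_path))
    finally:
        loop.close()  # release the loop's resources once the page is rendered
    return result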

mattyweb, Mar 22 '21