
asyncio.run() cannot be called from a running event loop

Open kkarkos opened this issue 1 year ago • 17 comments

Hi there,

trying to get SmartScraperGraph running on FastAPI.

@app.post("/crawl")
async def crawl(request: Request):
    data = await request.json() 
    url = data.get('url')   

    try:  
        smart_scraper_graph = SmartScraperGraph(
            prompt="List me all the articles",
            # also accepts a string with the already downloaded HTML code
            source=url,
            config=graph_config
        )

        result = smart_scraper_graph.run()

        print(result)

        # Access the URL field
        return result
    except Exception as e:
        print(f"Error in crawl: {e}")
        return None

Config

graph_config = {
    "llm": {
        "model": "ollama/llama3",
        "temperature": 0,
        "format": "json",  # Ollama needs the format to be specified explicitly
        "base_url": "http://localhost:11434",  # set the Ollama URL
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",  # set the Ollama URL
    }
}

Error:

Error in crawl: asyncio.run() cannot be called from a running event loop
/Users/konrad/Documents/Projects/product-spider/apps/service/main.py:171: RuntimeWarning: coroutine 'AsyncChromiumLoader.ascrape_playwright' was never awaited

Any idea? Thanks

kkarkos avatar May 08 '24 14:05 kkarkos

Hey there, yes, Playwright uses asyncio under the hood, so you are probably trying to run an asyncio routine inside another one (your async crawl method). Right now the .run() method doesn't include a way to handle asynchronous calls, but since it is a requested feature we will add it :)

We will also include other web drivers, like the one provided by Selenium.
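To illustrate the conflict, here is a minimal stdlib-only sketch (nothing ScrapeGraphAI-specific; `blocking_helper` is a hypothetical stand-in for a library call that internally starts its own event loop):

```python
import asyncio

def blocking_helper():
    # Stand-in for a synchronous library call (e.g. a scraper) that
    # internally starts its own event loop with asyncio.run().
    # The inner coroutine is created first, then asyncio.run() raises
    # before awaiting it -- which also produces the "coroutine was
    # never awaited" RuntimeWarning seen in the original report.
    return asyncio.run(asyncio.sleep(0, result="scraped"))

async def handler():
    # handler() already runs inside an event loop, so the nested
    # asyncio.run() below raises RuntimeError
    try:
        return blocking_helper()
    except RuntimeError as e:
        return f"error: {e}"

print(asyncio.run(handler()))
```

The same `blocking_helper()` call succeeds when no loop is running in the calling thread, which is why the thread-based workarounds further down this issue work.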

PeriniM avatar May 08 '24 14:05 PeriniM

Hey there, yes, Playwright uses asyncio under the hood, so you are probably trying to run an asyncio routine inside another one (your async crawl method). Right now the .run() method doesn't include a way to handle asynchronous calls, but since it is a requested feature we will add it :)

Hi, @PeriniM. Thanks for the explanation of this error. But what happens if I don't use asyncio in my code and still get the same error as in this issue: RuntimeError: asyncio.run() cannot be called from a running event loop?

I tried to run this code in Google Colab and got the same error. Here's my code (I actually copied it from one of your scripts 👍):

""" 
Basic example of scraping pipeline using SmartScraper
"""

import os
from dotenv import load_dotenv
from scrapegraphai.utils import prettify_exec_info
from scrapegraphai.graphs import SmartScraperGraph
load_dotenv()

from google.colab import userdata
gemini_key = userdata.get('Gemini_api_key')  # access my Gemini API key stored in the Colab environment

# ************************************************
# Define the configuration for the graph
# ************************************************

graph_config = {
    "llm": {
        "api_key": gemini_key,
        "model": "gemini-pro",
    },
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the news with their description.",
    # also accepts a string with the already downloaded HTML code
    source="https://www.wired.com",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

Can you explain why? Thank you.

EDIT:

I want to give the whole error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-9f47dddcb03f> in <cell line: 36>()
     34 )
     35 
---> 36 result = smart_scraper_graph.run()
     37 print(result)
     38 

5 frames
/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/smart_scraper_graph.py in run(self)
    107 
    108         inputs = {"user_prompt": self.prompt, self.input_key: self.source}
--> 109         self.final_state, self.execution_info = self.graph.execute(inputs)
    110 
    111         return self.final_state.get("answer", "No answer found.")

/usr/local/lib/python3.10/dist-packages/scrapegraphai/graphs/base_graph.py in execute(self, initial_state)
    105 
    106             with get_openai_callback() as cb:
--> 107                 result = current_node.execute(state)
    108                 node_exec_time = time.time() - curr_time
    109                 total_exec_time += node_exec_time

/usr/local/lib/python3.10/dist-packages/scrapegraphai/nodes/fetch_node.py in execute(self, state)
     86                 )
     87 
---> 88             document = loader.load()
     89             compressed_document = [
     90                 Document(page_content=remover(str(document[0].page_content)))]

/usr/local/lib/python3.10/dist-packages/langchain_core/document_loaders/base.py in load(self)
     27     def load(self) -> List[Document]:
     28         """Load data into Document objects."""
---> 29         return list(self.lazy_load())
     30 
     31     async def aload(self) -> List[Document]:

/usr/local/lib/python3.10/dist-packages/langchain_community/document_loaders/chromium.py in lazy_load(self)
     74         """
     75         for url in self.urls:
---> 76             html_content = asyncio.run(self.ascrape_playwright(url))
     77             metadata = {"source": url}
     78             yield Document(page_content=html_content, metadata=metadata)

/usr/lib/python3.10/asyncio/runners.py in run(main, debug)
     31     """
     32     if events._get_running_loop() is not None:
---> 33         raise RuntimeError(
     34             "asyncio.run() cannot be called from a running event loop")
     35 

RuntimeError: asyncio.run() cannot be called from a running event loop

Kingki19 avatar May 13 '24 04:05 Kingki19

I have the same problem

<ipython-input-3-d9d43c78117e> in <cell line: 28>()
     26 )
     27 
---> 28 result = smart_scraper_graph.run()
     29 print(result)

5 frames
/usr/lib/python3.10/asyncio/runners.py in run(main, debug)
     31     """
     32     if events._get_running_loop() is not None:
---> 33         raise RuntimeError(
     34             "asyncio.run() cannot be called from a running event loop")
     35

Armando123x avatar May 14 '24 16:05 Armando123x

Please update to the new version

VinciGit00 avatar May 14 '24 20:05 VinciGit00

Hello @VinciGit00, I just installed it and I'm getting the same error. I'm running the example from the website, using a conda env with Python 3.10.14 and scrapegraphai==1.2.2.

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

openai_key = os.getenv("OPENAI_APIKEY")

graph_config = {
   "llm": {
      "api_key": openai_key,
      "model": "gpt-3.5-turbo",
   },
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

smart_scraper_graph = SmartScraperGraph(
   prompt="List me all the projects with their description.",
   # also accepts a string with the already downloaded HTML code
   source="https://perinim.github.io/projects/",
   config=graph_config
)

result = smart_scraper_graph.run()
print(result)

Shivansh-yadav13 avatar May 15 '24 15:05 Shivansh-yadav13

Hey - I ran into something similar while trying to wrap the smart scraper graph with some FastAPI endpoints - what worked for me was to wrap the whole thing with run_in_threadpool from starlette.concurrency - running version 1.2.3

me-tetr avatar May 16 '24 12:05 me-tetr

Hey - I ran into something similar while trying to wrap the smart scraper graph with some FastAPI endpoints - what worked for me was to wrap the whole thing with run_in_threadpool from starlette.concurrency - running version 1.2.3

Please give the example code

Kingki19 avatar May 16 '24 15:05 Kingki19

I have the same error, too. I tried adding the following:

import nest_asyncio
nest_asyncio.apply()
result = smart_scraper_graph.run()

after which I get a new error:

Exception: Connection closed while reading from the driver. Please help me resolve this.

Datarambler avatar May 17 '24 09:05 Datarambler

Encountering this issue too while trying to run the graph from an async function (in my case a NATS event handler), I found the following workaround.

Basically it runs the blocking code on another thread, but awaits its result in the current event loop.

import asyncio
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()

async def run_blocking_code_in_thread(blocking_func, *args, **kwargs):
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(executor, blocking_func, *args, **kwargs)
    
async def your_async_method():
    smart_scraper_graph = SmartScraperGraph(
        prompt=...,
        source=...,
        config=...
    )
    result = await run_blocking_code_in_thread(smart_scraper_graph.run)

Not sure if there are any downsides to using this approach, as I am fairly new to working with Python event loops. Looking forward to built-in support!
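On Python 3.9+, asyncio.to_thread wraps this same run_in_executor pattern with less boilerplate (the executor is managed for you). A minimal stdlib sketch, with a hypothetical stand-in for the graph's run method:

```python
import asyncio

def blocking_run():
    # Stand-in for smart_scraper_graph.run()
    return {"answer": "No answer found."}

async def your_async_method():
    # Equivalent to the run_in_executor helper above, but asyncio
    # manages the worker thread pool for you
    return await asyncio.to_thread(blocking_run)

print(asyncio.run(your_async_method()))
```

The main downside of either variant is that each blocking .run() call occupies a worker thread for its whole duration, so heavy concurrency is bounded by the thread pool size.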

philprime avatar May 19 '24 10:05 philprime

This answer solved my problem.

NILICK avatar May 19 '24 20:05 NILICK

I get this error when using this logic:

ValueError: Model provided by the configuration not supported

from scrapegraphai.graphs import SmartScraperGraph
import json
import asyncio
from loguru import logger
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()

graph_config = {
    "llm": {
        "model": "groq/llama3-8b-8192",
        "api_key": "....",
        "temperature": 0,
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434"
    },
    "max_results": 5,
    "format":"json"    
}

async def read_urls_from_json_async(filename="urls.json"):
    """Asynchronously read URLs from a JSON file."""
    loop = asyncio.get_event_loop()
    try:
        with open(filename, 'r') as file:
            urls = await loop.run_in_executor(executor, json.load, file)
            return urls
    except FileNotFoundError:
        print(f"Error: The file {filename} was not found.")
        return []
    except json.JSONDecodeError:
        print("Error: Failed to decode JSON.")
        return []

async def run_blocking_code_in_thread(blocking_func, *args, **kwargs):
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(executor, blocking_func, *args, **kwargs)

async def get_ad_async(url):  
    ad_scraper = SmartScraperGraph(
        prompt="Extract all relevant data in a structured JSON.",
        source=url,
        config=graph_config
    )
    ad = await run_blocking_code_in_thread(ad_scraper.run)
    if ad:
        logger.info(json.dumps(ad, indent=4))

async def main():
    urls = await read_urls_from_json_async()
    if urls:
        tasks = [get_ad_async(url.get('url')) for url in urls]
        await asyncio.gather(*tasks)
    else:
        print("No URLs to process.")

if __name__ == '__main__':
    asyncio.run(main())

alexauvray avatar May 28 '24 14:05 alexauvray

please add all the code

VinciGit00 avatar May 28 '24 14:05 VinciGit00

please add all the code

Updated my previous message

alexauvray avatar May 28 '24 16:05 alexauvray

please add all the code

Any idea?

alexauvray avatar Jun 07 '24 15:06 alexauvray

I am having the same error as this thread when trying to execute the code with the Azure OpenAI configuration. This is my code:

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
import os
from scrapegraphai.graphs import SmartScraperGraph

os.environ["AZURE_OPENAI_ENDPOINT"] = "...."
os.environ["AZURE_OPENAI_API_KEY"] = "..."

llm_model_instance = AzureChatOpenAI(
    azure_deployment="...",
    openai_api_version="...",
    temperature=0
)

embedder_model_instance = AzureOpenAIEmbeddings(
    azure_deployment="...",
    openai_api_version="...",
)

graph_config = {
    "llm": {
        "model_instance": llm_model_instance
    },
    "embeddings": {
        "model_instance": embedder_model_instance
    }
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their descriptions",
    # also accepts a string with the already downloaded HTML code
    source="https://perinim.github.io/projects",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

File c:\Users\EESPOSG8D\Sviluppo\Python\venv\lib\site-packages\scrapegraphai\docloaders\chromium.py:105, in ChromiumLoader.lazy_load(self)
    102 scraping_fn = getattr(self, f"ascrape_{self.backend}")
    104 for url in self.urls:
--> 105     html_content = asyncio.run(scraping_fn(url))
    106     metadata = {"source": url}
    107     yield Document(page_content=html_content, metadata=metadata)

File ~\AppData\Local\Programs\Python\Python310\lib\asyncio\runners.py:33, in run(main, debug)
      9 """Execute the coroutine and return the result.
     10 
     11 This function runs the passed coroutine, taking care of
   (...)
     30     asyncio.run(main())
     31 """
     32 if events._get_running_loop() is not None:
---> 33     raise RuntimeError(
     34         "asyncio.run() cannot be called from a running event loop")
     36 if not coroutines.iscoroutine(main):
     37     raise ValueError("a coroutine was expected, got {!r}".format(main))

RuntimeError: asyncio.run() cannot be called from a running event loop

Giustino98 avatar Jun 07 '24 16:06 Giustino98

Avoiding .ipynb (running outside a notebook) can fix it.

bensonbs avatar Jun 08 '24 12:06 bensonbs

Run this before the above code.

import nest_asyncio
nest_asyncio.apply()

AliHaider20 avatar Jun 16 '24 11:06 AliHaider20