
deployed crawl4AI tool not working - exception=AttributeError('`copy` is not supported.')>

Open makispl opened this issue 1 year ago • 7 comments

Hi everyone,

I've made a custom scraping tool which incorporates crawl4AI, and I use it in a crew of agents (crewAI). The crew runs perfectly locally and the custom_tool.py scrapes efficiently. However, when the crew is deployed on the crewai+ enterprise platform, the tool does not work properly and returns no scraped data at all.

The respective agent's output is: "Unfortunately, I encountered persistent issues when attempting to use the Crawl4AI Crawler tool on all provided competitor URLs. As a result, I was unable to scrape the pricing plan data…", which means that custom_tool.py does not work as it is supposed to.

From the enterprise logs, all I can get is:

future: <Task finished name='Task-402' coro=<Connection.run() done, defined at /usr/local/lib/python3.12/site-packages/playwright/_impl/_connection.py:272> exception=AttributeError('`copy` is not supported.')>

File "/usr/local/lib/python3.12/site-packages/crewai/tools/tool_usage.py", line 168, in _use

I tried to resolve it with the help of windsurf/cursor; their suggestions boil down to this:

The error you're encountering on the CrewAI+ enterprise platform appears to be related to a serialization issue with the Playwright browser instance. The error AttributeError('`copy` is not supported.') suggests that there's a problem with copying or serializing the browser state, which is likely happening because the enterprise platform handles processes differently than your local environment.
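
For reference, a quick way to surface this class of failure locally is to round-trip the tool's return value through pickle before deploying. This is just an illustrative check, not part of my custom_tool.py:

import pickle

def assert_plain_data(result):
    # A plain str/dict/list survives this round trip; a live Playwright
    # object (Browser, Page, ...) will typically raise here instead of
    # failing only after deployment.
    pickle.loads(pickle.dumps(result))
    return result

# e.g. during local testing:
# result = assert_plain_data(scraper_tool._run(test_url))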

Could someone more experienced with similar issues help?

As always, @unclecode I'd appreciate your help on that.

makispl avatar Dec 13 '24 17:12 makispl

Hi @makispl, the issue you are seeing is related to how the enterprise platform tries to serialize the browser or page objects from Playwright. These objects cannot be copied or pickled, which usually happens when the scraping logic or Crawl4AI usage crosses process boundaries. To fix it, keep all Crawl4AI usage and the scraping process within the same agent or process: instead of returning browser objects or passing them around, return only the final scraped results as plain data. Adjusting your code this way should prevent the serialization attempts that cause the error.

That said, I have not tried this with CrewAI; to be honest, I haven't used CrewAI at all, so I'm not very familiar with it, and it's difficult for me to engage with this unless someone like yourself can work on it. I can help you along the way, and if we fix it, we could create a wrapper and make it available for people who want to use Crawl4AI with the CrewAI library.
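
As a minimal sketch of that pattern, assuming the standard AsyncWebCrawler API from crawl4ai (the function name and the returned fields are illustrative):

import asyncio
import json
from crawl4ai import AsyncWebCrawler

async def scrape_to_plain_data(url: str) -> str:
    # All browser/page state lives and dies inside this function.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
    # Return only plain, picklable data -- never the crawler, browser, or page.
    return json.dumps({"url": url, "markdown": result.markdown})

# A synchronous caller (e.g. a tool's _run) can then do:
# data = asyncio.run(scrape_to_plain_data("https://example.com/pricing"))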

unclecode avatar Dec 14 '24 12:12 unclecode

Thanks for the reply, @unclecode!

I implemented your suggestions (with some help from windsurf/cascade), approaching the issue like this (a rough sketch of the resulting structure follows the list):

  • Kept the scraping logic entirely within the async_run method.

  • Returned the final result as a JSON string immediately after scraping.

  • Ensured no Playwright objects are passed outside the method.
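
In outline, the tool now looks roughly like this (a simplified sketch, not the actual custom_tool.py; the BaseTool import path and the returned fields are assumptions):

import asyncio
import json
from crawl4ai import AsyncWebCrawler
from crewai.tools import BaseTool  # import path varies by crewai version

class Crawl4AITool(BaseTool):
    name: str = "Crawl4AI Crawler"
    description: str = "Scrapes a URL and returns the result as a JSON string."

    async def async_run(self, url: str) -> str:
        # The scraping logic stays entirely in here; no Playwright object escapes.
        async with AsyncWebCrawler() as crawler:
            result = await crawler.arun(url=url)
        return json.dumps({"url": url, "content": result.markdown})

    def _run(self, url: str) -> str:
        # crewai calls the sync entry point; bridge to the async logic.
        # (Assumes no event loop is already running in this thread.)
        return asyncio.run(self.async_run(url))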

However, the issue still persists. Could you please take a look at the script I just emailed you? (I cannot attach a .py file here.)

Your help is invaluable! Once we fix this, it could pave the way for effective scraping for many CrewAI users—a wrapper would be a piece of cake! 🍰

makispl avatar Dec 14 '24 13:12 makispl

@makispl I can see now. Okay, sure, I will check the email. To speed things up a little, please attach a file containing all of the code required to test and send it to my email. If it isn't all in one file, please create one and put everything in it so that I can open it in VS Code and run it. Then I will see what I can do. If any packages must be installed, please also create a requirements.txt file. Let's see how it goes.

unclecode avatar Dec 15 '24 09:12 unclecode

@unclecode in order to verify whether it works on the crewai platform, each time I modify the script I have to deploy it there and then, via its API (through my app), check whether data is scraped or not. I don't have any access to the platform's log files, except for the following snippet the crewai team shared with me:

future: <Task finished name='Task-402' coro=<Connection.run() done, defined at /usr/local/lib/python3.12/site-packages/playwright/_impl/_connection.py:272> exception=AttributeError('`copy` is not supported.')>

File "/usr/local/lib/python3.12/site-packages/crewai/tools/tool_usage.py", line 168, in _use

So, running the tool locally (as is), it works well. That is why I wanted your experienced view on it, so you can check the things that might be responsible for this bad behaviour (serialization, usage that crosses process boundaries, etc.). That said, if you want to check the custom_tool.py I emailed you locally, you can either:

  1. Check it directly like:
from dotenv import load_dotenv
from custom_tool import Crawl4AITool

def main():
    # Load environment variables from .env file
    load_dotenv()
    
    # Create an instance of the competitor detection tool
    scraper_tool = Crawl4AITool()
    
    # Test URL
    test_url = "https://www.schedulethreads.com/pricing"
    
    try:
        # Run the competitor detection
        result = scraper_tool._run(test_url)
        print("\nTiers' Scraping Results:")
        print(result)
    except Exception as e:
        print(f"Error occurred: {str(e)}")

if __name__ == "__main__":
    main()
  2. Check it along with the whole crew of agents (crewai), by installing crewai etc. I don't think that would be useful, as I have already checked it and the tool also runs well within the crewai framework, but only locally. In addition, I confirm that the tool is always used within only one agent.

Please let me know if I can provide you with any additional information.

makispl avatar Dec 15 '24 14:12 makispl

@makispl I will try to run this locally. If I can't, I will let you know. If I can, I will start fresh: open a CrewAI project and use Crawl4ai as a custom tool. Perhaps that will help me get a better handle on the issue.

unclecode avatar Dec 16 '24 07:12 unclecode

Exactly! But please keep in mind that the custom tool will most probably run perfectly with crewai locally, so you won't see any issues; they only appear when the crew is deployed on the crewAI platform.

makispl avatar Dec 16 '24 07:12 makispl

@makispl Such a challenge haha, ok we will see

unclecode avatar Dec 16 '24 09:12 unclecode

The crewai team resolved it, and custom tools with crawl4ai now play smoothly on the enterprise platform!

makispl avatar Jan 25 '25 14:01 makispl