Cookies selected instead of page content on google'ing

Open piotr-mamenas opened this issue 2 years ago • 15 comments

Duplicates

  • [X] I have searched the existing issues

Steps to reproduce 🕹

Ask autoGPT to assemble some list (example: investors list, client list, whatever)

Current behavior 😯

When googling for data:

SYSTEM: Command browse_website returned: ("Answer gathered from website: The text does not provide information about the XYZ. It explains how TechCrunch uses cookies for their websites and apps, and how users can manage their privacy settings.

This pops up very often even though the website has the information. The reason is that the cookie popup content is selected for analysis instead of the actual content of the page.

Expected behavior 🤔

autoGPT reads the page and gathers data from it

Your prompt 📝

No response

piotr-mamenas avatar Apr 15 '23 11:04 piotr-mamenas

same here, what to do about it

Timmermanzzz avatar Apr 15 '23 17:04 Timmermanzzz

same here - it doesn't seem to work

Neel738 avatar Apr 15 '23 17:04 Neel738

I added the sentence "Ignore content from cookie consent popups, ads and disclaimers." to line 131 in autogpt/processing/text.py. So far I haven't had any more problems with cookie banners, but I haven't tested much. More often than not I ran into the "too much tokens" problem.
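For illustration, the change might look roughly like the sketch below. The function name and the surrounding prompt wording are hypothetical and are not the actual contents of autogpt/processing/text.py; only the added sentence comes from this comment.

```python
# The sentence added to the summarization prompt (taken from the comment above).
IGNORE_NOTE = "Ignore content from cookie consent popups, ads and disclaimers."


def build_summary_prompt(text: str, question: str) -> str:
    """Hypothetical prompt builder; the real prompt in text.py may differ."""
    return (
        f'"""{text}"""\n\n'
        f'Using the above text, answer the following question: "{question}". '
        f"{IGNORE_NOTE} "
        "If the question cannot be answered using the text, summarize the text."
    )


if __name__ == "__main__":
    print(build_summary_prompt("Page text goes here.", "Who are the investors?"))
```

This only changes what the model is asked to do with the scraped text. If the scraper returns nothing but the banner, there is still no page content to answer from.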

gpayer avatar Apr 16 '23 19:04 gpayer

An easy fix is to use a VPN and point it at a server outside of the EU (for example one in the US).

R037 avatar Apr 17 '23 09:04 R037

I'm having the same issue on nearly all websites. Whenever a cookie wall appears or even a 'subscribe to our newsletter' pop-up appears on the page, AutoGPT only reads inside those containers. How can I make AutoGPT ignore cookie walls and those other pop-ups when scraping a website?

japppie avatar Apr 17 '23 22:04 japppie

same problem

Bubble007 avatar Apr 23 '23 19:04 Bubble007

same problem but only when I use the google api

anybam avatar May 05 '23 18:05 anybam

Any update on this? I have the same issue, where AutoGPT just scrapes pop-ups and doesn't read the whole page.

jeffmercury avatar May 10 '23 14:05 jeffmercury

I fixed this by configuring my Selenium class to close the pop-up before it reads the page. I had to know the class of the pop-up's close button to close it. For other pop-ups, try using the disable pop-up option in the Selenium class that handles all the commands, or get the pop-up's class and close it that way.

jeffmercury avatar May 10 '23 19:05 jeffmercury

@jeffmercury Thank you! Are you interested in a PR?

FarzanT avatar May 10 '23 21:05 FarzanT

Hi @FarzanT yes

jeffmercury avatar May 10 '23 23:05 jeffmercury

Hi @jeffmercury, can you please explain what exactly to do for "fixed this by configuring my Selenium class to close the pop up before it reads"?

Bubble007 avatar May 11 '23 13:05 Bubble007

Before trying my solution, I would like to state that a better way is to see if you can get the data you want as JSON from an API, instead of having to scrape the web for your information, which introduces a lot of errors. That's what I'm doing. Otherwise, try my solution below.

This solution is not guaranteed to work and requires some configuration. It will not work in Docker. There are two ways to deal with pop-ups: add the --disable-popup-blocking option to Selenium, or have Selenium close the pop-up by clicking its close button.

First, update your .env with this:

HEADLESS_BROWSER=True
USE_WEB_BROWSER=chrome

First Option:

Then, in your web_selenium.py, find the scrape_text_with_selenium function and add this:

options.add_argument("--disable-popup-blocking")

This should disable normal pop-ups. However, if pop-ups still persist, they might be implemented in a way that Selenium does not recognize as a standard JavaScript alert, for example as ordinary HTML elements overlaid on the page. In this case, you would need to identify the specific HTML element of the pop-up and interact with it (e.g., click a close button).
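As a rough sketch of the first option, the flags can be collected in one place and applied to whatever options object the scraper builds. The helper names here are hypothetical and not part of AutoGPT; the only Selenium call assumed is the options object's add_argument method.

```python
def build_chrome_flags(headless=True):
    """Return the Chrome command-line flags discussed above (hypothetical helper)."""
    flags = ["--disable-popup-blocking"]
    if headless:
        # Mirrors HEADLESS_BROWSER=True from the .env settings.
        flags.append("--headless")
    return flags


def apply_flags(options, flags):
    """Call options.add_argument() once per flag and return the same object."""
    for flag in flags:
        options.add_argument(flag)
    return options


# With Selenium installed, usage would look something like:
#   from selenium.webdriver.chrome.options import Options
#   options = apply_flags(Options(), build_chrome_flags())
```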

Second Option:

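Add this to the scrape_text_with_selenium() function in the web_selenium.py file, right after the driver.get(url) call.

# If the pop-up is a dialog box that --disable-popup-blocking can't close, find its close button and click it.
# These imports belong at the top of web_selenium.py if they are not already there.
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

try:
    # Wait up to 2 seconds for the close button to become clickable.
    wait = WebDriverWait(driver, 2)
    # Raw string, so the "\:" in the selector is not treated as a Python escape sequence.
    close_button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, r'#sds-dialog-0 > layout-splash-modal > div.grid-row.flex-column.tablet\:flex-row > div:nth-child(2) > div > button')))
    close_button.click()
except Exception as e:
    print(f"Error interacting with the close button: {e}")

Here we are telling Selenium to wait up to 2 seconds for the pop-up's close button to become clickable and then click it. Use the browser's developer tools to get the CSS selector of your pop-up's close button and pass it to the function above like I did.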

Ensure you are using manual mode, and explicitly tell the AI to use the browse_website function with questions on what to read on the page as arguments. If you get stuck, use ChatGPT to help you; this is how I came up with this solution.
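Since every site uses a different close button, the second option can be generalized to try a list of selectors. The sketch below is only an illustration: dismiss_popups is a hypothetical helper, and it checks for elements once instead of waiting for them as WebDriverWait does. It relies only on the driver's find_elements method and on each element's is_displayed and click methods.

```python
# Same string value as Selenium's By.CSS_SELECTOR, so no import is needed here.
CSS_SELECTOR = "css selector"


def dismiss_popups(driver, selectors):
    """Click the first visible match for each selector; return the selectors clicked."""
    clicked = []
    for selector in selectors:
        try:
            elements = driver.find_elements(CSS_SELECTOR, selector)
        except Exception as e:
            print(f"Error looking up {selector!r}: {e}")
            continue
        for element in elements:
            try:
                if element.is_displayed():
                    element.click()
                    clicked.append(selector)
                    break  # one click per selector is enough
            except Exception as e:
                print(f"Error clicking {selector!r}: {e}")
    return clicked
```

It would be called right after driver.get(url) with the selectors you collected from the browser tools, for example dismiss_popups(driver, ["#my-cookie-banner button.close"]), where that selector is a made-up placeholder.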

@Bubble007

jeffmercury avatar May 11 '23 17:05 jeffmercury

Hi @jeffmercury, thank you very much. I think that helps many. :-)

Bubble007 avatar May 11 '23 17:05 Bubble007

I can also confirm that this is an issue. auto-gpt cannot google anything since it only sees the cookie banner.

villesau avatar May 11 '23 20:05 villesau

Confirm the issue as well

bluelancer avatar May 27 '23 21:05 bluelancer

same here, seems like there is no easy solution available except for moving the server...

j0schi avatar Jun 02 '23 02:06 j0schi

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

github-actions[bot] avatar Sep 06 '23 21:09 github-actions[bot]

This issue was closed automatically because it has been stale for 10 days with no activity.

github-actions[bot] avatar Sep 18 '23 01:09 github-actions[bot]