
Scrape not working?

Open JarJarBeatyourattitude opened this issue 2 years ago • 16 comments

I wasn't getting any results from scrape, so I tried with headless=False. I noticed that search wasn't returning any results, presumably because you need an account to search. I confirmed that the links work in my browser, where I'm signed in. Will the script be fixed, or am I missing something? Thanks.

JarJarBeatyourattitude avatar Apr 21 '23 00:04 JarJarBeatyourattitude

I also encountered the same problem.

fjj-088 avatar Apr 24 '23 13:04 fjj-088

Is the same thing happening with other scrapers? It might be worth keeping an eye on.

BradKML avatar Apr 27 '23 09:04 BradKML

This is Twitter's new restriction: you now need to log in before searching.

  1. Call `utils.init_driver` to get a driver.
  2. Call `utils.log_in` to log in.
  3. Pass the driver to `scrape()`. (You will need to modify `scrape()` in `scweet.py` to use the passed driver instead of initializing a new one.)
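The three steps can be wired together as below. This is a runnable sketch only: the stub functions stand in for Scweet's real `utils.init_driver`, `utils.log_in`, and the modified `scrape()`, so the bodies and signatures here are illustrative, not Scweet's actual code.

```python
# Stubs standing in for Scweet's real functions, so the flow runs here.
def init_driver(headless=True, show_images=False, proxy=None):
    return {"proxy": proxy, "logged_in": False}          # stub WebDriver

def log_in(driver, env=".env"):
    driver["logged_in"] = True                           # stub login

def scrape(query, driver=None):
    # Step 3 requires scrape() to accept and reuse the passed driver.
    if driver is None or not driver["logged_in"]:
        raise RuntimeError("search needs a logged-in driver")
    return [f"tweet matching {query!r}"]

driver = init_driver(headless=True)        # 1. get a driver
log_in(driver, env=".env")                 # 2. log in
data = scrape("python", driver=driver)     # 3. pass the driver to scrape()
print(data)
```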

NicerWang avatar Apr 28 '23 07:04 NicerWang

> This is Twitter's new restriction: you now need to log in before searching.
>
> 1. Call `utils.init_driver` to get a `driver`.
> 2. Call `utils.log_in` to log in.
> 3. Pass `driver` to `scrape()`. (**You will need to modify [scrape() in scweet.py](https://github.com/Altimis/Scweet/blob/76e7086a725980dbd5cf8d46bfc27bd4c1d6816f/Scweet/scweet.py#L71)** to use the passed `driver` instead of initializing a new one.)

Can you explain in a bit more detail what we are supposed to change, and how?

yisyed avatar Apr 29 '23 16:04 yisyed

In your code (add your Twitter account to a `.env` file in advance):

```python
from Scweet.scweet import scrape
from Scweet.utils import init_driver, log_in

driver = init_driver(headless=True, show_images=False, proxy="your_proxy_setting")
log_in(driver, env=".env")
data = scrape(..., driver=driver)
```

In `scrape()` of `scweet.py`:

```python
def scrape(..., driver=None):
    ......
    # Remove this line (71):
    # driver = init_driver(headless, proxy, show_images)
```
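An alternative to deleting line 71 outright is to make the change backward compatible: create a driver only when the caller did not pass one. A runnable sketch of that pattern, with a stub standing in for Scweet's real `init_driver` (names and bodies here are illustrative, not Scweet's actual code):

```python
def init_driver(headless=True, proxy=None, show_images=False):
    # Stub standing in for Scweet.utils.init_driver
    return {"headless": headless, "proxy": proxy, "shared": False}

def scrape(query, driver=None, headless=True, proxy=None, show_images=False):
    # Create a fresh driver only when none was passed in,
    # so existing callers of scrape() keep working unchanged.
    if driver is None:
        driver = init_driver(headless, proxy, show_images)
    # ... the rest of scrape() would use `driver` here ...
    return driver

fresh = scrape("python")                      # creates its own driver
mine = init_driver()
mine["shared"] = True
reused = scrape("python", driver=mine)        # reuses the caller's driver
print(reused is mine)  # True
```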

NicerWang avatar Apr 29 '23 16:04 NicerWang

> In your code (add your Twitter account to a `.env` file in advance): […]

It works! Thanks.

yisyed avatar Apr 30 '23 12:04 yisyed

> In your code (add your Twitter account to a `.env` file in advance): […]
Hi, I am new to this. Could you tell me where to put the .env file? Thanks.

MykhailoYampolskyi avatar May 02 '23 12:05 MykhailoYampolskyi

> Hi, I am new to this. Could you tell me where to put the .env file? Thanks

It should be in your project's folder (note: the file name must be exactly `.env`).

Your `.env` should be in the format given below:

```
SCWEET_EMAIL = "_email_"
SCWEET_PASSWORD = "_password_"
SCWEET_USERNAME = "_username_"
```

Below are the steps and changes I have made:

1. I added `env=".env"` to the call:

   ```python
   data = scrape(..., env=".env")
   ```

2. In `scrape()` of `scweet.py`:

   ```python
   def scrape(..., env=None):    # add this 'env=None'
       ......
       # and add this line after line (71)
       log_in(driver, env)
   ```

NOTE: My method is not robust. If you find a better way to scrape tweets, let us know.
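To sanity-check that the `.env` file is in the right place and format, you can read it back by hand. A minimal sketch (Scweet itself loads these variables via the dotenv package; the credentials below are placeholders, not real values):

```python
import os

# Write a sample .env next to the script (placeholder values)
with open(".env", "w") as f:
    f.write('SCWEET_EMAIL = "you@example.com"\n'
            'SCWEET_PASSWORD = "secret"\n'
            'SCWEET_USERNAME = "yourhandle"\n')

def load_env(path):
    # Tiny .env parser: turn 'KEY = "value"' lines into os.environ entries
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip().strip('"')

load_env(".env")
print(os.environ["SCWEET_EMAIL"])  # you@example.com
```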

yisyed avatar May 02 '23 16:05 yisyed

> Below are the steps and changes I have made: […]

One more thing: in `scweet.py`, edit the import on line (9) and add `log_in`: `from .utils import ..., log_in`

yisyed avatar May 02 '23 17:05 yisyed

> In your code (add your Twitter account to a `.env` file in advance): […]

Hello, I am new to this too. Could you tell me where I can obtain "your_proxy_setting"? Thanks very much!

Wish-s avatar May 07 '23 13:05 Wish-s

> Hello, I am new to this too. Could you tell me where I can obtain "your_proxy_setting"?

Try following the method I gave above; it works for me. I kept everything the same in `scrape()` of `scweet.py` on line (71) (the proxy is `None` by default). If it still doesn't work, let me know what the error is. Thanks.

Note: I have to restart VS Code every time I make a change in the Scweet library.

yisyed avatar May 07 '23 15:05 yisyed

@Wish-s If you do not need a proxy (or VPN) to connect to twitter.com, just remove this parameter.

NicerWang avatar May 09 '23 02:05 NicerWang

> @Wish-s If you do not need a proxy (or VPN) to connect to twitter.com, just remove this parameter.

Thank you for your reply. I do need a proxy (or VPN) to connect to twitter.com, but I can't find where to obtain the parameter.

Wish-s avatar May 09 '23 09:05 Wish-s

@Wish-s It's determined by your proxy software, in the format "PROTOCOL://IP:PORT". Clash, for example, uses "http://127.0.0.1:7890" by default.
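For reference, that string is an ordinary URL, so you can unpack it with the standard library to check each part; the address below is only Clash's common default, not something universal:

```python
from urllib.parse import urlparse

proxy = "http://127.0.0.1:7890"   # Clash's usual local HTTP endpoint
parts = urlparse(proxy)
print(parts.scheme, parts.hostname, parts.port)  # http 127.0.0.1 7890
```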

NicerWang avatar May 09 '23 11:05 NicerWang

Hello guys, this is my code:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from Scweet.scweet import scrape

# Specify the parameters for scraping
username = "2MInteractive"
since_date = "2023-07-01"
until_date = "2023-07-11"
headless = True

# Set up the ChromeDriver service
service = Service("C:/Users/HP Probook/Downloads/chromedriver.exe")  # Replace with the actual path to chromedriver

# Set up the ChromeOptions
options = webdriver.ChromeOptions()
options.headless = headless

# Create the WebDriver
driver = webdriver.Chrome(service=service, options=options)

# Scrape the tweets by username
data = scrape(from_account=username, since=since_date, until=until_date, headless=headless, driver=driver)

# Print the scraped data
print(data)

# Close the WebDriver
driver.quit()
```

and I am getting an empty data list:

```
looking for tweets between 2023-07-01 and 2023-07-06 ...
 path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-06%20since%3A2023-07-01%20&src=typed_query
scroll 1
scroll 2
looking for tweets between 2023-07-06 and 2023-07-11 ...
 path : https://twitter.com/search?q=(from%3A2MInteractive)%20until%3A2023-07-11%20since%3A2023-07-06%20&src=typed_query
scroll 1
scroll 2
Empty DataFrame
Columns: [UserScreenName, UserName, Timestamp, Text, Embedded_text, Emojis, Comments, Likes, Retweets, Image link, Tweet URL]
Index: []
```

ihabpalamino avatar Jul 12 '23 15:07 ihabpalamino

Check this solution; it might work if none of the others did: https://github.com/Altimis/Scweet/issues/169#issuecomment-1640205875

baqachadil avatar Jul 18 '23 13:07 baqachadil