twitter-scraper-selenium
Not scraping every tweet from a user
Hello, I am trying to scrape every tweet from a user. Their Twitter page shows that they have tweeted more than 5000 times. However, even when I set tweets_count to 5000, I get fewer than 1000 tweets from that user.
My code is below:
scrape_profile(twitter_username = "elonmusk", output_format ="csv", tweets_count = 6000, browser = "chrome", filename = "elonmusk")
(Note that @elonmusk is just a stand-in example)
Hey @wjd157, that method uses browser automation for scraping, and your tweet count is large, so it is probably getting blocked partway through. I suggest you use the scrape_keyword_with_api() method instead. Try the code below and check elon.json after scraping; it should contain the data you want.
from twitter_scraper_selenium import scrape_keyword_with_api
scrape_keyword_with_api('from:elonmusk', output_filename='elon')
This appears to generate a JSON file with no data in it. Further, the console tells me I have scraped only 24 tweets, even though the account I am now trying has more than 200 tweets.
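A quick way to confirm what actually landed in the output file, independent of the console message, is to count its entries. This is only a minimal sketch: count_scraped_tweets is a hypothetical helper, not part of twitter_scraper_selenium, and it assumes the output is a JSON object keyed by tweet ID (or a JSON list of tweets).

```python
import json
from pathlib import Path


def count_scraped_tweets(json_path):
    """Return the number of tweet entries in a scraper output file.

    Hypothetical helper. Returns 0 when the file is missing or empty.
    """
    path = Path(json_path)
    # A missing or blank file counts as zero tweets instead of raising.
    text = path.read_text(encoding="utf-8").strip() if path.exists() else ""
    if not text:
        return 0
    # Assumption: top level is a dict keyed by tweet ID, or a list of tweets.
    data = json.loads(text)
    return len(data)


# Example: compare against the count printed in the console.
# print(count_scraped_tweets("elon.json"))
```

If this returns 0 while the console reports scraped tweets, the file being opened is probably not the one the scraper wrote, for example because of a different working directory.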
Okay, I think this Twitter feature only returns a few tweets. Currently, I have not added a feature to scrape a Twitter account through Twitter's API, and the browser-automation method gets blocked. I will add a new feature to scrape Twitter profiles from the API in a couple of weeks.
I am also very much looking forward to this feature. Please let us know once you have had time to implement it. Thanks a lot.
Hi @christianmettri @wjd157, just an update in case you're still looking for a solution. You can now try
from twitter_scraper_selenium import scrape_profile_with_api
scrape_profile_with_api('elonmusk', output_filename='musk', tweets_count= 100)
and check the musk.json file, where the output will be saved.
Hello @shaikhsajid1111, I tried this code and it gives me this error:
2023-02-28 02:33:09,836 - WARNING - Failed to make request!
The code:
from twitter_scraper_selenium import scrape_profile_with_api
import json
scrape_profile_with_api(username="NASA", output_filename="NASA", browser="firefox", tweets_count=50, output_dir="C:/Users/Braulio/Desktop/web scraping python")
# Load the scraped tweets (the script must run from the output directory for this path to resolve)
with open('NASA.json') as f:
    NASA = json.load(f)
# Write a simple HTML page with every image tweeted by the account
with open('NASAimages.html', 'w') as f:
    f.write('<html>\n')
    f.write('<head>\n')
    f.write('<title>Imágenes</title>\n')
    f.write('</head>\n')
    f.write('<body>\n')
    for tweet_id, tweet_data in NASA.items():  # was caro.items(), an undefined name
        if tweet_data['username'] == 'NASA':
            for imagen in tweet_data['images']:
                f.write('<img src="{}" format=jpg&name=medium" alt="">\n'.format(imagen))
    f.write('</body>\n')
    f.write('</html>\n')
print("HTML READY")
I also tried the scrape_keyword_with_api function. Here is the code:
from twitter_scraper_selenium import scrape_keyword_with_api
import json
scrape_keyword_with_api(query="from:NASA", output_filename="NASA", tweets_count=50, output_dir="C:/Users/Braulio/Desktop/web scraping python")
with open('NASA.json') as f:
    NASA = json.load(f)
with open('imagenes.html', 'w') as f:
    f.write('<html>\n')
    f.write('<head>\n')
    f.write('<title>Imágenes</title>\n')
    f.write('</head>\n')
    f.write('<body>\n')
    for tweet_id, tweet_data in NASA.items():
        if tweet_data['username'] == 'NASA':
            for imagen in tweet_data['images']:
                f.write('<img src="{}" format=jpg&name=medium" alt="">\n'.format(imagen))
    f.write('</body>\n')
    f.write('</html>\n')
print("HTML READY")
It shows this error:
2023-02-28 02:37:18,021 - twitter_scraper_selenium.keyword_api - WARNING - Failed to make request!
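Since the scraper only logs a warning when the request fails, the JSON file may be missing or empty by the time the rest of the script runs. A minimal sketch of the HTML step that tolerates this is shown here. The load_tweets and build_gallery names are hypothetical helpers, not library functions. The "username" and "images" keys, and the layout of a dict keyed by tweet ID, are assumed from the code posted in this thread.

```python
import html
import json
from pathlib import Path


def load_tweets(json_path):
    """Load scraper output; return {} when the file is missing or empty."""
    path = Path(json_path)
    text = path.read_text(encoding="utf-8").strip() if path.exists() else ""
    return json.loads(text) if text else {}


def build_gallery(tweets, username, title="Images"):
    """Return an HTML page with one <img> tag per image posted by `username`."""
    lines = [
        "<html>",
        "<head>",
        f"<title>{html.escape(title)}</title>",
        "</head>",
        "<body>",
    ]
    for tweet in tweets.values():
        # Skip tweets from other accounts (retweets, replies in the feed).
        if tweet.get("username") != username:
            continue
        # A tweet may have no "images" key, or an empty list.
        for url in tweet.get("images") or []:
            # Escape the URL so "&" and quotes cannot break the attribute.
            lines.append(f'<img src="{html.escape(url, quote=True)}" alt="">')
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"


# Example usage, assuming the scraper wrote NASA.json to the working directory:
# tweets = load_tweets("NASA.json")
# Path("imagenes.html").write_text(build_gallery(tweets, "NASA"), encoding="utf-8")
```

With an empty or missing file this produces a valid page with no images, which makes it easier to tell a failed scrape apart from a bug in the HTML code.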