[!] No more data! Scraping will stop now. (The script scraped data once and won't scrape again)
### Command Ran

    import twint

    # Configuration
    config = twint.Config()
    config.Search = "#beyondmeat"
    config.Location = True
    config.Store_csv = True
    config.Output = "Tweets.csv"

    # Running search
    twint.run.Search(config)

### Description of Issue

I'm new to using Twint and relatively new to Python as well. I'm scraping Twitter for a final project in class. The initial script worked and scraped about 28,000 tweets. I added one more line to the code to also get any location information if available and ran the script again. The script keeps returning the following message, and no tweets are returned:

    [!] No more data! Scraping will stop now.
    found 0 deleted tweets in this search.

I'm not looking for more data; I would just like the same tweets scraped again with location data where available. Area-level information is good enough, and I'm not looking for exact lat/long. The rest of my project depends on this information.
I even tried running a completely different scraping script from a reply in one of the other GitHub issues. I get the same 'No more data!' message. Here's the script:

    import twint

    c = twint.Config()
    c.Search = "#nfl"
    c.Debug = True
    c.Location = True
    c.Resume = "test_1.session"
    c.Since = "2019-12-18"
    c.Until = "2019-12-19"
    c.Store_csv = True
    c.Output = "test_1.csv"

    twint.run.Search(c)

I thought it might be because Twitter has timed me out for scraping too much data too fast, so I ran these scripts multiple times with different intervals in between, varying from 5 minutes to 5 hours. The same message is returned.
Why am I no longer getting any tweets regardless of what script I use? Is my IP blocked by Twitter for scraping? How long does your IP get blocked? I've waited as long as 5 hours before running the script again. Is there a way to use proxy IPs or something in combination with Twint? If so, any recommendations on which libraries to use, or how to use them with Twint, would be very helpful!!
Thank you in advance to anyone who can shed some light on this!!
>Checked the following and is true:
- [] Python version is 3.6;
- [] Updated Twint with pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
- [] I have searched the issues and there are no duplicates of this issue/question/request.
### Environment Details
Windows 10, Python 3.6.9 - conda environment, running in PyCharm
Hey, not sure about the exact issue, but maybe you could try using the terminal commands and check if twint is still able to retrieve tweets? I've pulled more than 28,000 tweets, so I doubt it's an IP block.
I think twint has a bug where it stops looking for tweets if it doesn't find any for a specific time period. Maybe you could alter the time periods and write a script that moves on to another time period if it doesn't find any? Also, I think the geolocation part isn't working anymore, but I did notice in the requirements section that it needs the geopy library installed. Perhaps you could give that a shot if you pip installed twint.
Running the commands below would install all the requirements for twint as well:

    git clone --depth=1 https://github.com/twintproject/twint.git
    cd twint
    pip3 install . -r requirements.txt

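The suggestion above about looping over time periods could be sketched roughly as follows. This is only an illustration, not something posted in the thread: `date_windows`, `scrape_windows`, the `run_search` hook, and the per-window output file name are hypothetical, while the `twint.Config` fields are the ones already used in the scripts above. Each window gets its own search, so one empty or failing window does not stop the rest.

```python
from datetime import date, timedelta


def date_windows(start, end, days=1):
    """Yield (since, until) ISO date strings covering [start, end) in steps of `days`."""
    step = timedelta(days=days)
    current = start
    while current < end:
        nxt = min(current + step, end)
        yield current.isoformat(), nxt.isoformat()
        current = nxt


def scrape_windows(search, start, end, days=1, run_search=None):
    """Run one search per window and return the windows that raised an error."""
    if run_search is None:
        import twint  # imported lazily so the window logic works without twint installed

        def run_search(since, until):
            c = twint.Config()
            c.Search = search
            c.Since = since
            c.Until = until
            c.Store_csv = True
            c.Output = f"tweets_{since}.csv"  # hypothetical naming scheme
            twint.run.Search(c)

    failed = []
    for since, until in date_windows(start, end, days):
        try:
            run_search(since, until)
        except Exception as exc:  # keep going; the window can be retried later
            print(f"window {since}..{until} failed: {exc}")
            failed.append((since, until))
    return failed
```

Calling `scrape_windows("#beyondmeat", date(2019, 12, 1), date(2019, 12, 8))` would attempt seven one-day searches; any windows returned in the list can be retried afterwards.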
@noelmathews Thank you for your response! I have installed the requirements, geopy is installed, and I've tried various time frames and various IPs, both as a Python script and from the terminal. I keep getting one error or another. I even tried upgrading to a couple of the newer twint packages with some bug fixes. Everything throws a series of different errors, and I think I've tried most of the troubleshooting suggestions on GitHub; they all lead to another new error.
Did you recently get twint to work for you? If so, do you mind sharing a sample code?
Oh I see. From the other issue posts, it does seem like the geolocation isn't working anymore, but as for the tweets: could you cd into the downloaded twint directory (or open the terminal at that folder) and type in this? No Python, just CLI commands.

    twint -s beyondmeat -o tweet.json --json --since 2020-01-01

I was able to retrieve a lot of tweets.
Thank you for the prompt reply, Noel!! I tried your command in the CLI and I get the following error:

    CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)

It's one of the few errors I keep getting. One of the common suggestions for this problem is to upgrade with the following:

    pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

This doesn't work for me either. Any thoughts?
Hmmm, I think this has to do with Anaconda conflicts.
Perhaps remove Anaconda and install the latest version of Python without it? Link: https://www.python.org/downloads/windows/
Then point the environment towards the newly installed Python and do a fresh git pull of twint (not the pip package):

    git clone --depth=1 https://github.com/twintproject/twint.git
    cd twint
    pip3 install . -r requirements.txt

Then cd into the twint folder and try again?
@noelmathews thanks a lot for your suggestion! It worked :) Just couldn't use it in a python environment but totally works on CLI. However, the --location option is still failing whether in python or command line. Hope someone can look into that soon.
Thank you once again for your help!!
Glad it worked. For the location, I think not all tweets will have geolocation details, but you could maybe write a scraper that collects the user location, where available, from the profiles behind the already collected tweets, using the usernames. (Then share it on GitHub so that others can try it out too 😅)
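One possible shape for such a profile-location scraper is sketched below. Several things here are assumptions rather than anything confirmed in this thread: that the tweets CSV has a `username` column, and that twint's profile lookup (`twint.run.Lookup`, `twint.output.users_list`, and a `location` attribute on the returned user object) behaves this way in your installed version. The helper names are hypothetical, and the lookup is passed in as a function so it can be swapped for something else.

```python
import csv


def unique_usernames(lines, column="username"):
    """Return usernames from CSV lines in first-seen order, ignoring case duplicates."""
    seen = {}
    for row in csv.DictReader(lines):
        name = (row.get(column) or "").strip()
        if name:
            seen.setdefault(name.lower(), name)
    return list(seen.values())


def lookup_locations(usernames, lookup):
    """Map each username to the location string returned by `lookup`, or None."""
    locations = {}
    for name in usernames:
        try:
            locations[name] = lookup(name) or None
        except Exception as exc:  # one failed profile should not stop the run
            print(f"lookup failed for {name}: {exc}")
            locations[name] = None
    return locations


def twint_profile_location(username):
    """Assumed twint usage; verify against the twint version you have installed."""
    import twint  # imported lazily so the helpers above work without twint

    twint.output.users_list.clear()
    c = twint.Config()
    c.Username = username
    c.Store_object = True
    c.Hide_output = True
    twint.run.Lookup(c)
    users = twint.output.users_list
    return users[-1].location if users else None
```

With the CSV from the original script, usage might look like `lookup_locations(unique_usernames(open("Tweets.csv", newline="", encoding="utf-8")), twint_profile_location)`. Profile locations are free text typed by the user, so they would still need cleaning before being treated as places.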
Has anyone found out how to get Location to work?
This is only kind of related, but is there a reason why the CLI sometimes works way better than the module? When trying to pull tweets from an account that changed usernames, the module only gives me tweets since the username change (and even then I have to use the user id or it doesn't work). If I use the CLI, I can get all the tweets since the account was created (using the current username too).
Same problem as @patrickHD with changed usernames. CLI works, module doesn't.
Edit: If I run twint -u newUsername I only get tweets since the username was changed. If the parameter --retweets is additionally used, I get not only the retweets but also all the older tweets from before the username was changed. In the module, even if config.Retweets is set to True, I only get the same output as executing twint -u newUsername.
I get OP's issue using the module, but without using c.Location = True. Running this command:

    c = twint.Config()
    c.Username = "USER"
    c.Store_object = True
    c.Store_json = True
    c.Output = "USER.json"
    c.Resume = "USER_json_resume.txt"
    twint.run.Search(c)

It will scrape once through, but every subsequent scrape does not resume properly. It just results in:

    [!] No more data! Scraping will stop now.
    found 0 deleted tweets in this search.

If I run the scraper without c.Resume, it pulls all tweets as expected. The log does one line each time I run this, but it's the same seek ID as the last successful request on the first run. It never captures subsequent tweets.
The workaround for now is to use c.Since and write the results to a new file.
-- EDIT: tested in terminal, and same thing. It's not picking up new posts.
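The c.Since workaround described above might look something like the sketch below. It assumes the earlier output file holds one JSON object per line with a `date` field in YYYY-MM-DD form; the helper names and the follow-up file naming are hypothetical.

```python
import json


def latest_date(json_lines, key="date"):
    """Return the newest YYYY-MM-DD string found in JSON-lines output, or None."""
    newest = None
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        try:
            value = json.loads(line).get(key)
        except (json.JSONDecodeError, AttributeError):
            continue  # skip lines that are not JSON objects
        # ISO dates sort correctly as plain strings
        if isinstance(value, str) and (newest is None or value > newest):
            newest = value
    return newest


def follow_up_settings(json_lines, base_name="USER"):
    """Suggest Since and Output values for the next run instead of using Resume."""
    since = latest_date(json_lines)
    if since is None:
        return {"Since": None, "Output": f"{base_name}.json"}
    return {"Since": since, "Output": f"{base_name}_since_{since}.json"}


# Possible usage (not run here):
#   with open("USER.json", encoding="utf-8") as f:
#       settings = follow_up_settings(f)
#   c.Since = settings["Since"]
#   c.Output = settings["Output"]
```

Because Since is only a date here, the follow-up run will likely fetch the last day again, so the two files would need de-duplicating by tweet id when merged.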
I have the same issue as the poster above. I'm using Since and Until but cannot do subsequent scrapes unless I leave the PC idle for about 5 minutes.
Try workaround here, which worked for me:
https://github.com/twintproject/twint/issues/1253#issuecomment-913055717
#1253 (comment): this does work on Linux.
The problem seems to arise when we use "c.Until". It works fine without it.
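If dropping c.Until is what gets the scrape working, the upper bound can be applied afterwards on the collected rows instead. The sketch below is a hypothetical helper, assuming each row is a dict with a `date` field that starts with YYYY-MM-DD.

```python
from datetime import datetime


def filter_by_date(rows, since, until, key="date"):
    """Return rows whose date falls in [since, until); rows without a valid date are skipped."""
    start = datetime.strptime(since, "%Y-%m-%d").date()
    end = datetime.strptime(until, "%Y-%m-%d").date()
    kept = []
    for row in rows:
        try:
            day = datetime.strptime(str(row.get(key, ""))[:10], "%Y-%m-%d").date()
        except ValueError:
            continue
        if start <= day < end:
            kept.append(row)
    return kept
```

Note that without Until the scrape itself presumably runs from the present back to Since, so this trades a longer run for a working one.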