
api.search limit?

Open washednico opened this issue 1 year ago • 4 comments

Hello,

I'm currently running a scraper that needs to download every tweet containing a particular cashtag over a month. However, if I run the following code:

q = f"${ticker} since:{start_date} until:{end_date}"
async for tweet in api.search(q):
    print(tweet.date)

with a range of one month, it finds about 1.3k tweets covering only the first two days of the date range, and then it stops. I'm sure, however, that there are many more tweets for each day of the month under consideration. What could it be?

washednico avatar Jan 10 '24 23:01 washednico

Hi, @washednico.

Hard to say why you get only 1.3k tweets. You can query with finer granularity (e.g. one day at a time) to achieve better results.

from datetime import datetime, timedelta

def iterate_dates(since_date: str, until_date: str):
    dt = datetime.fromisoformat(since_date)
    ed = datetime.fromisoformat(until_date)
    while dt < ed:
        nd = dt + timedelta(days=1)
        yield dt.date(), nd.date()
        dt = nd


async def get_ticker_tweets(ticker: str, since_date: str, until_date: str):
    for since, until in iterate_dates(since_date, until_date):
        q = f"${ticker} since:{since} until:{until}"
        async for tweet in api.search(q):
            yield tweet

# then use like (it's an async generator, so iterate instead of awaiting it)
async for tweet in get_ticker_tweets("AAPL", "2024-01-01", "2024-01-10"):
    print(tweet.date)
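If windows are ever re-queried or overlap, the same tweet can show up more than once. A minimal id-based dedup wrapper, assuming each yielded object has a unique `id` attribute (as twscrape's Tweet model does), might look like:

```python
import asyncio

async def dedup_by_id(agen):
    """Yield items from an async generator, skipping any repeated ids."""
    seen = set()  # ids already emitted
    async for item in agen:
        if item.id not in seen:
            seen.add(item.id)
            yield item
```

It can be wrapped around `get_ticker_tweets(...)` whenever the same date window might be queried twice.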

vladkens avatar Feb 10 '24 17:02 vladkens


I believe it's a problem with the Twitter endpoints: even when I split the search into intervals of 3-4 days, some days don't contain any tweets at all, which doesn't make sense since the daily average is 500+.
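One way to pin down exactly which days come back empty is a small diagnostic that counts results per one-day window. This is a sketch: `search` stands in for any async search generator (e.g. `api.search` from twscrape), and `iterate_dates` is the helper from the reply above, reproduced so the snippet is self-contained:

```python
import asyncio
from datetime import datetime, timedelta

def iterate_dates(since_date: str, until_date: str):
    dt = datetime.fromisoformat(since_date)
    ed = datetime.fromisoformat(until_date)
    while dt < ed:
        nd = dt + timedelta(days=1)
        yield dt.date(), nd.date()
        dt = nd

async def count_per_day(search, ticker: str, since: str, until: str):
    """Return {day: tweet_count} so empty days stand out immediately."""
    counts = {}
    for s, u in iterate_dates(since, until):
        q = f"${ticker} since:{s} until:{u}"
        counts[s] = 0
        async for _ in search(q):
            counts[s] += 1
    return counts
```

Printing the returned dict shows at a glance which specific days the endpoint returns nothing for.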

washednico avatar Feb 10 '24 21:02 washednico


Did you find any solution? I'm facing the same issue; it's not even grabbing 50% of the tweets.

ritikkumarsahu avatar Mar 16 '24 14:03 ritikkumarsahu


Unfortunately no. I believe the problem lies with Twitter's endpoint: even searching manually sometimes gives no results, which doesn't make sense. I've tried splitting the date range into days, weeks, and months, but nothing changed. I've also tried randomizing the order of the day-range searches, but I've never been able to scrape certain specific days that somehow just don't work.
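The randomized-retry idea can be sketched like this: query the one-day windows in random order and re-queue any window that comes back empty, on the assumption that an empty day may be a transient endpoint failure rather than a true zero. `fetch_count` here is a hypothetical stand-in for a function that runs the search for one window and returns how many tweets it found:

```python
import random
from datetime import datetime, timedelta

def day_windows(since: str, until: str):
    """Split [since, until) into one-day windows, as a shuffleable list."""
    dt = datetime.fromisoformat(since)
    ed = datetime.fromisoformat(until)
    out = []
    while dt < ed:
        out.append((dt.date(), (dt + timedelta(days=1)).date()))
        dt += timedelta(days=1)
    return out

def shuffled_with_retries(windows, fetch_count, max_rounds=3):
    """Query windows in random order, retrying empty ones up to max_rounds passes."""
    pending = list(windows)
    results = {}
    for _ in range(max_rounds):
        random.shuffle(pending)
        retry = []
        for w in pending:
            n = fetch_count(w)
            if n == 0:
                retry.append(w)  # suspicious empty day: try again next round
            else:
                results[w] = n
        pending = retry
        if not pending:
            break
    return results, pending  # pending = days that stayed empty after all rounds
```

Days still listed in `pending` after all rounds are the persistently broken ones worth reporting upstream.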

washednico avatar Mar 22 '24 00:03 washednico