
api.search limit?

Open washednico opened this issue 1 year ago • 4 comments

Hello,

I'm currently running a scraper that needs to download every tweet containing a particular cashtag over a month. However, if I run the following code:

q = f"${ticker} since:{start_date} until:{end_date}"
async for tweet in api.search(q):
    print(tweet.date)

with a range of one month, it finds about 1.3k tweets covering only the first two days of the date range, and then it stops. I'm sure, however, that there are many more tweets for each day of the month under consideration. What could it be?

washednico avatar Jan 10 '24 23:01 washednico

Hi, @washednico.

Hard to say why you get only 1.3k tweets. You can query with finer granularity (e.g. one day at a time) to achieve better results.

from datetime import datetime, timedelta

def iterate_dates(since_date: str, until_date: str):
    dt = datetime.fromisoformat(since_date)
    ed = datetime.fromisoformat(until_date)
    while dt < ed:
        nd = dt + timedelta(days=1)
        yield dt.date(), nd.date()
        dt = nd


async def get_ticker_tweets(ticker: str, since_date: str, until_date: str):
    for since, until in iterate_dates(since_date, until_date):
        q = f"${ticker} since:{since} until:{until}"
        async for tweet in api.search(q):
            yield tweet

# then use like (it's an async generator, so iterate instead of awaiting it)
async for tweet in get_ticker_tweets("AAPL", "2024-01-01", "2024-01-10"):
    print(tweet.date)
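If windows are ever re-queried or overlap, the same tweet can show up more than once. A minimal id-based dedup wrapper, assuming each yielded object has a unique `id` attribute (as twscrape's Tweet model does), might look like:

```python
import asyncio

async def dedup_by_id(agen):
    """Yield items from an async generator, skipping any repeated ids."""
    seen = set()  # ids already emitted
    async for item in agen:
        if item.id not in seen:
            seen.add(item.id)
            yield item
```

It can be wrapped around `get_ticker_tweets(...)` whenever the same date window might be queried twice.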

vladkens avatar Feb 10 '24 17:02 vladkens


I believe it's a problem with the Twitter endpoints: even when I split the search into intervals of 3-4 days, some days don't contain any tweets at all, which doesn't make sense since the daily average is 500+.
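One way to pin down exactly which days come back empty is a small diagnostic that counts results per one-day window. This is a sketch: `search` stands in for any async search generator (e.g. `api.search` from twscrape), and `iterate_dates` is the helper from the reply above, reproduced so the snippet is self-contained:

```python
import asyncio
from datetime import datetime, timedelta

def iterate_dates(since_date: str, until_date: str):
    dt = datetime.fromisoformat(since_date)
    ed = datetime.fromisoformat(until_date)
    while dt < ed:
        nd = dt + timedelta(days=1)
        yield dt.date(), nd.date()
        dt = nd

async def count_per_day(search, ticker: str, since: str, until: str):
    """Return {day: tweet_count} so empty days stand out immediately."""
    counts = {}
    for s, u in iterate_dates(since, until):
        q = f"${ticker} since:{s} until:{u}"
        counts[s] = 0
        async for _ in search(q):
            counts[s] += 1
    return counts
```

Printing the returned dict shows at a glance which specific days the endpoint returns nothing for.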

washednico avatar Feb 10 '24 21:02 washednico


Did you find any solution? I'm facing the same issue; it's not even grabbing 50% of the tweets.

ritikkumarsahu avatar Mar 16 '24 14:03 ritikkumarsahu


Unfortunately no. I believe the problem lies with Twitter's endpoint: even searching manually sometimes gives no results, which doesn't make sense. I've tried splitting the date range into days, weeks, and months, but nothing changed. I've also tried randomizing the order of the day-range searches, but I've never been able to scrape certain specific days that somehow just don't work.
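The randomized-retry idea can be sketched like this: query the one-day windows in random order and re-queue any window that comes back empty, on the assumption that an empty day may be a transient endpoint failure rather than a true zero. `fetch_count` here is a hypothetical stand-in for a function that runs the search for one window and returns how many tweets it found:

```python
import random
from datetime import datetime, timedelta

def day_windows(since: str, until: str):
    """Split [since, until) into one-day windows, as a shuffleable list."""
    dt = datetime.fromisoformat(since)
    ed = datetime.fromisoformat(until)
    out = []
    while dt < ed:
        out.append((dt.date(), (dt + timedelta(days=1)).date()))
        dt += timedelta(days=1)
    return out

def shuffled_with_retries(windows, fetch_count, max_rounds=3):
    """Query windows in random order, retrying empty ones up to max_rounds passes."""
    pending = list(windows)
    results = {}
    for _ in range(max_rounds):
        random.shuffle(pending)
        retry = []
        for w in pending:
            n = fetch_count(w)
            if n == 0:
                retry.append(w)  # suspicious empty day: try again next round
            else:
                results[w] = n
        pending = retry
        if not pending:
            break
    return results, pending  # pending = days that stayed empty after all rounds
```

Days still listed in `pending` after all rounds are the persistently broken ones worth reporting upstream.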

washednico avatar Mar 22 '24 00:03 washednico