Scweet

I don't think the scrape function is working properly

Open ehsong opened this issue 2 years ago • 3 comments

I ran this in jupyter notebook:

scrape(since="2020-05-20", until="2020-05-30", from_account=handle, interval=1, headless=True, display_type="Top", save_images=False, resume=False, filter_replies=True, proximity=True)

I ran it with the handle 'minjung_dal'. The code runs, but it just says path: https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-21%20since%3A2020-05-20%20%20-filter%3Areplies&src=typed_query&lf=on

It also doesn't scrape anything for the following:

query = "Covid AND China"
data = scrape(words=[query], since="2020-02-01", until="2020-02-02", from_account=None, interval=1,
              headless=True, display_type="Top", save_images=False, proxy=None, save_dir="outputs",
              resume=False, filter_replies=True, proximity=False)

Scraping on headless mode.
looking for tweets between 2020-02-01 and 2020-02-02 ...
 path : https://twitter.com/search?q=(Covid AND China)%20until%3A2020-02-02%20since%3A2020-02-01%20%20-filter%3Areplies&src=typed_query
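
One thing visible in the printed path is that the spaces in "Covid AND China" are left as literal spaces, while the rest of the query is percent-encoded. A browser will often encode these itself, so this may not be the cause of the empty result. To rule it out, the sketch below builds a fully encoded search URL with the standard library. build_search_url is a hypothetical helper written for illustration, not Scweet's own code, and the query layout is only copied from the log line above.

```python
from urllib.parse import quote


def build_search_url(words, since, until, filter_replies=True):
    """Hypothetical helper (not part of Scweet): build an encoded search URL."""
    # Query layout copied from the printed path: (words) until:... since:...
    query = f"({words}) until:{until} since:{since}"
    if filter_replies:
        query += " -filter:replies"
    # Keep parentheses readable; encode spaces, colons and everything else.
    encoded = quote(query, safe="()")
    return "https://twitter.com/search?q=" + encoded + "&src=typed_query"


url = build_search_url("Covid AND China", "2020-02-01", "2020-02-02")
print(url)
```

Opening the printed URL by hand in a normal browser is a quick way to check whether the search itself returns tweets for that day.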

May 23 '22 17:05 ehsong

I think that's normal output. This user had no posts on that day (20-05-2020). The interval is set to 1, so the period between 20-05 and 30-05 is divided into 10 one-day periods, and for each one you get the associated posts, if any exist.
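
To make the splitting concrete, here is a minimal sketch of how a date range can be cut into interval-day windows. It is an illustration only, not Scweet's actual implementation: split_into_windows is a hypothetical name, and clamping the last window to the until date is an assumption.

```python
from datetime import date, timedelta


def split_into_windows(since, until, interval=1):
    """Hypothetical helper: yield (start, end) pairs covering since..until."""
    start = date.fromisoformat(since)
    stop = date.fromisoformat(until)
    step = timedelta(days=interval)
    while start < stop:
        # Assumption: the last window is clamped so it never passes `until`.
        end = min(start + step, stop)
        yield start, end
        start = end


windows = list(split_into_windows("2020-05-20", "2020-05-30", interval=1))
for start, end in windows:
    print(f"looking for tweets between {start} and {end} ...")
```

With interval=1 this gives ten windows, from 2020-05-20 to 2020-05-21 up to 2020-05-29 to 2020-05-30, so each search covers a single day.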

May 23 '22 17:05 Altimis

@Altimis

No, I ran this code for the period 20-05-2020 to 30-05-2020 and it returned an empty DataFrame, although this person posted a tweet on 26-05.

Scraping on headless mode.
looking for tweets between 2020-05-20 and 2020-05-21 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-21%20since%3A2020-05-20%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
looking for tweets between 2020-05-21 and 2020-05-22 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-22%20since%3A2020-05-21%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
looking for tweets between 2020-05-22 and 2020-05-23 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-23%20since%3A2020-05-22%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
looking for tweets between 2020-05-23 and 2020-05-24 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-24%20since%3A2020-05-23%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
looking for tweets between 2020-05-24 and 2020-05-25 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-25%20since%3A2020-05-24%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
looking for tweets between 2020-05-25 and 2020-05-26 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-26%20since%3A2020-05-25%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
looking for tweets between 2020-05-26 and 2020-05-27 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-27%20since%3A2020-05-26%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
looking for tweets between 2020-05-27 and 2020-05-28 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-28%20since%3A2020-05-27%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
looking for tweets between 2020-05-28 and 2020-05-29 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-29%20since%3A2020-05-28%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
looking for tweets between 2020-05-29 and 2020-05-30 ...
 path : https://twitter.com/search?q=(from%3Aminjung_dal)%20until%3A2020-05-30%20since%3A2020-05-29%20%20-filter%3Areplies&src=typed_query&lf=on
scroll  1
scroll  2
Empty DataFrame
Columns: [UserScreenName, UserName, Timestamp, Text, Embedded_text, Emojis, Comments, Likes, Retweets, Image link, Tweet URL]

May 23 '22 17:05 ehsong

Try setting headless=False.

May 29 '22 08:05 heis71