Reddit scraper returns no submissions before 2022-11-03
Describe the bug
When I scrape Reddit content from November 2022 onward, I get 11 fields: ['_type', 'author', 'body', 'date', 'id', 'parentId', 'subreddit', 'url', 'link', 'selftext', 'title']
When I scrape content from before November 2022 (more than 3 months ago), I only get 8 fields: ['_type', 'author', 'body', 'date', 'id', 'parentId', 'subreddit', 'url']
The last 3 fields ("title", "selftext", and "link") are missing for anything dated before November 2022.
When I ran the same scrapes last year (in early December), I got all 11 fields even for content from 10 years ago.
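For a direct comparison, a minimal pandas sketch (assuming the October.json and November.json dumps produced by the repro commands below) that prints the columns missing from the pre-November data:

import pandas as pd

# Load the two JSON-lines dumps created in the "How to reproduce" section below.
oct_df = pd.read_json("October.json", lines=True)
nov_df = pd.read_json("November.json", lines=True)

# Columns present in the November dump but absent from the October one;
# with the behaviour described above this prints {'link', 'selftext', 'title'}.
print(set(nov_df.columns) - set(oct_df.columns))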
How to reproduce
October 2022 In:
!snscrape --json --progress reddit-search --before 1667275199 --after 1664596800 Searchterm > October.json
df = pd.read_json("October.json", lines=True)
df.columns.tolist()
Out: ['_type', 'author', 'body', 'date', 'id', 'parentId', 'subreddit', 'url']
November 2022 In:
!snscrape --json --progress reddit-search --before 1669870799 --after 1667275200 Searchterm > November.json
df = pd.read_json("November.json", lines=True)
df.columns.tolist()
Out: ['_type', 'author', 'body', 'date', 'id', 'parentId', 'subreddit', 'url', 'link', 'selftext', 'title']
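For reference, the --before/--after values above are Unix epoch seconds; a small sketch to confirm which date ranges they cover (printed in UTC here; the original values were presumably computed in a local timezone, so they sit a few hours off exact month boundaries):

from datetime import datetime, timezone

# Convert the epoch timestamps used in the two commands above to readable UTC dates.
for ts in (1664596800, 1667275199, 1667275200, 1669870799):
    print(ts, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())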
Expected behavior
All 11 fields are scraped regardless of the item's date.
Screenshots and recordings
No response
OS / Distro
Ubuntu 18.04.6 LTS
Output from snscrape --version
snscrape 0.5.0.20230113
Scraper
reddit-search
Backtrace
No response
Dump of locals
No response
How are you using snscrape?
CLI
Additional context
Called from a Python notebook. Same outcome when called directly from the command line.
The Reddit scrapers return both submissions and comments unless instructed otherwise. The absence of those fields means that there are no submissions in the time frame: "title", "selftext", and "link" are only present on submissions, not on comments.
Pushshift recently migrated to a new server and software stack. The old data apparently still hasn't been loaded into the new system, so submissions from before 3 November 2022 are currently missing. This is mentioned e.g. here. There's nothing snscrape can do about this.
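A quick way to check this explanation against the dumps from the repro above (a sketch assuming the October.json and November.json files still exist): count records per _type; if the pre-November dump contains only comment records, the submission-only fields can never appear in its columns.

import pandas as pd

# Count records per _type in each dump; submission-only fields
# (title, selftext, link) only show up when submission records exist.
for name in ("October.json", "November.json"):
    df = pd.read_json(name, lines=True)
    print(name, df["_type"].value_counts().to_dict())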