PushshiftDumps
Example scripts for the Pushshift dump files
This repo contains example Python scripts for processing the Reddit dump files created by Pushshift. The files can be downloaded from here or torrented from here.
- single_file.py decompresses and iterates over a single zst-compressed file.
- iterate_folder.py does the same, but for all files in a folder.
- combine_folder_multiprocess.py uses separate processes to iterate over multiple files in parallel. It writes the lines that match the criteria passed in to text files, then combines them into a final zst-compressed file.
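The decompress-and-iterate approach behind single_file.py and iterate_folder.py can be sketched roughly as follows. This is a minimal sketch, not the repo's code. It assumes the third-party `zstandard` package and newline-delimited text inside each dump. It also raises `max_window_size` to 2**31 on the assumption that the dumps were compressed with a long zstd window. The helper names `iter_lines`, `iter_zst_file` and `iter_folder` and the 16 MiB chunk size are hypothetical. Lines are decoded with a UTF-8 incremental decoder so that a multi-byte character split across two chunks is not corrupted. This is a swapped-in technique, and the original scripts may handle split characters differently.

```python
import codecs
import os


def iter_lines(reader, chunk_size=2**24):
    """Yield decoded text lines from a binary reader that supports .read(n).

    Reads fixed-size chunks (16 MiB by default, an arbitrary choice) so memory
    stays bounded. The incremental decoder holds back any trailing partial
    UTF-8 sequence until the next chunk completes it.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        buffer += decoder.decode(chunk)
        # Everything before the last "\n" is complete; keep the tail for later.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            yield line
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer  # final line with no trailing newline


def iter_zst_file(path, chunk_size=2**24):
    """Decompress one .zst file on the fly and yield its lines.

    Assumes the third-party `zstandard` package (pip install zstandard) and a
    long compression window, hence max_window_size=2**31.
    """
    import zstandard  # imported lazily so iter_lines works without it

    with open(path, "rb") as fh:
        dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
        with dctx.stream_reader(fh) as reader:
            yield from iter_lines(reader, chunk_size)


def iter_folder(folder):
    """Hypothetical folder variant: yield lines from every .zst file, sorted by name."""
    for name in sorted(os.listdir(folder)):
        if name.endswith(".zst"):
            yield from iter_zst_file(os.path.join(folder, name))
```

For example, `for line in iter_zst_file("RC_2023-01.zst"): record = json.loads(line)` would give one parsed record per line, assuming the (hypothetical) file holds one JSON object per line. Streaming chunk by chunk means the decompressed file never has to fit in memory.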
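The parallel filter-then-combine pattern described for combine_folder_multiprocess.py might look something like the following sketch. To stay self-contained, it assumes uncompressed newline-delimited JSON inputs and writes a plain-text combined file. The real script reads the zst dumps and produces a zst-compressed result. The function names and the `field`/`values` matching criterion (keep a line when one JSON field holds one of the given values) are hypothetical.

```python
import json
import os
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat


def filter_file(input_path, output_path, field, values):
    """Worker: copy lines whose JSON `field` is in `values` to a text file.

    Returns the number of matching lines written.
    """
    matched = 0
    with open(input_path, "r", encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if record.get(field) in values:
                dst.write(line.rstrip("\n") + "\n")  # normalize the line ending
                matched += 1
    return matched


def combine(part_paths, final_path):
    """Concatenate the per-file outputs, in input order, into one file."""
    with open(final_path, "w", encoding="utf-8") as dst:
        for path in part_paths:
            with open(path, "r", encoding="utf-8") as src:
                for line in src:
                    dst.write(line)


def run(input_paths, work_dir, final_path, field, values, workers=4):
    """Filter every input file in a separate process, then combine the results."""
    part_paths = [os.path.join(work_dir, f"part_{i}.txt")
                  for i in range(len(input_paths))]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(filter_file, input_paths, part_paths,
                               repeat(field), repeat(values)))
    combine(part_paths, final_path)
    return sum(counts)
```

Call `run(...)` from under an `if __name__ == "__main__":` guard, because process pools on platforms that spawn workers re-import the main module. Giving each worker its own output file avoids any locking between processes, and the sequential combine step runs only once every worker has finished.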