export-saved-reddit
Comments not saved, only links
The current version only saves link URLs and their titles, so the content of saved comments is lost. I was able to dive into the code and fix this for my own use. Below are the changes I made to also grab the author, the body, and a more readable timestamp. I'm not sure if this is the right place to put this, as I'm pretty new to GitHub and coding in general, but here is the git diff:
diff --git a/export_saved.py b/export_saved.py
index 7e4f5db..0e88d5f 100755
--- a/export_saved.py
+++ b/export_saved.py
@@ -11,6 +11,7 @@ import argparse
import csv
import logging
import sys
+import datetime
import praw
@@ -212,19 +213,33 @@ def get_csv_rows(reddit, seq):
created = int(i.created)
except ValueError:
created = 0
+
+ createdreadable = datetime.datetime.fromtimestamp(int(created)).strftime('%Y-%m-%d %H:%M:%S')
try:
folder = str(i.subreddit).encode('utf-8').decode('utf-8')
except AttributeError:
folder = "None"
+
+ try:
+ body = "N/A"
+ body = str(i.body).encode('utf-8').decode('utf-8')
+ except AttributeError:
+ body = "N/A"
+ try:
+ author = "N/A"
+ author = str(i.author).encode('utf-8').decode('utf-8')
+ except AttributeError:
+ author = "N/A"
+
if callable(i.permalink):
permalink = i.permalink()
else:
permalink = i.permalink
permalink = permalink.encode('utf-8').decode('utf-8')
- csv_rows.append([reddit_url + permalink, title, created, None, folder])
+ csv_rows.append([reddit_url + permalink, title, created, createdreadable, body, author, None, folder])
return csv_rows
@@ -239,7 +254,7 @@ def write_csv(csv_rows, file_name=None):
file_name = file_name if file_name is not None else 'export-saved.csv'
# csv setting
- csv_fields = ['URL', 'Title', 'Created', 'Selection', 'Folder']
+ csv_fields = ['URL', 'Submission Title', 'Created-UNIX', 'Created-Standard', 'Body', 'Username', 'Selection', 'Folder']
delimiter = ','
# write csv using csv module
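For readers who want to try the idea outside the script, the field extraction in the diff can be sketched as below. This is only an illustration under stated assumptions: `extra_fields` is a hypothetical helper that is not part of export_saved.py, the items are stand-in objects rather than real PRAW objects, and the timestamp is formatted in UTC so the result is reproducible (the diff itself uses local time). It uses `getattr` with a default in place of the diff's `try`/`except AttributeError` blocks.

```python
import datetime
from types import SimpleNamespace


def extra_fields(item):
    """Return [created, created_readable, body, author] for a saved item.

    Hypothetical helper: mirrors the columns the diff adds, but is not
    code from export_saved.py.
    """
    # Fall back to 0 when the item has no creation time.
    created = int(getattr(item, 'created', 0) or 0)
    # Assumption: UTC is used here for reproducibility; the diff uses local time.
    readable = datetime.datetime.fromtimestamp(
        created, tz=datetime.timezone.utc
    ).strftime('%Y-%m-%d %H:%M:%S')
    # Saved comments carry a body; saved links do not, so default to "N/A".
    body = str(getattr(item, 'body', 'N/A'))
    author = str(getattr(item, 'author', 'N/A'))
    return [created, readable, body, author]


# Stand-in objects (not real PRAW objects) for a saved comment and a saved link.
saved_comment = SimpleNamespace(created=0.0, body='hi', author='bob')
saved_link = SimpleNamespace(created=86400.0)

print(extra_fields(saved_comment))
print(extra_fields(saved_link))
```

Using a default value keeps each field to one line, at the cost of hiding any attribute error other than a missing attribute.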
Is this patch python3 compatible? I keep getting an error: UnboundLocalError: local variable 'createdreadable' referenced before assignment
Can you post the complete error, @goose-ws?
The problem was that I had createdreadable = datetime.datetime.fromtimestamp(int(created)).strftime('%Y-%m-%d %H:%M:%S') indented one level too many.
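A minimal sketch of how one extra indentation level can produce that UnboundLocalError. It assumes the over-indented line ended up inside the `except` branch, which is one plausible reading of the comment above, not something the thread confirms. The function names are hypothetical.

```python
import datetime

FMT = '%Y-%m-%d %H:%M:%S'


def readable_buggy(raw):
    try:
        created = int(raw)
    except ValueError:
        created = 0
        # Over-indented: this only runs when int() fails, so on the
        # normal path the name is never assigned.
        createdreadable = datetime.datetime.fromtimestamp(created).strftime(FMT)
    return createdreadable  # UnboundLocalError when int(raw) succeeded


def readable_fixed(raw):
    try:
        created = int(raw)
    except ValueError:
        created = 0
    # Same level as the try statement: runs on every path.
    createdreadable = datetime.datetime.fromtimestamp(created).strftime(FMT)
    return createdreadable


try:
    readable_buggy('1500000000')
except UnboundLocalError as exc:
    print('buggy version failed:', exc)

print('fixed version:', readable_fixed('1500000000'))
```

The fix is purely a matter of dedenting the assignment so it sits at the same level as the `try` statement.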
I wasn't able to get this to work. When I run python export_saved.py after making the changes, the process just hangs in my bash until I keyboard interrupt.
I added a quick bash script in #52 that parses this info out of the resulting HTML file after the python script completes, in case anyone else wants a more comprehensive backup and doesn't mind a less elegant solution with somewhat sloppily formatted data.
> I wasn't able to get this to work. When I run python export_saved.py after making the changes, the process just hangs in my bash until I keyboard interrupt.
With those changes made by samschr, it just takes a lot longer to extract the data. If you set the logging level to INFO, you'll see that the program is churning out the data sets.
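As a rough illustration of why INFO-level logging helps tell a slow run from a hung one, here is a self-contained sketch using Python's standard `logging` module. How export_saved.py itself configures its log level is not shown in this thread, so the logger name and the `process` function below are hypothetical stand-ins.

```python
import logging

# Show INFO messages with a timestamp so progress is visible while running.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
)
log = logging.getLogger('export_saved_demo')  # hypothetical logger name
log.setLevel(logging.INFO)


def process(items):
    """Stand-in for a slow per-item loop; logs once per item."""
    rows = []
    for n, item in enumerate(items, start=1):
        rows.append(item.upper())
        log.info('processed item %d of %d', n, len(items))
    return rows


process(['first saved item', 'second saved item'])
```

With per-item messages like these, a long run shows steady output instead of a silent terminal.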