YahooGroups-Archiver
Rewrite the archiver
- Upgrade to Python 3.
- Reuse a requests.Session() object so that connections are reused, which speeds up archiving and helps avoid being flagged as spam by Yahoo.
- Pause between requests and back off exponentially on errors to avoid being flagged as spam by Yahoo.
- Change the user-agent to avoid being flagged as spam by Yahoo.
- Write messages to groupName/year/month/msgid.json instead of groupName/msgid.json.
- Write to a tmp file and then rename into place to ensure no data corruption.
- Create output directory in current directory rather than source code directory.
- Change logging to Python logging.
- move_to_year_month_dirs.py: New script to rename groupName/msgid.json to groupName/year/month/msgid.json.
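
A few of the points above (exponential backoff, the groupName/year/month/msgid.json layout, and writing to a tmp file before renaming) can be illustrated with a rough sketch. This is not the code from the PR: the function names `backoff_delays`, `message_path`, and `atomic_write_json`, the default delay values, and the unpadded month directory are all assumptions made for illustration.

```python
import json
import os
import tempfile


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, retries=5):
    """Yield the pause (in seconds) to take before each retry.

    The defaults are illustrative assumptions, not values from the PR.
    """
    delay = base
    for _ in range(retries):
        yield min(delay, max_delay)
        delay *= factor


def message_path(root, group_name, year, month, msg_id):
    """Build root/groupName/year/month/msgid.json.

    Whether the month should be zero-padded is an assumption left open here.
    """
    return os.path.join(root, group_name, str(year), str(month),
                        "%s.json" % msg_id)


def atomic_write_json(path, data):
    """Write to a temp file in the target directory, then rename into place.

    A reader never sees a half-written file: the final name either does not
    exist yet or points at complete content.
    """
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            json.dump(data, tmp_file)
        # os.replace overwrites the destination in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In a download loop, one would presumably call `time.sleep(delay)` for each value from `backoff_delays()` after a failed request, and pass each fetched message to `atomic_write_json(message_path(...), message)`. Creating the temp file in the same directory as the target keeps the rename on one filesystem, which is what makes the final step a single rename rather than a copy.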
Hi Dan,
Wow! This looks fantastic - this script had been getting a bit neglected and unloved as of late, and was written very quickly without classes and other "proper programming" stuff. Thanks hugely for taking the time to give it a rewrite.
Right now I'm in the middle of university exams, so it may be a few weeks before I'm able to review this fully and merge the PR - sorry about that. (If there's a specific reason why you want it merged ASAP, then let me know and I can probably just merge it in after briefly looking over the code, but otherwise I suspect it'll be May 16th at the earliest.)
My only point of potential concern is the move from supporting both Python 2 and 3 to Python 3 only. I know Python 2 is old and Python 3 has been out for ages, but 2 is still widely used and available (for instance, I think all Macs still ship with Python 2 by default). Is there a specific reason why support for Python 2 has been dropped? Or would it be possible to add in a fallback so that Python 2 works as well?
Andrew
I'm glad I could help. Thanks for writing the original, and documenting the Yahoo API, which is hard to find.
I'm in no rush to have the PR merged.
I made it Python 3-only for simplicity. I didn't know that Macs ship with Python 2, and it's a relatively small script, so I think we can maintain Python 2 compatibility without much effort. I'll look into that.
I've extended these changes even further, including downloading attachments to messages. Since this PR is still in progress, and I based my changes on Daniel's work, I first submitted my changes against his repository here: https://github.com/daniel-j-born/YahooGroups-Archiver/pull/1
Note to readers: As of today (21 Oct 2019), the most up-to-date version of this script appears to be the one here: https://github.com/jam01/YahooGroups-Archiver
cc @jam01