
Rewrite the archiver

Open daniel-j-born opened this issue 6 years ago • 4 comments

  • Upgrade to Python 3.
  • Reuse a single requests.Session() object so connections are kept open, which speeds up archiving and helps avoid being flagged as spam by Yahoo.
  • Pause between requests and back off exponentially on errors to avoid being flagged as spam by Yahoo.
  • Change the user-agent to avoid being flagged as spam by Yahoo.
  • Write messages to groupName/year/month/msgid.json instead of groupName/msgid.json.
  • Write to a tmp file and then rename into place to ensure no data corruption.
  • Create output directory in current directory rather than source code directory.
  • Change logging to Python logging.
  • move_to_year_month_dirs.py: New script to rename groupName/msgid.json to groupName/year/month/msgid.json.
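
For readers who want a feel for the retry and file-writing changes listed above, here is a minimal sketch. It is not code from the pull request: the names `call_with_backoff`, `message_path` and `write_json_atomically`, the default delays, and the unpadded year/month directory names are all assumptions made for illustration.

```python
import json
import logging
import os
import tempfile
import time

log = logging.getLogger("archiver_sketch")  # hypothetical logger name


def call_with_backoff(fetch, pause=1.0, factor=2.0, max_tries=5, sleep=time.sleep):
    """Call fetch() with a pause before every try, doubling the wait after errors.

    `fetch` is any zero-argument callable, for example a lambda wrapping
    session.get(...) on a shared requests.Session().
    """
    delay = pause
    for attempt in range(1, max_tries + 1):
        sleep(delay)  # always pause, even before the first request
        try:
            return fetch()
        except Exception as exc:  # assumption: narrow this in real use
            if attempt == max_tries:
                raise
            log.warning("attempt %d failed (%s); backing off", attempt, exc)
            delay *= factor  # exponential backoff


def message_path(root, group, year, month, msg_id):
    """Build root/groupName/year/month/msgid.json (unpadded names are assumed)."""
    return os.path.join(root, group, str(year), str(month), "%d.json" % msg_id)


def write_json_atomically(path, data):
    """Write to a temp file in the target directory, then rename into place."""
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp_file:
            json.dump(data, tmp_file)
        # os.replace is atomic when source and target are on the same filesystem,
        # so readers see either the old file or the complete new one.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With the requests library installed, one shared session could be passed in along the lines of `call_with_backoff(lambda: session.get(url, timeout=30))`, where `session = requests.Session()` has had its `User-Agent` header set once via `session.headers.update(...)`. Creating the temp file in the destination directory keeps the final rename on a single filesystem.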

daniel-j-born, Apr 29 '19 17:04

Hi Dan,

Wow! This looks fantastic - this script had been getting a bit neglected and unloved of late, and was written very quickly without classes and other "proper programming" stuff. Thanks hugely for taking the time to give it a rewrite.

Right now I'm in the middle of University exams, so it may be a few weeks before I'm able to review this fully and merge the PR - sorry about that. (If there's a specific reason why you want it merged ASAP, then let me know and I can probably just merge it in after briefly looking over the code, but otherwise I suspect it'll be May 16th at the earliest).

My only point of potential concern is the move from supporting both Python 2 and 3 to Python 3 only. I know it's old and Python 3 has been out for ages, but 2 is still widely used and available (for instance, I think all Macs still ship with Python 2 by default). Is there a specific reason why support for Python 2 has been dropped? Or would it be possible to add in a fallback so that Python 2 works as well?

Andrew

andrewferguson, May 01 '19 20:05

I'm glad I could help. Thanks for writing the original, and documenting the Yahoo API, which is hard to find.

I'm in no rush to have the PR merged.

I made it Python 3 only for simplicity. I didn't know that Macs ship with Python 2, and it's a relatively small script, so I think we can maintain Python 2 compatibility without much effort. I'll look into that.

daniel-j-born, May 01 '19 21:05

I've extended these changes even further, including downloading attachments to messages. Since this PR is still in progress, and I based my changes on Daniel's work, I first submitted my changes against his repository here: https://github.com/daniel-j-born/YahooGroups-Archiver/pull/1

ex-nerd, Sep 17 '19 08:09

Note to readers: As of today (21 Oct 2019) the most up-to-date version of this script appears to be the version here: https://github.com/jam01/YahooGroups-Archiver

cc @jam01

TJC, Oct 21 '19 01:10