google-groups-php-api

Feature request: support export of archive content

Open nxg opened this issue 14 years ago • 7 comments

It's the obvious feature request:

If you were able to provide a way of exporting the content of a Google Group -- just the dump of all the messages, nothing fancy -- I'm sure you would be a very popular person, worldwide!

(why is it so hard...!?)

Norman

nxg avatar Nov 22 '10 00:11 nxg

Thanks for the request!

I think this would be a bit harder to pull off, or at least it would have to be used with caution. Exporting tons of messages would basically pound their servers to death, because every message would involve a separate page-load (or 2 or 3 if there were multiple pages for a given thread).

What formats would you like to see it in? Which details do you need about each post? Do you want the posts threaded the way Google Groups threads them?

blanchardjeremy avatar Nov 24 '10 01:11 blanchardjeremy

I would imagine this being used for occasional archiving dumps of groups, as a slightly paranoid backup, perhaps; or because a group has served its purpose and is being shut down; or because one wants to move a mailing list to a different service, and transfer the history from Google Groups.

For this sort of occasional use, it would be OK to throttle the process, retrieving only a message per second, or every few seconds.
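A rough sketch of that kind of throttling, in Python rather than the library's PHP, just to illustrate the idea. The name `throttled_fetch`, the `fetch` callback and the `delay_seconds` parameter are all hypothetical; nothing here is part of google-groups-php-api.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled_fetch(
    message_urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield fetch(url) for each URL, pausing between requests.

    `fetch` is whatever function actually loads one message page
    (an assumption; it is supplied by the caller).
    """
    for index, url in enumerate(message_urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        yield fetch(url)
```

Because it is a generator, nothing is requested until the caller iterates, so a long export can be stopped partway without having hammered the server.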

This is the sort of case that I'd imagine being handled by dataliberation.org, but there's no mention of Groups on the list of Google products there. If they have this sort of feature on their roadmap, that would be ideal, but they don't publish a roadmap (intelligibly).

nxg avatar Nov 24 '10 09:11 nxg

dataliberation.org looks awesome. Thanks for that reference.

What format should the data be exported in? RSS? Atom? mbox (I'm not familiar with it)?

Does anyone know what format is popular for this kind of export?

blanchardjeremy avatar Nov 25 '10 01:11 blanchardjeremy

Any format would work. Atom or RSS would be nifty, but plain old mbox is the no-frills format into which I'd probably convert a feed for archiving.

mbox (http://en.wikipedia.org/wiki/Mbox) is a semi-standard. It's what mailers usually write out if they're asked to 'save raw email message' or something like that.
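For anyone unfamiliar with the format, here is a minimal sketch of writing posts to an mbox file using Python's standard `mailbox` module. The `write_mbox` function and the shape of the `posts` dicts (`sender`, `subject`, `date`, `body`) are assumptions made up for illustration; how those fields would be scraped from Google Groups is not covered.

```python
import mailbox
from email.message import EmailMessage


def write_mbox(path, posts):
    """Append posts to the mbox file at `path`, creating it if needed.

    Each post is a dict with "sender", "subject", "date" and "body"
    keys (a hypothetical structure chosen for this example).
    """
    box = mailbox.mbox(path)
    box.lock()
    try:
        for post in posts:
            msg = EmailMessage()
            msg["From"] = post["sender"]
            msg["Subject"] = post["subject"]
            msg["Date"] = post["date"]
            msg.set_content(post["body"])
            box.add(msg)
    finally:
        # close() flushes pending writes and releases the lock.
        box.close()
```

The resulting file can be opened by most mail clients, or read back with `mailbox.mbox(path)`.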

nxg avatar Nov 25 '10 09:11 nxg

Hmm. Okay. I'd also like to investigate storing the threading of messages rather than just the flat messages. :)
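One possible way to keep the threading, sketched in Python: record each message's parent and rebuild the reply tree from that. This assumes every exported message carries an `id` and, for replies, an `in_reply_to` value (mirroring the Message-ID and In-Reply-To email headers). Whether the Google Groups pages expose that information is not established here, and `build_threads` is a hypothetical name.

```python
from collections import defaultdict


def build_threads(messages):
    """Group flat messages into reply trees.

    Each message is a dict with an "id" and an optional "in_reply_to".
    Returns (roots, children): roots lists the top-level message ids,
    children maps a parent id to the ids of its direct replies.
    """
    known_ids = {m["id"] for m in messages}
    children = defaultdict(list)
    roots = []
    for m in messages:
        parent = m.get("in_reply_to")
        if parent in known_ids:
            children[parent].append(m["id"])
        else:
            # No parent, or the parent is missing from the export.
            roots.append(m["id"])
    return roots, dict(children)
```

Storing the parent reference with each message means a flat format such as mbox can still carry the thread structure.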

blanchardjeremy avatar Dec 02 '10 03:12 blanchardjeremy

Remove util.php requirement in basic_tests. Closed by 2f2d02962829dfe1b356b8813f82a0a93188cdea.

blanchardjeremy avatar Jan 06 '11 10:01 blanchardjeremy

Oops. Didn't mean to close this. Sorry!

blanchardjeremy avatar Jan 06 '11 10:01 blanchardjeremy