Feature request: Export unshortened URLs to CSV (e.g. for archiving them in the Internet Archive)

Open jpluimers opened this issue 3 years ago • 6 comments

Michele Weigle has a nice thread on archiving t.co links in the Internet Archive, using an Internet Archive service that archives the URLs listed in a Google Sheets spreadsheet: https://twitter.com/weiglemc/status/1593698822257102851

Her script prepares the list of t.co URLs using awk: https://gist.github.com/weiglemc/312a11356420b3bc4c8e196e8454002a

The idea from that script might be a thing you want to include in your Python script.

jpluimers avatar Nov 20 '22 19:11 jpluimers

Related: https://github.com/timhutton/twitter-archive-parser/pull/42 (via https://github.com/timhutton/twitter-archive-parser/issues/38#issuecomment-1312829030)

jpluimers avatar Nov 20 '22 19:11 jpluimers

@jpluimers We do this already, in parse_tweets(). #42 is then taking the next step, which is to make remote calls to t.co (wp.me, etc.) to retrieve the expanded URLs directly for the ones we couldn't find in the archive.

timhutton avatar Nov 24 '22 00:11 timhutton

What I mean is, we already get the expanded versions from the JSON. We don't push the mappings to the Internet Archive, though I can see that that is a useful service to humanity and something we could support.

timhutton avatar Nov 24 '22 10:11 timhutton

> What I mean is, we already get the expanded versions from the JSON. We don't push the mappings to the internet archive, though I can see that that is a useful service to humanity and something we could support.

That would be cool. What's the best way to rephrase this issue to reflect that intent better?

jpluimers avatar Nov 28 '22 17:11 jpluimers

> What I mean is, we already get the expanded versions from the JSON. We don't push the mappings to the internet archive, though I can see that that is a useful service to humanity and something we could support.

Yeah, if there could be some sort of "export to Google Sheet" option to make the step outlined in the third tweet of the thread linked in the OP easier, I would find that useful: https://twitter.com/weiglemc/status/1593698828171067393

cooljeanius avatar Nov 29 '22 05:11 cooljeanius
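
To make the request above concrete, here is a minimal sketch of what a CSV export of the short-to-expanded URL mappings might look like. It is not code from twitter-archive-parser and does not reuse parse_tweets(); the function names (`load_archive_js`, `url_mappings`, `write_csv`) and the CSV column names are hypothetical. It assumes the archive file is a JavaScript assignment of the form `window.YTD.tweets.part0 = [...]` and that each tweet carries its links under `entities.urls` with `url` and `expanded_url` keys. The resulting CSV could then be imported into a Google Sheet by hand.

```python
import csv
import io
import json


def load_archive_js(text):
    """Parse the text of an archive .js file into Python objects.

    Assumption: the file is `window.YTD.<name>.part0 = <JSON array>`,
    so everything after the first "=" is plain JSON.
    """
    _, _, payload = text.partition("=")
    return json.loads(payload)


def url_mappings(tweets):
    """Yield (short_url, expanded_url) pairs, one per distinct short URL."""
    seen = set()
    for item in tweets:
        # Assumption: each entry wraps the tweet under a "tweet" key;
        # fall back to the entry itself if it does not.
        tweet = item.get("tweet", item)
        for entry in tweet.get("entities", {}).get("urls", []):
            short = entry.get("url")
            expanded = entry.get("expanded_url")
            if short and expanded and short not in seen:
                seen.add(short)
                yield short, expanded


def write_csv(mappings, out):
    """Write the pairs to a file-like object with a header row."""
    writer = csv.writer(out)
    writer.writerow(["short_url", "expanded_url"])
    writer.writerows(mappings)


if __name__ == "__main__":
    # Made-up sample data standing in for the contents of tweets.js.
    sample = (
        'window.YTD.tweets.part0 = [{"tweet": {"entities": {"urls": '
        '[{"url": "https://t.co/abc", '
        '"expanded_url": "https://example.com/?a=1"}]}}}]'
    )
    buffer = io.StringIO()
    write_csv(url_mappings(load_archive_js(sample)), buffer)
    print(buffer.getvalue())
```

Writing to a file-like object keeps the sketch independent of where the CSV ends up; passing `open("urls.csv", "w", newline="")` instead of the `StringIO` buffer would write it to disk.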