twitter-archive-parser
[Suggestion] Creation of a single timeline-styled index.html file comprising all tweets in sorted order (most recent on top)
I would prefer to have a single timeline-styled index.html file comprising all tweets in sorted order (most recent on top).
This also allows a quick search using the browser's search function.
See also https://github.com/Webklex/tbm (Twitter Bookmark Manager) for an example.

We had a single HTML file with all tweets before. For users with many tweets, it was impossible to open the file in a browser, see #103.
Even with a relatively small number of 3000 tweets it was not completely broken, but slow, as you can see in this comment. Therefore I think separate HTML files for each month are a good default.
For users with (much) fewer than 3000 tweets, maybe this would be a good option? I think this would be a nice addition, but probably less important than the other things we're currently working on.
What do you think would be the maximum number of tweets in one file, so that it doesn't break or slow down the browser too much?
I am working on a postprocessor.
I intentionally do not use sed's "-i" option, to avoid changing the original HTML files.
First approach creates a raw all.html file (done):
# Strip everything before <body> and from </body> to the end of file,
# and replace the page heading with the source file name; this yields
# one *.body file per monthly HTML file:
for f in *Tweet-Archive*.html; do sed "1,/<body*/d ; /<\/body/,//d ; s/<h1>Your twitter archive<\/h1>/<h3>$f<\/h3><hr>/g" "$f" > "$f.body"; done
# Concatenate in reverse name order, so the newest month comes first:
cat $(ls -r *.body) > all.html
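A quick sanity check that the months really ended up most-recent-first: list the <h3> headings that the sed substitution inserted, in file order:
grep -o '<h3>[^<]*</h3>' all.html | head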
Next steps (sketches follow after this list):
- add head/body
- add skinning, especially for the images (fixed smaller size, scaled by the browser, because we usually do not have thumbnail images); ad hoc:
sed "s/<img src=\"media/<img style='width:25%' src=\"media/g" all.html > all.smallimages.html
- add jQuery for lazy loading of the images (images within the view are fetched immediately, the following ones later)
- optionally later: add a cache for thumbnail images and use them where needed
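For the first two list items, a minimal sketch of what the wrapping could look like. The file names all.wrapped.html and all.lazy.html are placeholders, and the browser-native loading="lazy" attribute is used here as a simpler stand-in for the jQuery lazy loader:
# Wrap the concatenated bodies in a complete HTML document and move
# the ad-hoc image sizing into a stylesheet:
{
  echo '<!DOCTYPE html>'
  echo '<html><head><meta charset="utf-8"><title>Tweet archive</title>'
  echo '<style>img { width: 25%; }</style>'
  echo '</head><body>'
  cat all.html
  echo '</body></html>'
} > all.wrapped.html
# Native lazy loading, no jQuery required:
sed 's/<img src="media/<img loading="lazy" src="media/g' all.wrapped.html > all.lazy.html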
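And a sketch for the optional thumbnail cache, assuming ImageMagick is installed and the pictures are JPEGs under media/ (adjust the glob as needed):
# Pre-scale images into a thumbs/ directory; all.html could then point
# its <img> tags at thumbs/ and link each one to the original in media/:
mkdir -p thumbs
mogrify -path thumbs -thumbnail 400x media/*.jpg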
Basic data for my Twitter archive:
- all.html: ~32 MB
- number of archived tweets: grep -c /media/tweet.ico all.html = 82,886
- number of local media objects (images, video) linked in all.html: grep -o '"media/' all.html | wc -l = 15,166 (9.2 GB)
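As a rough cross-check of the 9.2 GB figure (this measures the whole media directory, which may contain files that are not linked in all.html):
du -sh media/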
It works. Loading all.html, including the original-sized images, takes ~130 seconds *) on my slow NUC i7 (15 W thermal design power).
*) Including browser cold-start time, to avoid caching of the local images during this test.
@flauschzelle
Here is the final code in one file:
https://gist.github.com/Wikinaut/39b2be7a5570a6cd41181f11c2577e30#file-patch-parsed-twitter-archive-sh
Excerpt:
# Patch parsed Twitter Archive
# Postprocessor for files generated by https://github.com/timhutton/twitter-archive-parser
# init 18.12.2022
# Usage: in the parsed twitter-archive directory with the numerous *.html files, run
# ./patch-parsed-twitter-archive.sh
# What it does:
# - for each existing monthly (or so) *.html, it creates one new *.html.body file
# - patches and strips several HTML tags
# - adds jQuery and the lazy-images loader
# - adds links to the images so that the original view is available; opens the original image in a new tab
# - concatenates all *.html.body files into a single all.html file
# TODO:
# the sorting order within each monthly block is ascending (most recent tweet at the end),
# whereas the concatenated monthly files are added with the most recent month on top.
# Example: NOV 1, 2, ... 30, OCT 1, 2, ... 31
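A possible fix for the TODO, sketched under a loud assumption: each tweet block inside a *.html.body file is taken to start with the tweet.ico image tag that was used for counting above (check the actual markup first; the pattern below is hypothetical). GNU tac can then reverse the per-month blocks so that each month also reads most-recent-first:
# ASSUMPTION: every tweet block begins with the tweet.ico <img> tag;
# adjust the -s pattern if the real per-tweet markup differs.
for f in *.html.body; do
  # -b: attach the separator to the start of each record,
  # -r: interpret -s as a regular expression.
  tac -b -r -s '<img src="media/tweet.ico"' "$f" > "$f.reversed"
done
# Rebuild all.html from the reversed bodies, newest month first:
cat $(ls -r *.reversed) > all.html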
Example output with a video and an image: