twitter-archive-parser
[Suggestion] Creation of a single timeline-styled index.html file comprising all tweets in sorted order (most recent on top)
I would prefer to have a single timeline-styled index.html file comprising all tweets in sorted order (most recent on top).
This also allows a quick search using the browser's search function.
See also https://github.com/Webklex/tbm (Twitter Bookmark Manager) for an example.

We had a single HTML file with all tweets before. For users with many tweets, it was impossible to open the file in a browser, see #103.
Even with a relatively small number of 3000 tweets it was not completely broken, but slow, as you can see in this comment. Therefore I think separate HTML files for each month are a good default.
For users with (much) fewer than 3000 tweets, maybe this would be a good option? I think this would be a nice addition, but probably less important than the other things we're currently working on.
What do you think would be the maximum number of tweets in one file, so that it doesn't break or slow down the browser too much?
I am working on a postprocessor.
I intentionally do not use sed's "-i" option, to avoid changing the original HTML files.
First approach creates a raw all.html file (done):
# Strip everything before <body> and from </body> to the end of file,
# and replace the page heading with the source file name; this yields
# one *.body file per monthly HTML file:
for f in *Tweet-Archive*.html; do sed "1,/<body*/d ; /<\/body/,//d ; s/<h1>Your twitter archive<\/h1>/<h3>$f<\/h3><hr>/g" "$f" > "$f.body"; done
# Concatenate in reverse name order, so the newest month comes first:
cat $(ls -r *.body) > all.html
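A quick sanity check that the months really ended up most-recent-first: list the <h3> headings that the sed substitution inserted, in file order:
grep -o '<h3>[^<]*</h3>' all.html | head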
Next steps (sketches follow after this list):
- add head/body
- add skinning, especially for the images (fixed smaller size, scaled by the browser, because we usually do not have thumbnail images); ad hoc:
sed "s/<img src=\"media/<img style='width:25%' src=\"media/g" all.html > all.smallimages.html
- add jQuery for lazy loading of the images (images within the view are fetched immediately, the following ones later)
- optionally later: add a cache for thumbnail images and use them where needed
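For the first two list items, a minimal sketch of what the wrapping could look like. The file names all.wrapped.html and all.lazy.html are placeholders, and the browser-native loading="lazy" attribute is used here as a simpler stand-in for the jQuery lazy loader:
# Wrap the concatenated bodies in a complete HTML document and move
# the ad-hoc image sizing into a stylesheet:
{
  echo '<!DOCTYPE html>'
  echo '<html><head><meta charset="utf-8"><title>Tweet archive</title>'
  echo '<style>img { width: 25%; }</style>'
  echo '</head><body>'
  cat all.html
  echo '</body></html>'
} > all.wrapped.html
# Native lazy loading, no jQuery required:
sed 's/<img src="media/<img loading="lazy" src="media/g' all.wrapped.html > all.lazy.html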
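And a sketch for the optional thumbnail cache, assuming ImageMagick is installed and the pictures are JPEGs under media/ (adjust the glob as needed):
# Pre-scale images into a thumbs/ directory; all.html could then point
# its <img> tags at thumbs/ and link each one to the original in media/:
mkdir -p thumbs
mogrify -path thumbs -thumbnail 400x media/*.jpg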
Basic data for my Twitter archive:
- all.html: ~32 MB
- number of archived tweets: grep -c /media/tweet.ico all.html = 82,886
- number of local media objects (images, video) linked in all.html: grep -o '"media/' all.html | wc -l = 15,166 (9.2 GB)
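As a rough cross-check of the 9.2 GB figure (this measures the whole media directory, which may contain files that are not linked in all.html):
du -sh media/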
It works. Loading all.html, including the original-sized images, takes ~130 seconds *) on my slow NUC i7 (15 W thermal design power).
*) Including browser cold-start time, to avoid caching of the local images during this test.
@flauschzelle
Here is the final code in one file:
https://gist.github.com/Wikinaut/39b2be7a5570a6cd41181f11c2577e30#file-patch-parsed-twitter-archive-sh
Excerpt:
# Patch parsed Twitter Archive
# Postprocessor for files generated by https://github.com/timhutton/twitter-archive-parser
# init 18.12.2022
# Usage: in the parsed twitter-archive directory with the numerous *.html files, run
# ./patch-parsed-twitter-archive.sh
# What it does:
# - for each existing monthly (or so) *.html, it creates one new *.html.body file
# - patches and strips several HTML tags
# - adds jQuery and the lazy-images loader
# - adds links to the images so that the original view is available; opens the original image in a new tab
# - concatenates all *.html.body files into a single all.html file
# TODO:
# the sorting order within each monthly block is ascending (most recent tweet at the end),
# whereas the concatenated monthly files are added with the most recent month on top.
# Example: NOV 1, 2, ... 30, OCT 1, 2, ... 31
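A possible fix for the TODO, sketched under a loud assumption: each tweet block inside a *.html.body file is taken to start with the tweet.ico image tag that was used for counting above (check the actual markup first; the pattern below is hypothetical). GNU tac can then reverse the per-month blocks so that each month also reads most-recent-first:
# ASSUMPTION: every tweet block begins with the tweet.ico <img> tag;
# adjust the -s pattern if the real per-tweet markup differs.
for f in *.html.body; do
  # -b: attach the separator to the start of each record,
  # -r: interpret -s as a regular expression.
  tac -b -r -s '<img src="media/tweet.ico"' "$f" > "$f.reversed"
done
# Rebuild all.html from the reversed bodies, newest month first:
cat $(ls -r *.reversed) > all.html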
Example output with a video and an image: