TelegramChatStats
Generate some statistics and plots from your exported Telegram chat data (using Bokeh plots with python 3)
Telegram Chat Statistics
Generate graphs and statistics from your exported Telegram messages.
Examples




Usage
First you need to export your Telegram data to a result.json file. You can do this in the settings of the Telegram desktop client.
./telegram-statistics.py -i result.json -n "name"
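If you are unsure which name to pass to -n, you can list the chat names in the export first. This is a minimal sketch, not part of the project: it assumes the export keeps its chats under `chats` -> `list` with a `name` field per chat, which is what recent Telegram Desktop exports look like but may differ between versions. The helper name `list_chat_names` and the sample data are hypothetical.

```python
import json


def list_chat_names(export):
    """Return the display names of all chats in a parsed export dict.

    Assumes the layout {"chats": {"list": [{"name": ...}, ...]}}.
    """
    chats = export.get("chats", {}).get("list", [])
    # Some entries may have no name (or a null name); skip them.
    return [chat["name"] for chat in chats if chat.get("name")]


# Hypothetical miniature export, parsed from a JSON string for illustration.
sample = json.loads("""
{"chats": {"list": [
    {"name": "Name Surname", "type": "personal_chat", "messages": []},
    {"name": null, "type": "saved_messages", "messages": []}
]}}
""")
print(list_chat_names(sample))

# With a real export you would load the file instead:
#   with open("result.json", encoding="utf-8") as f:
#       print(list_chat_names(json.load(f)))
```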
Open the file result_2019-05-30.json, parse the chat history with Name Surname from 2018-01-01 up to now, and generate the substring plot for the emojis "🧡🥰":
./telegram-statistics.py -i ../result_2019-05-30.json -n "Name Surname" -d 2018-01-01 -w "🧡;🥰"
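To illustrate what a combined substring count per month could look like, here is a small sketch. It is not the script's actual implementation: the function name `substring_counts_per_month`, the (date, text) input format, and the case-sensitive matching are all assumptions. Only the semicolon-separated list mirrors the -w option shown above.

```python
from collections import Counter


def substring_counts_per_month(messages, patterns):
    """Count, per month, how often any of the substrings occurs.

    messages: iterable of (iso_date, text) pairs (assumed format)
    patterns: semicolon-separated substrings, as passed to -w
    """
    substrings = [s for s in patterns.split(";") if s]
    counts = Counter()
    for iso_date, text in messages:
        month = iso_date[:7]  # "2018-01-15T12:00:00" -> "2018-01"
        # Sum the occurrences of every substring (case-sensitive).
        counts[month] += sum(text.count(s) for s in substrings)
    return dict(counts)


# Hypothetical sample messages.
messages = [
    ("2018-01-15T12:00:00", "hello hello"),
    ("2018-01-20T08:00:00", "bye"),
    ("2018-02-01T09:00:00", "hello bye"),
]
print(substring_counts_per_month(messages, "hello;bye"))
```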
Import WhatsApp
The convert-whatsapp.py script converts an exported WhatsApp chat (Whatsapp Chat with Name.txt) into a Telegram-style JSON file.
To find the correct [Name Surname], take the name from the first line of the WhatsApp export txt file.
However, the WhatsApp export is not as detailed as the Telegram export, so many of the metrics cannot be calculated.
./convert-whatsapp.py -i "Whatsapp Chat with Name.txt"
./telegram-statistics.py -i whatsapp-result.json -n "Name Surname"
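Finding the name in the WhatsApp export can also be done programmatically. The sketch below is an assumption about the export layout, not a description of convert-whatsapp.py: it expects message lines shaped like `<timestamp> - <sender>: <text>`, which is common but varies with phone language and region. The names `LINE_RE` and `first_sender` are hypothetical.

```python
import re

# Assumed line layout: "<timestamp> - <sender>: <text>"
LINE_RE = re.compile(r"^(?P<stamp>.+?) - (?P<sender>[^:]+?): (?P<text>.*)$")


def first_sender(lines):
    """Return the sender of the first line that looks like a chat message."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            return match.group("sender")
    return None  # no line matched the assumed layout


# Hypothetical export excerpt; the first line is a system notice without a sender.
sample_lines = [
    "30.05.19, 21:00 - Messages to this chat are now secured.",
    "30.05.19, 21:15 - Name Surname: Hello: are you there?",
    "30.05.19, 21:16 - Other Person: Yes",
]
print(first_sender(sample_lines))
```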
In the commands above, the value passed to -n is the name as displayed in Telegram (usually the surname).
Generated Files
The script generates multiple files.
- emojis.txt: contains unicode encoded emojis and their count
- raw_metrics.json: raw numerical data (contains all text of both persons / large file)
HTML Files (Plots):
- plot_hours.html: bokeh plot of message frequency over the hours of one day
- plot_month.html: bokeh plot of number of messages sent per month
- plot_month_characters.html: bokeh plot of characters sent per month
- plot_weekdays.html: bokeh plot of message frequency over one week
- plot_month_calls.html: bokeh plot of number of calls per month
- plot_month_call_time.html: bokeh plot of total seconds on call per month
- plot_month_photos.html: bokeh plot of number of photos sent per month
- plot_month_replytime.html: bokeh plot of average monthly reply time (Beta)
- plot_month_word_occurrence.html: bokeh plot of combined substring occurrences over time
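The average monthly reply time is marked as Beta, and the exact definition used by the script is not stated here. One possible definition, sketched below as an assumption, treats a message as a reply when its sender differs from the sender of the previous message, and measures the time between those two messages. The function name `average_reply_seconds` and the input format are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def average_reply_seconds(messages):
    """Average reply time in seconds per (month, replying sender).

    messages: chronologically sorted (iso_date, sender) pairs (assumed format).
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [total seconds, reply count]
    prev_sender, prev_time = None, None
    for iso_date, sender in messages:
        current = datetime.fromisoformat(iso_date)
        # A change of sender counts as a reply to the previous message.
        if prev_sender is not None and sender != prev_sender:
            key = (iso_date[:7], sender)
            sums[key][0] += (current - prev_time).total_seconds()
            sums[key][1] += 1
        prev_sender, prev_time = sender, current
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical sample conversation.
conversation = [
    ("2019-05-01T10:00:00", "A"),
    ("2019-05-01T10:05:00", "B"),  # B replies after 300 s
    ("2019-05-01T10:06:00", "B"),  # same sender, not a reply
    ("2019-05-01T10:16:00", "A"),  # A replies after 600 s
]
print(average_reply_seconds(conversation))
```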
Raw Files (one for each person):
- raw_months_person_Person A.csv: csv values of month data
- raw_weekdays_person_Person A.csv: csv values of weekday data
- raw_months_chars_person_Person A.csv: csv values of monthly character count data
- raw_monthly_pictures_person_Person A.csv: csv values of monthly picture count data
- raw_monthly_calls_person_Person A.csv: csv values of monthly number of calls
- raw_monthly_call_duration_person_Person A.csv: csv values of monthly call duration
- raw_monthly_time_to_reply_person_Person A.csv: csv values of monthly reply time
Metrics
per chat
- total number of messages
- total number of words
- total number of characters
- count occurrence of each word
- number of unique words
per person
- total number of messages
- total number of words
- total number of characters
- average number of words per message
- average number of characters per message
- count occurrence of each word
- count occurrence of each emoji
- number of messages formatted with markdown
- number of messages of type [animation, audio_file, sticker, video_message, voice_message]
- number of photos
- number of unique words
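As a rough illustration of how some of the per-person text metrics listed above can be computed, consider the following sketch. It is not the script's code: splitting words on whitespace and lowercasing them before counting are assumptions, and the name `person_metrics` is hypothetical.

```python
from collections import Counter


def person_metrics(texts):
    """Compute basic text metrics for one person's messages.

    texts: list of message strings (assumed to be plain text).
    """
    words = [word for text in texts for word in text.split()]
    n_messages = len(texts)
    n_characters = sum(len(text) for text in texts)
    # Lowercase so that "Hello" and "hello" count as the same word.
    word_counts = Counter(word.lower() for word in words)
    return {
        "messages": n_messages,
        "words": len(words),
        "characters": n_characters,
        "avg_words_per_message": len(words) / n_messages if n_messages else 0.0,
        "avg_characters_per_message": n_characters / n_messages if n_messages else 0.0,
        "unique_words": len(word_counts),
        "word_counts": word_counts,
    }


# Hypothetical sample messages.
metrics = person_metrics(["Hello there", "hello"])
print(metrics["messages"], metrics["words"], metrics["unique_words"])
```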
Requirements
- python 3
- bokeh
- numpy
- pandas
Contributing
I was inspired to do this project by a post on reddit.com/r/LongDistance
I would love to hear if you have made some statistics yourself. Feel free to message me on reddit
If you want to implement new metrics, feel free to fork and send a pull request. Here are some things that I think could be improved or added:
- normalize weekly / hourly data to "average number" per day/hour instead of "total number"
- number of edited messages
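For the first idea, one way to normalize weekday totals is to divide each total by the number of times that weekday occurs in the covered date range. The sketch below shows this under stated assumptions: the covered range runs from the first to the last message date, and the function name `average_messages_per_weekday` is hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def average_messages_per_weekday(message_dates):
    """Return {weekday: average messages per day}, with Monday = 0.

    message_dates: list of datetime.date, one entry per message.
    """
    if not message_dates:
        return {}
    totals = Counter(d.weekday() for d in message_dates)
    first, last = min(message_dates), max(message_dates)
    n_days = (last - first).days + 1
    # How often each weekday occurs between the first and last message.
    occurrences = Counter(
        (first + timedelta(days=i)).weekday() for i in range(n_days)
    )
    return {wd: totals[wd] / occurrences[wd] for wd in occurrences}


# Hypothetical sample: three messages on Monday 2019-05-06, two on the
# following Tuesday and one on Monday 2019-05-13.
sample_dates = (
    [date(2019, 5, 6)] * 3 + [date(2019, 5, 7)] * 2 + [date(2019, 5, 13)]
)
print(average_messages_per_weekday(sample_dates))
```

The same approach works for hourly data by dividing each hourly total by the number of days in the range.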
Possible Issues
- csv separator is currently a semicolon (;)
- other country specific errors (e.g. with dates)
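Because of the semicolon separator, the raw csv files need an explicit delimiter when read back. The following sketch uses only the standard library. The column layout in the sample (month, then count) is hypothetical and only serves to show the delimiter handling; the helper name `read_semicolon_csv` is also an assumption.

```python
import csv
import io


def read_semicolon_csv(text):
    """Parse semicolon-separated csv text into a list of rows."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    # Skip empty lines such as a trailing newline at the end of the file.
    return [row for row in reader if row]


# Hypothetical file content; the real column layout may differ.
sample_csv = "2019-04;120\n2019-05;98\n"
print(read_semicolon_csv(sample_csv))
```

With pandas, the equivalent is `pandas.read_csv(path, sep=";")`.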
License
MIT License
Copyright (c) 2018 Simon Burkhardt