fbchat-archive-parser
fbchat-archive-parser copied to clipboard
An application for parsing chat history from a Facebook data archive.
UPDATE April 28th 2018: Facebook recently revamped the "download your data" feature to a much more usable state. This was probably in compliance with GDPR laws by the European Union, which will be enforced starting in May 2018. Facebook now allows you to download your message data in JSON format, which supercedes the purpose of this project.
In light of that this repository will no longer be maintained.
Facebook Chat Archive Parser
|PyPI Version| |Python Versions| |Build Status|
A small tool and library for parsing chat history from a Facebook data archive into more usable formats.
What is a "Facebook Chat Archive"?
Facebook Messenger records all your conversation history since your account's creation. There are two options for history retrieval:
-
Create a scraper that constantly "scrolls up" in the conversation window you're interested in (or simulates that with API calls), progressively getting more of your chat history.
-
Ask Facebook for a zip archive of all your data
here <https://www.facebook.com/dyi>__ .
The second option is the only practical way to obtain everything in a timely manner.
What does Facebook give me in this zip archive?
The zip archive contains everything you've ever posted to Facebook,
including: pictures, videos, posts. etc along with chat messages.
Your chat history comes in a single HTML page titled ``messages.htm``.
Unfortunately, the data is unordered and impossible to load into
a web browser since it can be hundreds of megabytes. The only way to analyze
the content is through parsing the file.
**UPDATE:** As of October 2017, ``messages.htm`` just acts as a manifest
for the contents of a directory called ``messages/``. The formatting is
almost identical to before but with each thread in its own file now.
All files are required to use this tool.
Why would I ever want my Facebook chat history?
Here are some reasons you may want to parse your Facebook chat history:
- To make a simulation of your friends using
Markov chains <https://en.wikipedia.org/wiki/Markov_chain>__. - To keep a record of your conversations when deleting your Facebook account.
- To analyze a copy of your conversations for legal reasons.
Here comes the Facebook Chat Archive Parser!
The Facebook Chat Archive Parser is a command line tool (and library for
advanced users) used to easily transform your messages.htm file into
a useful format.
How do I get it?
Install the Facebook Chat Archive Parser via ``pip`` under
Python 2.7 or newer:
.. code:: bash
pip install fbchat-archive-parser
If you already have an older version installed, you can upgrade to the latest with the following command:
.. code:: bash
pip install --upgrade fbchat-archive-parser
How does it work?
Under the html/ folder simply run the command fbcap in your terminal with your
messages.htm file as the argument.
.. code:: bash
fbcap messages ./messages.htm
And watch as the parser sifts through your data!
.. figure:: https://i.imgur.com/HTChSxj.png :alt: Processing png
When it's done, your conversation history is dumped to
stdout. This can be very long. Here is an example:
.. figure:: http://i.imgur.com/ZgHjUST.png :alt: Results
What if I want JSON?
Simply supply the ``-f json`` option to the command line:
.. code:: bash
fbcap messages ./messages.htm -f json
Or if you want pretty formatted JSON:
.. code:: bash
fbcap messages ./messages.htm -f pretty-json
The output format is as follows (messages are ordered from oldest to newest).
.. code:: json
{
"threads": [
{
"participants": ["participant_0", "...", "participant_n"],
"messages": [
{
"date": "ISO 8601 formatted date",
"sender": "sender name",
"message": "message text"
},
"..."
]
},
"..."
]
}
How about CSV?
~~~~~~~~~~~~~~
Of course!
.. code:: bash
fbcap messages ./messages.htm -f csv
.. code:: text
thread,sender,date,message
Third User,Third User,2013-10-04T15:05Z,1
Third User,Third User,2013-10-04T15:05Z,2
Third User,Third User,2013-10-04T15:05Z,3
Third User,First User,2013-10-04T15:05Z,4
Third User,Third User,2013-10-04T15:06Z,5
Third User,First User,2013-10-04T15:07Z,6
Third User,First User,2013-10-04T15:07Z,7
Second User,Second User,2013-10-04T15:04Z,X Y Z
Second User,Second User,2013-10-04T15:05Z,X? Y Z!
Second User,Second User,2013-10-04T15:05Z,This is a test
Second User,Second User,2013-10-04T15:05Z,"Yes, it is"
Second User,Second User,2013-10-04T15:05Z,The last message!
"Second User, Third User",Third User,2013-10-04T15:05Z,1
"Second User, Third User",Third User,2013-10-04T15:05Z,2
...
What about YAML?
~~~~~~~~~~~~~~~~
For sure!
.. code:: bash
fbcap messages ./messages.htm -f yaml
.. code:: text
user: First User
threads:
- participants:
- Second User
- Third User
messages:
- date: 2013-10-04T22:05-07:00
message: '1'
sender: Third User
- date: 2013-10-04T22:05-07:00
message: '2'
sender: Third User
- date: 2013-10-04T22:05-07:00
message: '3'
sender: Third User
...
What if I want to see some statistics?
You can see many statistics regarding your Facebook chat history via the stats subcommand in many different formats.
.. code:: bash
fbcap stats ./messages.htm -f text
.. figure:: http://i.imgur.com/Dwzevxu.png :alt: stats image
See the --help menu for instructions on how to control what appears in the stats.
.. code:: text
$ fbcap stats --help
Usage: fbcap stats [OPTIONS] PATH
Analysis of Facebook chat history.
Options:
-f, --format [json|pretty-json|text|yaml]
Format to output stats as (default: text).
-c, --count-size INTEGER Number of most frequent words to include in
output (-1 for no limit / default 10)
-l, --length INTEGER Number threads to include in the output
[--fmt text only] (-1 for no limit / default
10)
-r, --resolve [BETA] Resolve profile IDs to names by
connecting to Facebook
-p, --noprogress Do not show progress output
-n, --nocolor Do not colorize output
-u, --utc Use UTC timestamps in the output
-z, --timezones TEXT Timezone disambiguators
(TZ=OFFSET,[TZ=OFFSET[...]])
--help Show this message and exit.
How do I get any of the above into a file?
Use standard file redirects.
.. code:: bash
fbcap messages ./messages.htm > my_file.txt
Can I get each conversation into a separate file?
Use the -d directive to send the output to a directory instead.
.. code:: bash
fbcap messages ./messages.htm -d some/random/directory
This will create a file per conversation titled thread_#.ext where # is the conversation number and ext is the extension of the format (e.g. json). A manifest.txt file is also created, which lists the participants in each thread number for navigational/search purposes.
What if I only want to parse out a specific conversation?
You can use the ``-t`` option to specify a particular conversation/thread you want to output. Just provide a comma-separated set of names. If you don't remember a last name (or the first name), the system will try to compensate.
.. code:: bash
fbcap messages ./messages.htm -t second
.. figure:: http://i.imgur.com/3FbWIN7.png
:alt: filter second
.. code:: bash
fbcap messages ./messages.htm -t second,third
.. figure:: http://i.imgur.com/IJzD1LE.png
:alt: filter second and third
What happens to my messages that are pictures?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As of January 2018, Facebook seems to be including referenced images in download archives. Image
messages will be converted to text references in the following format:
``(image reference: messages/photos/<picture id>.jpg)``
What else can I do?
===================
Take a look at the help options to find out more!
.. code:: text
$ fbcap messages --help
Usage: fbcap messages [OPTIONS] PATH
Conversion of Facebook chat history.
Options:
-f, --format [csv|json|pretty-json|text|yaml]
Format to convert to.
-t, --thread TEXT Only include threads involving exactly the
following comma-separated participants in
output (-t 'Billy,Steve Smith')
-d, --directory PATH Write all output as a file per thread into a
directory (subdirectory will be created)
-r, --resolve [BETA] Resolve profile IDs to names by
connecting to Facebook
-p, --noprogress Do not show progress output
-n, --nocolor Do not colorize output
-u, --utc Use UTC timestamps in the output
-z, --timezones TEXT Timezone disambiguators
(TZ=OFFSET,[TZ=OFFSET[...]])
--help Show this message and exit.
Troubleshooting
===============
Why do some names appear as <some number>@facebook.com?
-------------------------------------------------------
Facebook seems to randomly swap names for IDs. As of late, this seems to be much less of an issue. Nevertheless, if you are experiencing this issue, the parser can resolve the names via Facebook with the ``--resolve`` flag. Keep in mind, this is a beta feature and may not work perfectly.
.. code:: text
$ fbcap messages ./messages.htm -t second --resolve
Facebook username/email: facebook_username
Facebook password:
This requires your Facebook credentials to get accurate results. This is a direct connection between your computer and Facebook.
Your credentials are not relayed through any servers. Please look at the code if you are feeling paranoid or skeptical :)
Why are some of my chat threads missing?
----------------------------------------
This is a mysterious issue on Facebook's end. From anecdotal evidence, it seems that what gets returned in your chat archive is generally conversations with people who you have most recently talked to. Fortunately, it always seems to be the complete history for each conversation and nothing gets truncated.
As of late, it seems like Facebook has fixed this issue on their end and it is now far less of an issue.
Why are repeated names not showing?
-----------------------------------
Multiple users with equal names in group chats are shown as a single user. This has to do with Facebook's presentation of names in the message files, which doesn't make this distinction.
This cannot be remedied unless Facebook fixes the problem.
.. |PyPI Version| image:: https://badge.fury.io/py/fbchat_archive_parser.svg
:target: https://pypi.org/project/fbchat_archive_parser/
.. |Python Versions| image:: https://img.shields.io/pypi/pyversions/fbchat-archive-parser.svg
:target: https://github.com/ownaginatious/fbchat-archive-parser/blob/master/setup.py
.. |Build Status| image:: https://travis-ci.org/ownaginatious/fbchat-archive-parser.svg?branch=master
:target: https://travis-ci.org/ownaginatious/fbchat-archive-parser