bayesian-echo-chamber
bayesian-echo-chamber copied to clipboard
Bayesian generative model for conversation data
Bayesian Echo Chamber
A new Bayesian generative model for social interaction data, for uncovering influence relations from time-stamped conversation data.

Please refer to
Fangjian Guo, Charles Blundell, Hanna Wallach and Katherine A. Heller. The Bayesian Echo Chamber: modeling social influence via linguistic accommodation. AISTATS 2015, San Diego, CA, USA. JMLR: W&CP volume 38.
for details of the model.
Files
├── data a collection of datasets
├── results results are produced here
│ ├── 12-angry-men-analytics.Rmd generating report from result
│ └── Makefile compile an html report from R markdown
├── src
│ ├── bec.py main "Bayesian echo chamber" class
│ ├── bec_sampler.py a wrapper of the sampler of bec
│ ├── hawkes.py an implementation of Hawkes process
│ ├── likelihoods.py several likelihoods
│ ├── run_bec_12angrymen.py a demo script producing result for data/12-angry-men
│ ├── slice_sampler.py slice sampler
│ ├── talkbankXMLparse.py parser for talkbank xml format
└── stopwords
└── english.stop list of stop words in English
Usage
- Run
python run_bec_12angrymen.pywould produce samples and other auxiliary files underresults/12-angry-men/. One could customize scripts based onrun_bec_12angrymen.pyfor other datasets and configurations. - Run
makeunderresults/could produce an html report compiled from R Markdown file12-angry-men-analytics.Rmd. One could customize theRmdfile for analyzing other datasets.
Dependencies
- Python modules (tested under Python 2.7)
- numpy, scipy
- matplotlib
- nltk for word stemming in
talkbankXMLparse.py
- R libraries for generating report
- knitr
- ggplot2
- coda
- plyr
- qgraph
- pander
Datasets
The conversation data is read from the TalkBank xml format. A conversation consists of several utterances, with each utterance described with the following entities: speaker, content, start time and end time, which looks like the snippet below.
<u who="Juror 7" uID="#7">
<w>So</w>
<w>how</w>
<w>come</w>
<w>you</w>
<w>vote</w>
<w>not</w>
<w>guilty</w>
<media start="47.4640" end="49.3820" unit="s"/>
</u>
Currently, we have prepared the following datasets under data/ directory.
- 12 Angry Men: transcribed from the 1957 movie subtitle.
- SCOTUS: oral arguments from 50 years of the United States Supreme Court, obtained from TalkBank.
- synthetic: a synthetic example with 3 agents speaking with a vocabulary of 20, with time stamps generated from a Hawkes process and contents generated from the BEC model.
Authors
This repo is maintained by Richard Guo. We also acknowledge the earlier contribution of Juston Moore to bec.py, likelihoods.py and slice_sampler.py.