Clustering motif models to remove redundancy

Non-redundant TF motif matches genome-wide

The following describes the general workflow for clustering motif models to remove redundancy, generating an "archetype" motif for each cluster, and finally performing genome-wide motif scans and removing redundant matches. Please note that this documentation is incomplete and should be used as a rough guide only.

If you are looking for the final results (motif clusters, genome-wide scans and a browser screenshot), please see the following website:

Note: The above link is for version 1.0 of the motif clusters. While we have yet to update the website, version 2.1-beta (latest) can be found at

Requirements

  • Python 3.5+
    • numpy
    • scipy
    • seaborn
    • matplotlib
    • genome-tools (
  • TOMTOM (
  • BEDOPS (


Motif databases

  • Version 2.0beta-human (

    • CIS-BP (Homo sapiens; only motifs with direct measurements)
    • JASPAR 2022
    • BANP from Grand et al., 2021
    • 5,193 motif models in total; 693 distinct clusters
  • Version 1.0 (complete documentation:

    • Jolma et al., Cell 2013 (Supplemental Table 2)
    • JASPAR 2018
    • HOCOMOCO version 11 (757 motif models; both human and mouse)

Quick start

Step 0: Download and preprocess motifs

See the runall script in each motif database directory (databases/*).
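The preprocessed databases end up as .meme files (these are what meme2meme reads in the next step). To sanity-check a converted file, here is a minimal sketch of a reader for the MEME minimal motif format. It is not part of the repository: the function name parse_meme is hypothetical, it assumes the A, C, G, T alphabet, and it only extracts the letter-probability matrices.

```python
import io

import numpy as np


def parse_meme(handle):
    """Parse letter-probability matrices from a MEME minimal-format file.

    Returns a dict mapping motif ID -> (width x 4) array with columns A, C, G, T.
    """
    motifs = {}
    motif_id, width, rows = None, 0, []
    for raw in handle:
        line = raw.strip()
        if line.startswith("MOTIF"):
            motif_id, width, rows = line.split()[1], 0, []
        elif line.startswith("letter-probability matrix") and motif_id is not None:
            # The header carries the motif width as "w= N"; pad "=" so it splits cleanly
            fields = line.replace("=", " = ").split()
            width = int(fields[fields.index("w") + 2])
        elif motif_id is not None and width and line:
            # Matrix rows follow the header: one row of 4 probabilities per position
            rows.append([float(x) for x in line.split()])
            if len(rows) == width:
                motifs[motif_id] = np.array(rows)
                motif_id, width, rows = None, 0, []
    return motifs


if __name__ == "__main__":
    example = """MEME version 4

ALPHABET= ACGT

MOTIF M1 example
letter-probability matrix: alength= 4 w= 2 nsites= 10 E= 0
0.7 0.1 0.1 0.1
0.1 0.1 0.1 0.7
"""
    for name, pfm in parse_meme(io.StringIO(example)).items():
        print(name, pfm.shape)
```

Lines outside a matrix (ALPHABET, strands, background frequencies, URL) are simply skipped.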

Step 1: Compute pair-wise motif similarity

Here we use TOMTOM to compute the similarity between all pairs of motif models with the following commands:

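mkdir -p tomtom
# Combine all preprocessed databases into one MEME file (output filename is illustrative)
meme2meme databases/*/*.meme > tomtom/all.dbs.meme

# Compare every motif against every other motif; the -bfile path is specific to our cluster
tomtom \
	-bfile /net/seq/data/projects/motifs/hg19.K36.mappable_only.5-order.markov \
	-dist kullback \
	-motif-pseudo 0.1 \
	-text \
	-min-overlap 1 \
	tomtom/all.dbs.meme tomtom/all.dbs.meme \
> tomtom/tomtom.all.txt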

I have provided a script that distributes this computation across a SLURM compute cluster (see bin/runall.tomtom.v2.0beta-human for an example).
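For the clustering step, the TOMTOM table has to be turned into a motif-by-motif matrix. The sketch below is a hypothetical helper (read_tomtom is not from the repository). It assumes the tab-separated -text layout with the query ID, target ID and E-value in columns 1, 2 and 5, converts E-values to -log10 scores clipped at 0, and keeps the stronger of the two directions for each pair. The provided notebook may transform the values differently.

```python
import io

import numpy as np


def read_tomtom(handle, min_evalue=1e-300):
    """Build a symmetric similarity matrix from TOMTOM -text output.

    Assumes tab-separated rows with query ID, target ID and E-value in columns
    1, 2 and 5. Scores are -log10(E-value), clipped at 0.
    """
    best = {}
    for line in handle:
        if not line.strip() or line.startswith(("#", "Query_ID")):
            continue  # skip blank lines, the header and trailing comments
        fields = line.rstrip("\n").split("\t")
        query, target, evalue = fields[0], fields[1], float(fields[4])
        # Guard against E-values of 0 before taking the log
        score = max(0.0, -np.log10(max(evalue, min_evalue)))
        pair = tuple(sorted((query, target)))
        # Keep the stronger of the two directions (A vs B and B vs A)
        best[pair] = max(best.get(pair, 0.0), score)
    ids = sorted({name for pair in best for name in pair})
    index = {name: i for i, name in enumerate(ids)}
    sim = np.zeros((len(ids), len(ids)))
    for (a, b), score in best.items():
        sim[index[a], index[b]] = sim[index[b], index[a]] = score
    return ids, sim
```

Usage would be along the lines of `ids, sim = read_tomtom(open("tomtom/tomtom.all.txt"))`. Pairs that TOMTOM did not report stay at 0.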

Step 2: Hierarchically cluster motifs

After running TOMTOM, open the provided Jupyter notebook to perform the clustering and visualization.

We perform hierarchical clustering (correlation distance, complete linkage) on the TOMTOM similarity E-values. Below is a heatmap of the motifs clustered by similarity, with clusters identified by cutting the dendrogram at height 0.7.

Clustered heatmap cut at height 0.7
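The clustering itself maps directly onto SciPy's hierarchical clustering routines. This sketch assumes a square motif-by-motif similarity matrix, treats each motif's row as its feature vector, and uses correlation distance, complete linkage and a cut at height 0.7 as described above. It is an illustration, not the notebook's code, and cluster_motifs is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_motifs(sim, height=0.7):
    """Cluster motifs by complete linkage on correlation distance between rows of sim.

    Returns one integer cluster label per motif (row). Rows with zero variance
    have an undefined correlation distance and should be removed beforehand.
    """
    Z = linkage(sim, method="complete", metric="correlation")
    # Cut the dendrogram: merges above `height` are split into separate clusters
    return fcluster(Z, t=height, criterion="distance")


if __name__ == "__main__":
    sim = np.array([
        [10, 9, 0, 0],
        [9, 10, 0, 1],
        [0, 0, 10, 8],
        [0, 1, 8, 10],
    ], dtype=float)
    print(cluster_motifs(sim))
```

Correlation distance compares the shape of two motifs' similarity profiles rather than their absolute scores, so motifs that resemble the same set of other motifs end up together.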

Step 3: Process each cluster to build a motif archetype

The notebook also contains code to process and visualize each motif cluster.

Example archetypes: AC0002 (homeodomain) and AC0240 (CCAAT-box)
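How the notebook builds an archetype is not documented here. As one plausible approach, assumed rather than taken from the repository, the sketch below aligns every member PFM to the first (seed) motif by brute-force search over offsets and both strands, scoring overlaps by the dot product of background-centred columns, and then averages the aligned columns. All function names are hypothetical; TOMTOM's own reported offsets and orientations could be used for the alignment instead.

```python
import numpy as np


def revcomp(pfm):
    """Reverse complement a (width x 4) matrix with columns ordered A, C, G, T."""
    return pfm[::-1, ::-1]


def best_alignment(seed, pfm, min_overlap=4):
    """Find the (offset, strand) that best aligns pfm to seed.

    offset is the seed position where pfm's first column lands (may be negative).
    Columns are centred on a uniform background (0.25) before taking dot products.
    """
    need = min(min_overlap, len(seed), len(pfm))
    best_score, best_offset, best_strand = -np.inf, 0, "+"
    for strand, m in (("+", pfm), ("-", revcomp(pfm))):
        for offset in range(-(len(m) - 1), len(seed)):
            lo, hi = max(0, offset), min(len(seed), offset + len(m))
            if hi - lo < need:
                continue  # overlap too short
            score = float(np.sum((seed[lo:hi] - 0.25) * (m[lo - offset:hi - offset] - 0.25)))
            if score > best_score:
                best_score, best_offset, best_strand = score, offset, strand
    return best_offset, best_strand


def build_archetype(pfms, min_overlap=4):
    """Average member PFMs after aligning each one to the first (seed) PFM."""
    seed = pfms[0]
    placed = [(0, seed)]  # the seed defines the coordinate frame
    for pfm in pfms[1:]:
        offset, strand = best_alignment(seed, pfm, min_overlap)
        placed.append((offset, pfm if strand == "+" else revcomp(pfm)))
    start = min(offset for offset, _ in placed)
    end = max(offset + len(m) for offset, m in placed)
    total = np.zeros((end - start, 4))
    counts = np.zeros(end - start)
    for offset, m in placed:
        total[offset - start:offset - start + len(m)] += m
        counts[offset - start:offset - start + len(m)] += 1
    # Each column is the mean of the members covering it, so rows still sum to 1
    return total / counts[:, None]


def onehot(seq, p=0.97):
    """Hypothetical demo helper: a near-one-hot PFM for a DNA string."""
    pfm = np.full((len(seq), 4), (1 - p) / 3)
    for i, base in enumerate(seq):
        pfm[i, "ACGT".index(base)] = p
    return pfm
```

For example, `build_archetype([onehot("AACGG"), onehot("CCGT")])` places the second motif on the minus strand, one position into the seed. Averaging keeps every column a valid probability distribution, at the cost of diluting positions covered by only a few members.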

Step 4: Make HTML output

Run the Bash script bin/runall.make-html to generate an HTML webpage (index.html) in the results directory.

Step 5: Scan the genome using all motif models (individual models and archetypes)

I use the software package MOODS to find motif matches genome-wide. It's a great tool that I highly recommend. See bin/runall.scan_models for an example of how to run this on a SLURM cluster.
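Conceptually, a motif match is a sequence window whose log-odds score against the motif passes a threshold; MOODS does this very efficiently. The sketch below is a deliberately naive pure-NumPy scan to illustrate the idea. It is not MOODS and does not reflect its API; the uniform background, the pseudocount of 0.01 and the fixed score threshold are assumptions, and the function names are hypothetical.

```python
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}


def log_odds(pfm, bg=(0.25, 0.25, 0.25, 0.25), pseudo=0.01):
    """Convert a (width x 4) probability matrix to a log2-odds scoring matrix."""
    p = (pfm + pseudo) / (pfm + pseudo).sum(axis=1, keepdims=True)
    return np.log2(p / np.asarray(bg))


def scan(seq, pwm, threshold):
    """Yield (start, strand, score) for windows scoring >= threshold on either strand.

    Windows containing non-ACGT characters (e.g. N) are skipped.
    """
    width = len(pwm)
    rc_pwm = pwm[::-1, ::-1]  # reverse complement (columns ordered A, C, G, T)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width].upper()
        if any(b not in BASES for b in window):
            continue
        idx = [BASES[b] for b in window]
        for strand, m in (("+", pwm), ("-", rc_pwm)):
            # Sum the score of the observed base at each motif position
            score = float(m[np.arange(width), idx].sum())
            if score >= threshold:
                yield start, strand, score


if __name__ == "__main__":
    pfm = np.array([[0.97, 0.01, 0.01, 0.01],
                    [0.01, 0.97, 0.01, 0.01],
                    [0.01, 0.01, 0.97, 0.01]])  # roughly "ACG"
    for hit in scan("TTACGTT", log_odds(pfm), threshold=5.0):
        print(hit)
```

Scanning the reverse-complemented matrix against the forward sequence is equivalent to scanning the original matrix against the reverse strand, so start coordinates stay on the forward strand.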

Step 6: Create working and browser tracks

To create a bigBed file from a bed9+4 file, we need to supply an AutoSql file (

table hg38_motifs_collapsed
"Collapsed motifs matches in hg38 (see:"
string  chrom;        "Reference sequence chromosome or scaffold"
uint    chromStart;    "Start position of feature on chromosome"
uint    chromEnd;    "End position of feature on chromosome"
string  name;        "Name of motif"
uint    score;        "Score"
char[1] strand;        "+ or - for strand"
uint    thickStart;    "Coding region start"
uint    thickEnd;    "Coding region end"
uint      reserved;    "itemRgb"

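Make the tracks for the archetypes.

# chrom.sizes is assumed to exist (e.g. from fetchChromSizes hg38); the .as and .bb filenames are illustrative
bedToBigBed -as=hg38_motifs_collapsed.as -type=bed9+4 -tab moods.combined.all.bed chrom.sizes moods.combined.all.bb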
awk -v OFS="\t" '{ print $1, $2, $3, $4, $11, $6, $10, $13}' moods.combined.all.bed | bgzip -c > moods.combined.all.bed.gz
tabix -p bed moods.combined.all.bed.gz

Make the tracks for the full motif scans.

fetchChromSizes hg38 > /tmp/chrom.sizes
awk -v OFS="\t" '{ print $1, 0, $2; }' /tmp/chrom.sizes | sort-bed - > /tmp/chrom.sizes.bed
bedops -e 100% moods.combined.all.bed /tmp/chrom.sizes.bed \
| awk -v OFS="\t" '{ print $1, $2, $3, $4, 0, $6, $2, $3, "0,0,0", $5, $7 }' > /tmp/moods
# Use the chromosome sizes fetched above; the output filename is illustrative
bedToBigBed -type=bed9+2 -tab /tmp/moods /tmp/chrom.sizes moods.all.bb
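For reference, the bedops/awk pipeline above keeps only matches lying entirely inside a chromosome (bedops -e 100% against whole-chromosome intervals) and rewrites each line as bed9+2. The Python sketch below mirrors that logic on in-memory lines. The function name is hypothetical, and it passes input columns 5 and 7 through unchanged, exactly as the awk command does, without assuming what they contain.

```python
def to_bed9_plus2(lines, chrom_sizes):
    """Yield bed9+2 rows for matches lying entirely within their chromosome.

    Mirrors the bedops/awk pipeline: score is set to 0, thickStart/thickEnd
    repeat the match coordinates, itemRgb is "0,0,0", and input columns 5 and
    7 are appended unchanged.
    """
    for line in lines:
        f = line.rstrip("\n").split("\t")
        chrom, start, end = f[0], int(f[1]), int(f[2])
        # Same filter as "bedops -e 100%" against [0, chrom_size) intervals
        if chrom not in chrom_sizes or start < 0 or end > chrom_sizes[chrom]:
            continue
        yield "\t".join([chrom, f[1], f[2], f[3], "0", f[5],
                         f[1], f[2], "0,0,0", f[4], f[6]])


if __name__ == "__main__":
    sizes = {"chr1": 100}
    rows = ["chr1\t10\t20\tM1\t5.2\t+\t1e-5",
            "chr1\t95\t105\tM2\t4.0\t-\t1e-4"]  # second match runs off the chromosome end
    for row in to_bed9_plus2(rows, sizes):
        print(row)
```

Unlike bedops, this sketch does not require sorted input; bedToBigBed itself still expects the final file sorted by chromosome and start.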