arules icon indicating copy to clipboard operation
arules copied to clipboard

Mining Association Rules and Frequent Itemsets with R

R package arules - Mining Association Rules and Frequent Itemsets

CRAN
version stream r-universe
status CRAN RStudio mirror
downloads

The arules package for R provides the infrastructure for representing, manipulating and analyzing transaction data and patterns using frequent itemsets and association rules. The package also provides a wide range of interest measures and mining algorithms including the code of Christian Borgelt’s popular and efficient C implementations of the association mining algorithms Apriori and Eclat. In addition, the following mining algorithms are available via fim4r:

  • Apriori
  • Eclat
  • Carpenter
  • FPgrowth
  • IsTa
  • RElim
  • SaM

Code examples can be found in Chapter 5 of the web book R Companion for Introduction to Data Mining.

arules core packages:

  • arules: arules base package with data structures, mining algorithms (APRIORI and ECLAT), interest measures.
  • arulesViz: Visualization of association rules.
  • arulesCBA: Classification algorithms based on association rules (includes CBA).
  • arulesSequences: Mining frequent sequences (cSPADE).

Other related packages:

Additional mining algorithms

  • arulesNBMiner: Mining NB-frequent itemsets and NB-precise rules.
  • fim4r: Provides fast implementations for several mining algorithms. An interface function called fim4r() is provided in arules.
  • opusminer: OPUS Miner algorithm for finding the op k productive, non-redundant itemsets. Call opus() with format = 'itemsets'.
  • RKEEL: Interface to KEEL’s association rule mining algorithm.
  • RSarules: Mining algorithm which randomly samples association rules with one pre-chosen item as the consequent from a transaction dataset.

In-database analytics

  • ibmdbR: IBM in-database analytics for R can calculate association rules from a database table.
  • rfml: Mine frequent itemsets or association rules using a MarkLogic server.

Interface

  • rattle: Provides a graphical user interface for association rule mining.
  • pmml: Generates PMML (predictive model markup language) for association rules.

Classification

  • arc: Alternative CBA implementation.
  • inTrees: Interpret Tree Ensembles provides functions for: extracting, measuring and pruning rules; selecting a compact rule set; summarizing rules into a learner.
  • rCBA: Alternative CBA implementation.
  • qCBA: Quantitative Classification by Association Rules.
  • sblr: Scalable Bayesian rule lists algorithm for classification.

Outlier Detection

Recommendation/Prediction

  • recommenerlab: Supports creating predictions using association rules.

Installation

Stable CRAN version: Install from within R with

install.packages("arules")

Current development version: Install from r-universe.

install.packages("arules", repos = "https://mhahsler.r-universe.dev")

Usage

Load package and mine some association rules.

library("arules")
data("IncomeESL")

trans <- transactions(IncomeESL)
trans
## transactions in sparse format with
##  8993 transactions (rows) and
##  84 items (columns)
rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules")
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 899 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[84 item(s), 8993 transaction(s)] done [0.01s].
## sorting and recoding items ... [42 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.02s].
## writing ... [457 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

Inspect the rules with the highest lift.

inspect(head(rules, n = 3, by = "lift"))
##     lhs                           rhs                      support confidence coverage lift count
## [1] {dual incomes=no,                                                                            
##      householder status=own}   => {marital status=married}    0.10       0.97     0.10  2.6   914
## [2] {years in bay area=>10,                                                                      
##      dual incomes=yes,                                                                           
##      type of home=house}       => {marital status=married}    0.10       0.96     0.10  2.6   902
## [3] {dual incomes=yes,                                                                           
##      householder status=own,                                                                     
##      type of home=house,                                                                         
##      language in home=english} => {marital status=married}    0.11       0.96     0.11  2.6   988

Using arules with tidyverse

arules works seamlessly with tidyverse. For example:

  • dplyr can be used for cleaning and preparing the transactions.
  • transaction() and other functions accept tibble as input.
  • Functions in arules can be used with %>%.
  • arulesViz provides visualizations based on ggplot2.

For example, we can remove the ethnic information column before creating transactions and then mine and inspect rules.

library("tidyverse")
library("arules")
data("IncomeESL")

trans <- IncomeESL %>%
    select(-`ethnic classification`) %>%
    transactions()
rules <- trans %>%
    apriori(supp = 0.1, conf = 0.9, target = "rules", control = list(verbose = FALSE))
rules %>%
    head(n = 3, by = "lift") %>%
    inspect()
##     lhs                           rhs                      support confidence coverage lift count
## [1] {dual incomes=no,                                                                            
##      householder status=own}   => {marital status=married}    0.10       0.97     0.10  2.6   914
## [2] {years in bay area=>10,                                                                      
##      dual incomes=yes,                                                                           
##      type of home=house}       => {marital status=married}    0.10       0.96     0.10  2.6   902
## [3] {dual incomes=yes,                                                                           
##      householder status=own,                                                                     
##      type of home=house,                                                                         
##      language in home=english} => {marital status=married}    0.11       0.96     0.11  2.6   988

Using arules from Python

See Getting started with arules using Python.

Support

Please report bugs here on GitHub. Questions should be posted on stackoverflow and tagged with arules.

References