Frequent Patterns and Association Rules

A frequent pattern is a pattern (a set of items, a subsequence, a substructure, etc.) that occurs frequently in a data set. Frequent pattern mining is used to find inherent regularities in data.

Example: Which products are often purchased together? A classic, surprising answer: beer and diapers.

The usual explanation is that blue-collar workers often buy beer on Friday night after work, and their wives ask them to pick up diapers on the same trip.

Itemset $X = \{x_1, \ldots, x_k\}$

Goal: find all rules X -> Y that meet the minimum support and minimum confidence thresholds.

Support s: the probability that a transaction contains $X \cup Y$, that is, every item of X and every item of Y. Support measures how popular the rule is.
Confidence c: the conditional probability that a transaction containing X also contains Y.
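
Written as formulas, the two measures can be estimated from transaction counts. Here $N$ (the total number of transactions) and $\mathrm{count}(\cdot)$ (the number of transactions containing a given itemset) are notation introduced only for this illustration:

$$s(X \rightarrow Y) = P(X \cup Y) \approx \frac{\mathrm{count}(X \cup Y)}{N}, \qquad c(X \rightarrow Y) = P(Y \mid X) \approx \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$$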

Example:

Transaction-id	Items bought
10	A, B, D
20	A, C, D
30	A, D, E
40	B, E, F
50	B, C, D, E, F

Frequent patterns (with support counts): { A:3, B:3, D:4, E:3, AD:3 }

We define $sup_{min} = 50\%$ and $conf_{min} = 50\%$, so an itemset must appear in at least 3 of the 5 transactions to be frequent.

Association rules:

A -> D: ( s = 3/5, c = 3/3 ) = ( 60% , 100% )
D -> A: ( s = 3/5, c = 3/4 ) = ( 60% , 75% )
Therefore, A and D are strongly associated: both rules exceed the 50% support and confidence thresholds.
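
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch over the five example transactions. The helper names `support` and `confidence` are chosen here for illustration, and the code assumes every item is a single character so that a string such as "AD" can stand for the itemset {A, D}.

```python
# Transactions from the example table, keyed by transaction id.
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)  # "AD" -> {"A", "D"} (single-character items assumed)
    hits = sum(1 for t in transactions.values() if items <= t)
    return hits / len(transactions)


def confidence(lhs, rhs):
    """Estimate P(rhs | lhs) as support(lhs and rhs together) / support(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)


for lhs, rhs in [("A", "D"), ("D", "A")]:
    s = support(lhs + rhs)
    c = confidence(lhs, rhs)
    print(f"{lhs} -> {rhs}: s = {s:.0%}, c = {c:.0%}")
```

Run on the example data, this prints 60% support for both rules, with 100% confidence for A -> D and 75% for D -> A.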

The downward closure property of frequent patterns:
Any subset of a frequent itemset must also be frequent. For example, since AD is frequent in the table above, A and D must each be frequent.
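
One way to see why this property matters: a level-wise search only needs to count a candidate k-itemset if all of its (k-1)-item subsets were already found frequent. The sketch below is a simplified illustration of that pruning idea on the example transactions, not an optimized or complete implementation of any published algorithm. The function name `frequent_itemsets` and the `min_count` parameter are hypothetical.

```python
from itertools import combinations

# The five example transactions (ids omitted).
transactions = [
    {"A", "B", "D"},
    {"A", "C", "D"},
    {"A", "D", "E"},
    {"B", "E", "F"},
    {"B", "C", "D", "E", "F"},
]


def frequent_itemsets(transactions, min_count):
    """Level-wise search that prunes candidates using downward closure."""

    def count(itemset):
        # Number of transactions containing every item of `itemset`.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    level = {}
    for item in sorted(set().union(*transactions)):
        c = count(frozenset([item]))
        if c >= min_count:
            level[frozenset([item])] = c

    result = dict(level)
    k = 2
    while level:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        next_level = {}
        for cand in candidates:
            # Prune: if any (k-1)-subset is infrequent, cand cannot be frequent.
            if any(frozenset(sub) not in level
                   for sub in combinations(cand, k - 1)):
                continue
            c = count(cand)
            if c >= min_count:
                next_level[cand] = c
        result.update(next_level)
        level = next_level
        k += 1
    return result


found = frequent_itemsets(transactions, min_count=3)
for itemset, c in sorted(found.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), c)
```

With a minimum count of 3 (50% of 5 transactions, rounded up), the search finds A, B, D, E, and AD. Items C and F are infrequent, so no candidate containing them is ever generated or counted.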

Major scalable mining methods:

Apriori Algorithm (Agrawal & Srikant, 1994)
Frequent pattern growth, FP-growth (Han, Pei & Yin, 2000)
Vertical data format approach (Zaki & Gouda, 2003)

Siling
