Association rule
- Rule Evaluation Metrics
- Applications of Association Rule Learning
- Advantages of Association Rule Mining
- Disadvantages of Association Rule Mining
Association rule learning falls under the realm of unsupervised learning and primarily aims to uncover meaningful connections or associations between items in a dataset. The core objective is to identify interesting relations among various variables within a database. These association rules in machine learning essentially depict the frequency of occurrence of a chosen item or set of items within a transaction.
Market basket analysis is a well-known application of association rule mining, therefore, it and apriori algorithms are also known as algorithms for market basket analysis. This technique explores relationships among products typically bought together. Consider a trip to a supermarket where items that tend to be purchased simultaneously are strategically placed close to each other. This arrangement is based on observed purchasing patterns; it's aimed at potentially encouraging additional purchases when customers pick up one item, suggesting that they might be interested in related products.
The significance of association rule mining aka ais algorithm in data mining extends beyond retail settings. It finds applications in diverse fields like web usage mining, continuous production, and more. Essentially, it's utilized to unveil correlations and patterns in various datasets, enabling businesses and analysts to make informed decisions based on observed associations among items or variables.
Real-World Example for Association Rule
Let’s look at a
real-world example, Emily owned a grocery store, and she wanted to know her customers’
buying behaviors better. She has transaction data; therefore, she uses
association rule mining to uncover the insights of the data.
By applying the apriori algorithm association rule mining, she discovered that customers purchasing organic
vegetables often buy organic fruits in their carts, which suggests a preference
for healthy, natural foods. Similarly, those buying pasta sauce were likely to
purchase pasta, indicating a taste for Italian cuisine.
With these insights,
Emily strategically arranged her store layout, placing complementary items like
bread and cheese closer together to encourage additional purchases. She also
designed targeted promotions, offering discounts on items frequently bought
together such as chips and soda or coffee and pastries, to entice customers to
buy more.
With the help of
association rule mining, Emily was able to transform her grocery store into a
thriving hub of community activity.
Types of Association Rule Algorithm
- Apriori
- Eclat
- F-P Growth Algorithm
The Apriori algorithm operates on transactional databases to derive association rules from frequent datasets. It efficiently computes item sets through breadth-first search and Hash trees. Its primary application lies in market basket analysis, although it's also applicable in fields like healthcare for discerning drug reactions among patients.
Eclat, short for Equivalence Class Transformation, utilizes depth-first search to uncover frequent item sets in transaction databases. Notably, it offers faster performance compared to the Apriori algorithm.
The F-P Growth algorithm, abbreviated from "frequent pattern," represents an enhanced version of the Apriori algorithm. It structures the database into a tree-like format known as a frequent pattern tree, to identify the most common patterns present in the data.
Working of Association rule or algorithms for association rule mining
First, we need to know the basic definitions before defining the rule.
Support count (σ) - represents how often a certain set of objects appears.
Frequent item – The items in this set have a minimum support level of 1.
Association rule – a two-item set X and a two-item set Y expressed as an implication expression.
Rule Evaluation Metrics
"Support" is the term used to describe the proportion of completed transactions that include elements from both the {X} and {Y} rule segments. This metric gauges the frequency of occurrence of a set of items together as a percentage of the entire transaction volume.
Support (σ)=(X+Y)/total
This refers to the proportion of transactions that include both items X and Y together.
Confidence (C) – The degree of confidence quantifies the strength of the relationship between two components in a dataset. This is below the equation: The ratio of all transactions including all items in set {X} to all transactions including all items in set {Y} is the planned distribution.
Conf(X=>Y)Supp(X∪Y)/Supp(X)
The expression mentioned evaluates the frequency of occurrence of each item in Y within transactions that also include items from set X.
Lift (L) – Lift acts as a metric indicating the degree of association between two items, taking into account their frequencies in the dataset. Assuming that X and Y are separate collections of items, we divide the rule's confidence by the projected confidence to find lift in the context of the rule X => Y. This expected confidence is derived by dividing the confidence by the frequency of {Y}.
lift(X=>Y)=Conf(X=>Y)/(Supp(Y))
When the lift value is around 1, it suggests that X and Y generally appear together as expected. A lift greater than 1 signifies that their co-occurrence is more frequent than anticipated, while a value less than 1 indicates a lower-than-expected co-occurrence. Higher lift values denote a stronger association between the items.
Benefits of AR Mining Associations
Association rules are simple and easy to grasp, allowing non-technical people to gain useful insights even when faced with complex problems.
Retail, e-commerce, and other industries can benefit from its rapid discovery of co-occurring products or events, which in turn helps with suggestions and cross-selling techniques.
Helps in decision-making by giving data on product correlations, which in turn facilitates targeted marketing and better inventory management.
Big data settings can benefit from association rule mining due to the scalability of algorithms such as Apriori and FP-Growth, which can manage enormous datasets.
Association rules are simple and easy to grasp, allowing non-technical people to gain useful insights even when faced with complex problems.
Retail, e-commerce, and other industries can benefit from its rapid discovery of co-occurring products or events, which in turn helps with suggestions and cross-selling techniques.
Helps in decision-making by giving data on product correlations, which in turn facilitates targeted marketing and better inventory management.
Big data settings can benefit from association rule mining due to the scalability of algorithms such as Apriori and FP-Growth, which can manage enormous datasets.
Application flexibility: it provides insights into many sorts of relationships within data sets and is relevant in diverse industries such as healthcare, finance, telecommunications, and more.
The basis for further analysis: association rule mining serves as a foundation for more complex data mining techniques and can aid in feature selection or dimensionality reduction for machine learning tasks.
Extraction of actionable patterns: enables the extraction of actionable patterns, helps businesses optimize processes, improve sales strategies, or enhance customer experiences.
Disadvantages of Association Rule Mining
Complexity in computing: The computational cost of creating rules for massive datasets may become evident when dealing with a high number of objects or transactions. Its complexity could limit its ability to grow.
Generation of numerous rules: the algorithm can produce a vast number of rules, including many that might be trivial, redundant, or not actionable. Sorting through this volume to find meaningful associations can be challenging.
Handling of noise and spurious correlations: association rule mining might pick up spurious correlations or associations caused by noise or rare events, leading to unreliable or misleading rules.
Dependency on threshold settings: the quality and relevance of the rules heavily depend on the thresholds set for support and confidence. Determining these thresholds can be subjective and may impact the usefulness of the discovered rules.
Inability to handle continuous variables: association rule mining typically works with categorical or binary data, making it less suitable for continuous variables without preprocessing.
Limited to binary relationships: it primarily discovers binary associations between items or features, potentially missing more complex relationships or interactions between multiple variables.
Assumptions of independence: association rule mining assumes independence between items, which might not hold in all cases, particularly when dealing with sequential or time-related data.
Contextual information exclusion: it might not consider contextual information or temporal relationships between items, limiting its applicability in certain scenarios.
Summary
Association rule mining is a powerful data mining method employed to unveil concealed patterns, connections, and associations within extensive datasets. It delves into relationships and correlations between different items or variables, finding applications in diverse fields such as market basket analysis, recommendation systems, and numerous other domains. The method generates rules (that is if-then statements) that show the co-occurrence or dependency between items in the data. While it offers insights into item associations and supports decision-making processes, it has limitations such as computational complexity, generation of numbers rules, susceptibility to noise, and assumptions of independence between items. Despite these limitations, association rule mining remains valuable for uncovering valuable associations within datasets, aiding in business strategies, and providing insights into consumer behavior.
Python Code
below is the association/Apriori algorithm code in Python: -