Deep Learning-Based Market Basket Analysis Using Association Rules

Market Basket Analysis (MBA) is a data mining technique that helps retailers understand customers' buying habits when making new marketing decisions. Because buyers' preferences change frequently as their needs expand, transactional data grows larger every day, and Deep Learning (DL) methods are needed to handle this rapidly growing data. In previous research, many authors conducted MBA by applying DL and association rules (AR) to retail datasets. AR identifies the association between items to find the order in which customers place items in the basket. So far, AR has been used only to mine frequently purchased items from retail datasets. There is a gap in classifying these rules and predicting the next basket item using DL on transactional datasets. This work proposes a framework that uses AR as feature selection and applies DL methods for classification and prediction. The experiments were conducted on two datasets: InstaCart and real-life data from Bites Bakers, a growing store with three branches and 2233 products. The AR were classified at (80,20) and (70,30) splits using CNN, Bi-LSTM, and CNN-BiLSTM. The results at both splits show that Bi-LSTM performs best on the InstaCart dataset, with an accuracy of around 0.92, whereas CNN-BiLSTM performs best on the Bites Bakers dataset, with an accuracy of around 0.77.


Introduction
To meet consumers' expanding needs and compete in the market, retailers are interested in customer purchasing patterns [1]. Technological advancements have increased the frequency of product updates and made it possible to choose among different versions of the same product based on consumer preferences [2]. As more products are introduced to meet changing needs, the dataset grows in size daily [3], and customers buy different items in a single store visit. Given the large volume of products and customers, traditional methods take too long to uncover purchasing behavior, so data mining methods are needed to extract useful knowledge from data [4], [5]. Knowledge Discovery in Databases (KDD) is a series of iterative steps for discovering beneficial, salient, and comprehensible patterns in massive datasets [6]. Data mining is the crux of KDD; it includes techniques for understanding and exploring data, building models, and finding the crucial patterns from which knowledge is extracted [7]. The knowledge discovered through the KDD process helps identify frequent patterns in the database.
Frequent pattern mining is applied to relational, transactional, and time series databases [8].
Frequent pattern mining is used in market basket analysis, bioinformatics, crime prevention, educational mining, and forensic analysis. It involves association, clustering, classification, neural networks, and other techniques [9]. The retail industry is always interested in customers' buying behavior in order to increase profit and plan new marketing strategies; the mining of frequent items in retail is known as affinity analysis or market basket analysis [10]. MBA is an effective technique for finding the most frequent patterns for retailers, and it identifies purchasing behavior using a transactional dataset [11].
Transactional data is analyzed using different data mining methods for MBA; one such technique for identifying recurring patterns in retail is AR [12]. A transactional dataset consists of transaction IDs and a list of the items purchased in each transaction. AR mining finds the association of one item with another in the transactional dataset [13]. The data is first explored, and preprocessing methods are used to clean and normalize it. After cleaning, an AR mining algorithm is used to produce frequent itemsets, from which AR are generated to discover consumer purchasing habits [14], [15].
Because retail is growing rapidly and its data is massive, researchers are focusing on DL methods to propose new online and offline shopping experiences [16]. Using offline and online retail datasets, DL applications in MBA include next-basket recommendation, churn prediction, customer segmentation, and sales prediction. DL methods, built on artificial neural networks, are inspired by the structure of the human brain [17], [18]. These neural networks can learn from unlabeled data without human intervention. A neural network is made up of layers, a learning algorithm, and an activation function. In feed-forward neural networks, data moves in only one direction, from the input nodes through the hidden nodes to the output nodes. In contrast, a recurrent neural network also feeds information backward through recurrent connections [19].
Transactional data keeps growing as customer demand increases. Finding customer purchase patterns and predicting the next item is a crucial research problem, because it lets retailers upgrade their business strategies accordingly [20]. AR needs to be blended with DL methods to make predictions and handle rapidly growing transactional datasets. This paper addresses this gap by proposing a novel market basket analysis framework that uses AR as feature selection and feeds the rules into DL methods to predict the next product category. The objectives of this paper are to 1) propose a novel model to predict the next product category on two datasets, InstaCart and Bites Bakers; 2) extract AR using Eclat and FP-Growth on the InstaCart and Bites Bakers datasets; and 3) classify and predict with the extracted AR by implementing CNN, Bi-LSTM, and CNN-BiLSTM on the InstaCart and real-life datasets.
In the first phase of the model, AR is used to find frequent items; DL methods are then used to predict the next product category purchased by the customer.

Literature Review
In the past, many researchers conducted MBA using various data mining techniques. A detailed review related to the research problem discussed in this paper has been published (Rehman and Ghous, 2021). A brief review of previous research on MBA using AR and deep learning methods on offline and online retail datasets follows. In 2023, the Apriori and FP-Growth algorithms were used for market basket analysis with real-world data. The performance of these algorithms was assessed using multilevel association rule mining. According to the numerical results, FP-Growth performs better than Apriori at all levels of product groups in terms of run time and memory [21]. In 2022, a novel next-group recommendation approach based on sequential market basket information was proposed. The method was based on upper similarity approximation clustering, Borda majority count, and the PrefixSpan algorithm. With this approach, customers may have more opportunities to increase their purchases by receiving recommendations for the next group rather than the next item [22]. Also in 2022, the Apriori algorithm was used to gather data on the relationships between sales patterns from records of customer shopping cart transactions, specifically sales of vehicle parts and transactions for vehicle repair services. Interviews and documentation were used to collect the data. The criteria used in this study were a minimum of 20 transactions with a frequent item set of 1.7%, a confidence value of 40%, and a lift ratio greater than 1. The study's findings produced nine sales pattern relationships with 100% certainty [23].
In 2022, the Apriori algorithm was used to establish association rules. This entails identifying groups of items in a market basket whose combination could result in greater economic benefits for businesses. The study aimed to analyze historical sales data from product groups to identify relationships that would allow companies in the sector to generate patterns and propose portfolio expansion based on the products with the highest purchasing trends [24]. In 2022, a method was proposed to analyze buyer behavior patterns when items are purchased together. The FP-Growth algorithm was chosen because it is more efficient and faster. The result of the research is a system that can quantify the value of product associations in transactional data. Five hundred and seventy-one transaction data items were tested, and four rules with lift ratio values greater than one were discovered [25]. A study was conducted in 2022 in Pithoragarh (Uttarakhand) city in various retail stores such as Bachat Store, Vishal Mart, Buy Chance, Pithoragarh Army Canteen, and so on. The transaction information was gathered from 45 of these stores' customers. RStudio was used to implement the Apriori algorithm. Based on the literature, the support and confidence values were set at 20% and 80%, respectively. The analysis yielded three association rules, all of which had lift values greater than 1, indicating a positive relationship between antecedents and consequents [26].
In 2021, an MBA was conducted using association rules. The study relied on supermarket sales data from a Vancouver Island University website. A data set containing 225 different products was analyzed using the Weka program. Apriori and FP-Growth were chosen for assessment. Because the data set is definite, the Apriori algorithm produced no results. As a result, the FP-Growth algorithm was used, and the top ten rules were ranked based on the conviction value [27].
In 2020, the Apriori algorithm was used to find frequent item sets in the sales transactional data of Maharani Supermarket using temporal AR. The items purchased at certain intervals during the fasting month, Christmas, and New Year are recommended for product layout at the supermarket. The best rule is milk with a snack for 12, 6, and 3 months at a minimum support of 0.07 and a confidence of 0.3. Seven rules that appeared at Christmas 2017 and 2018 were recommended as product layouts for the upcoming New Year events [28]. In 2020, the Eclat algorithm was used on a transactional dataset of 212 Mart to find the most frequently bought products in different time intervals. The dataset is divided into quarters. In Quarter 1, from February 21, 2018, to May 2018, with 13261 transactions, the best-selling item is Indomie goreng 86gr with 391 pieces. In Quarter 2, from May 22, 2018, to August 2018, with 11978 transactions, the best-selling item is alpha one 600ml with a quantity of 355 pieces. In Quarter 3, from August 22, 2018, to November 2018, the best-selling item is a 60ml milo stick with a quantity of 268 pieces [29].
A review of previous research shows that many authors conducted MBA using AR to find customer purchase habits. MBA helps retailers find keystone products that are well recognized in the market and whose unavailability or high cost could harm the business. However, because data keeps increasing with growing customer demand, it is difficult to find customer purchase behavior by extracting AR alone. AR and DL methods therefore need to be combined to predict and classify these rules, since AR is used only to find customer purchase behavior (Basysyar et al., 2021). The use of AR as feature selection with DL methods is lacking.

Methodology
Analysis of customer purchase behavior is essential for informing retailers about buyers' intentions. Knowing consumer purchase behavior helps retailers design business strategies and launch new products successfully [30]. Changes in consumer buying habits help predict the market trend. Predicting the next item to buy is an important research problem in MBA [31]. In the literature, this problem is solved using different DL methods, while AR is used only to find customers' buying patterns. There is a gap in using AR as feature selection to predict the next product category. This paper presents a model that uses AR as feature selection and DL to predict the next product category on grocery datasets. The framework is implemented on two datasets: InstaCart and real-life data obtained from a grocery store. DL has recently emerged as a prominent research area in a variety of fields. In contrast to conventional ML, DL enables multi-layer computational models to learn representations of data by processing them in their raw form [32]. DL methods were therefore adopted for this research.

Dataset
InstaCart is an online grocery store dataset [33] available on kaggle.com. This dataset contains information on aisles, departments, products, and order products. The methods used to implement the proposed framework are described below:

Association Rules:
Market basket analysis is applied to find customer purchase behavior in retail. AR is commonly used to analyze transactional datasets to find the relationship between items purchased by customers.

Figure 1: Bites Special Products
A transactional dataset is given in the following table. TID is the transaction ID, and the Items column contains the list of goods purchased by the customer in that transaction.
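As a separate illustration (not the paper's own table), the strength of a rule over such a TID/items layout is usually measured with support, confidence, and lift, the three quantities reported for the extracted rules later in the paper. The sketch below computes them in plain Python; the five toy transactions and the function names are assumptions made only for this example.

```python
# Toy transactional dataset (hypothetical): TID -> set of items in that basket.
transactions = {
    1: {"Bread", "Jam", "Milk"},
    2: {"Bread", "Jam", "Diaper"},
    3: {"Bread", "Milk"},
    4: {"Coke", "Diaper"},
    5: {"Bread", "Jam", "Coke"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of baskets containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the baseline frequency of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


rule = ({"Bread"}, {"Jam"})
print("support   :", round(support(rule[0] | rule[1], transactions), 2))
print("confidence:", round(confidence(*rule, transactions), 2))
print("lift      :", round(lift(*rule, transactions), 2))
```

A lift above 1 indicates that the antecedent and consequent occur together more often than would be expected if they were independent, which is why the reviewed studies keep only rules with lift greater than 1.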

FP-Growth:
FP-Growth is an AR mining algorithm for creating frequent itemsets from a transactional dataset [34]. It works in two steps. FP-Growth is a fast and memory-efficient algorithm because it generates frequent itemsets in just two data scans [35]. Therefore, it works faster than the Apriori algorithm.
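A compact sketch of the idea is given here, assuming the textbook formulation of FP-Growth rather than the exact implementation used in the experiments: one scan counts the items, a second scan inserts each transaction into a prefix tree, and frequent itemsets are then mined recursively from conditional pattern bases without rescanning the original data. The class, function, and variable names, as well as the toy baskets, are hypothetical.

```python
from collections import defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Scan 1 counts items; scan 2 inserts the frequent items of each transaction."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=()):
    """Return {sorted itemset tuple: support count} for all frequent itemsets."""
    header, frequent = build_tree(weighted_transactions, min_count)
    result = {}
    for item, count in frequent.items():
        result[tuple(sorted(suffix + (item,)))] = count
        # Conditional pattern base: the prefix path above every node of `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fp_growth(conditional, min_count, suffix + (item,)))
    return result


baskets = [
    {"Bread", "Jam", "Milk"},
    {"Bread", "Jam", "Diaper"},
    {"Bread", "Milk"},
    {"Coke", "Diaper"},
    {"Bread", "Jam", "Coke"},
]
itemsets = fp_growth([(b, 1) for b in baskets], min_count=2)
for itemset, count in sorted(itemsets.items()):
    print(itemset, count)
```

Because the tree shares common prefixes between baskets, repeated purchases of the same popular items are stored once with a counter, which is where the memory savings over candidate generation in Apriori come from.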

Eclat:
Eclat (Equivalence class transformation) is a method for finding frequent itemsets in a transactional dataset [36]. Zaki suggested it in 2001 [37]. It generates frequent itemsets using a depth-first approach. It works on a vertical data layout and uses intersection to calculate the support of an itemset [38]. It proceeds as follows:
• Generate the transaction id (tid) list of each item.
• Intersect the tid list of an item with the tid lists of all other subsets of that item.
• Repeat this process for all items.
Eclat is fast and consumes less memory due to the depth-first approach [39]. It scans the data only once, whereas the Apriori algorithm scans the original data repeatedly.
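The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the toy baskets, the `eclat` function name, and the use of an absolute support count instead of a relative support are all choices made for the example, not details taken from the experiments.

```python
from collections import defaultdict


def eclat(tidlists, min_count):
    """Depth-first frequent itemset mining on a vertical (item -> tid set) layout."""
    frequent = {}

    def recurse(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)  # support count = size of the tid list
            # Extend the itemset by intersecting tid lists with the later items.
            extensions = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    extensions.append((other, common))
            if extensions:
                recurse(itemset, extensions)

    start = sorted(
        ((item, tids) for item, tids in tidlists.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    recurse((), start)
    return frequent


# One pass over the horizontal data builds the vertical tid lists.
transactions = {
    1: {"Bread", "Jam", "Milk"},
    2: {"Bread", "Jam", "Diaper"},
    3: {"Bread", "Milk"},
    4: {"Coke", "Diaper"},
    5: {"Bread", "Jam", "Coke"},
}
tidlists = defaultdict(set)
for tid, items in transactions.items():
    for item in items:
        tidlists[item].add(tid)

frequent_itemsets = eclat(tidlists, min_count=2)
for itemset, count in sorted(frequent_itemsets.items()):
    print(itemset, count)
```

After the tid lists are built, support counting never touches the original transactions again; every candidate is evaluated with a set intersection.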

Convolutional Neural Networks (CNN):
CNN is a deep artificial neural network used to solve image and text classification problems. A simple CNN structure consists of convolution, max pooling, and fully connected layers [40].
The role of the convolution layer is to extract features from the data. In the convolution procedure, the kernel slides over the input data, and the kernel weights are multiplied by the corresponding vector values. The subsequent pooling layer condenses the extracted features, reducing the number of training parameters and the risk of overfitting. Pooling operations are divided into two types: max-pooling and mean-pooling. Max pooling selects the maximum value in the corresponding convolved vector.
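To make the sliding-kernel and pooling operations concrete, the following plain-Python sketch applies a one-dimensional convolution followed by max- and mean-pooling to a short binary vector. The input vector, the kernel weights, and the helper names are hypothetical, and real layers would learn the kernel during training.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal and take the weighted sum at each position."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Replace negative activations with zero."""
    return [max(0, v) for v in values]


def max_pool1d(values, size):
    """Keep the largest value of each non-overlapping window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def mean_pool1d(values, size):
    """Keep the average value of each non-overlapping window."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values) - size + 1, size)]


signal = [1, 0, 1, 1, 0, 0, 1]  # hypothetical binary rule vector
kernel = [1, -1, 1]             # hypothetical kernel weights

features = relu(conv1d(signal, kernel))
print("features   :", features)
print("max pool   :", max_pool1d(features, 2))
print("mean pool  :", mean_pool1d(features, 2))
print("global max :", max(features))
```

Pooling shortens the feature vector, so the layers that follow have fewer inputs and therefore fewer weights to train.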

Long Short-Term Memory (LSTM):
The Long Short-Term Memory (LSTM) neural network is a variation of the RNN that overcomes the vanishing gradient problem faced by traditional RNNs [41], [42]. It can also be used to capture time-related dependencies in a dataset. An LSTM uses an internal memory unit to learn from previous observations in order to make a prediction. The basic structure of an LSTM consists of three gates: input, output, and forget gates [43]. The flow of training information is controlled across these gates by inserting information at the input gate, removing unnecessary information at the forget gate, or passing it to the output gate. The input, output, and forget gates are represented by i, o, and f, respectively. The cell state is denoted by C, the cell output by h, and the cell input by x. The equations to calculate gates and states are shown below.
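For reference, the standard LSTM formulation that is consistent with the symbols defined above can be written as follows. The sigmoid function σ, the bias terms b, the element-wise product ⊙, and the tilde on the candidate cell state are assumptions of this standard form and are not taken from the source.

```latex
\begin{aligned}
f_t &= \sigma\left(w_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(w_i\,[h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(w_o\,[h_{t-1}, x_t] + b_o\right) \\
\tilde{C}_t &= \tanh\left(w_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

Here [h_{t-1}, x_t] denotes the concatenation of the previous cell output and the current input.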
The weights of the gates are represented by w, and C is the revised cell state. These weights are updated using backpropagation through time. The forget gate reduces over-fitting by not passing on all of the information from the previous time step. In a traditional LSTM, the input of the hidden layer relies on the computation of the cell that controls the data at the earlier time step. In a Bidirectional LSTM, on the other hand, information flows in two ways: one forward and the other in the opposite direction.
Based on the above methods, the proposed framework is implemented with two datasets, one from previous research and one of real-life data from a grocery store. Both are preprocessed into transaction form to generate a one-hot encoded basket of purchased items. The AR are extracted using the Eclat and FP-Growth algorithms on the two datasets. These rules are converted into vector form to feed into the deep learning methods. The extracted rules are split into (80,20) and (70,30) ratios for training and testing. The experiments on each split are conducted using the CNN, Bi-LSTM, and CNN-BiLSTM methods on the two datasets with parameter tuning. The simulation of the experiments and the results are reported in tabular form with performance metrics.
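The vectorization and splitting steps can be illustrated with a short sketch. It assumes that each rule becomes one binary vector over the unique products (1 where the product occurs in the rule, 0 otherwise); the five rules, the helper names, and the fixed random seed are hypothetical and only serve the example.

```python
import random

# Hypothetical rules as (antecedent, consequent) pairs.
rules = [
    ({"Bread"}, {"Jam"}),
    ({"Bread", "Milk"}, {"Jam"}),
    ({"Yogurt"}, {"Milk"}),
    ({"Egg"}, {"Bread"}),
    ({"Fruit", "Milk"}, {"Yogurt"}),
]


def rules_to_vectors(rules):
    """Return the sorted product list and one binary vector per rule."""
    products = sorted(set().union(*(lhs | rhs for lhs, rhs in rules)))
    vectors = [
        [1 if product in (lhs | rhs) else 0 for product in products]
        for lhs, rhs in rules
    ]
    return products, vectors


def split_rows(rows, train_fraction, seed=0):
    """Shuffle the rows reproducibly and cut them into train and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


products, vectors = rules_to_vectors(rules)
print(products)
for vector in vectors:
    print(vector)

train_80, test_20 = split_rows(vectors, 0.8)
train_70, test_30 = split_rows(vectors, 0.7)
print(len(train_80), len(test_20), len(train_70), len(test_30))
```

With a few hundred rules, as in the experiments, the two ratios differ by several dozen test rules, so reporting both gives a rough sense of how sensitive the accuracy is to the amount of training data.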

Results and Discussion
On its own, AR is used only to find customer purchase patterns [44]. Deep learning methods must be implemented to predict and classify these rules. This section describes the simulation of the proposed model using AR and DL methods. The experiments were conducted on two datasets: InstaCart and real-life data from the Bites Bakers grocery store. The AR were extracted using the Eclat and FP-Growth algorithms on the two datasets. These rules are split into (80,20) and (70,30) ratios to simulate the experiments. The experiments were conducted using PyCharm and RStudio, with scripts written in the Python and R programming languages.

Preprocessing:
The AR algorithm takes data as a sparse matrix, so the read.transactions function is applied to convert the data into transactions. The summary of the transaction data shows the size of the sparse matrix and the frequency of the most frequent items with their corresponding product IDs. Unique product labels are assigned so that each product ID maps to its corresponding product name.
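A Python analogue of this conversion is sketched below for illustration; it is not the R code used in the experiments. Each basket is stored as the sorted list of column indices of the products it contains, which is the essence of a sparse transaction matrix. The toy baskets and all names are hypothetical.

```python
from collections import Counter

# Hypothetical baskets, one list of product names per order.
baskets = [
    ["milk", "bread"],
    ["bread", "yogurt", "milk"],
    ["egg"],
    ["bread"],
]


def to_sparse(baskets):
    """Map products to column indices and keep only the non-zero columns per row."""
    products = sorted({item for basket in baskets for item in basket})
    column = {product: j for j, product in enumerate(products)}
    rows = [sorted({column[item] for item in basket}) for basket in baskets]
    return products, rows


products, rows = to_sparse(baskets)

# Summary comparable to a transaction summary: matrix size, density, item frequency.
density = sum(len(row) for row in rows) / (len(rows) * len(products))
frequency = Counter(products[j] for row in rows for j in row)

print("size     :", len(rows), "x", len(products))
print("density  :", density)
print("frequency:", frequency.most_common())
```

Storing only the occupied columns matters for grocery data, where each basket contains a handful of the tens of thousands of available products.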

Deployment of Eclat
The preprocessed data is now ready for feature selection using AR. Frequent itemsets with a maximum length of 15 and a support of 0.001 were mined by implementing the Eclat algorithm. From these frequent itemsets, 347 AR were generated at a confidence of 0.3: 65 rules of length 2 (LHS is one item and RHS is one item), 267 of length 3 (LHS is two products), and 15 of length 4 (LHS is three products), as shown in Figure 3. The scatter plot of the 347 rules in Figure 5 shows that the rules with high lift have low support.

Figure 5: Scatter Plot -Association Rules with High Lift Value
So far, AR has been used only to find customer buying patterns [45]; these rules have not been used to predict and classify future business needs. In this work, experiments are conducted to classify and predict the 347 rules using CNN and Bi-LSTM. The AR are converted into vector form by building a data frame with the unique products on the rows and the association rules on the columns. Class labels are assigned for multi-class classification: the products that appear in an association rule are marked with 1s and the remaining products with 0s. The rules are passed to the CNN model, which consists of a Conv1D layer with the 'relu' activation function, a GlobalMaxPool1D layer, two Dense layers with the 'relu' activation function, and a final Dense classification layer with the 'softmax' activation function for multi-class classification. The experiments were conducted for 5 epochs with a batch size of 10 using the 'Adam' optimizer, splitting the data into (80,20) and (70,30) ratios; the results are listed in Table 1. The second method, Bi-LSTM, is implemented with a SpatialDropout1D layer, a Bi-LSTM layer with dropout 0.2, and a Dense classification layer with the 'softmax' activation function for multi-class classification. This experiment was performed for 50 epochs with a batch size of 10 using the 'Adam' optimizer, dividing the data into (80,20) and (70,30) ratios; the results are listed in Table 2. Considering both splits, the best results show that Bi-LSTM achieves a high accuracy rate on the InstaCart dataset. The prediction label shows that the customer will buy yogurt, milk, fruits, vegetables, eggs, and bread in their next purchase.

4.2 Experiment using Bites Bakers:

Preprocessing:
The Sales Detail Report (SDR) is processed by omitting attributes that are not required, such as date, price, time, and total amount. The required data was extracted from the SDR by applying Excel formulas. The unique products with their names were extracted and stored in the products file. A unique product ID is assigned to each product, as shown in Figure 6.
The preprocessing steps performed on the Bites Bakers dataset are grouping transactions based on product ID and Sale No, and merging the names of the top 100 special Bites products on product ID, as shown in Figure 7.
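The grouping step can be sketched as follows, assuming that each row of the cleaned report reduces to a (Sale No, product ID) pair and that a small lookup table maps product IDs to names. The sample rows, the IDs, the names, and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows from the cleaned sales report: (sale_no, product_id).
sales_rows = [
    (1001, 12), (1001, 45), (1002, 12),
    (1001, 12), (1003, 7), (1002, 88),
]

# Hypothetical lookup of product names by product ID.
product_names = {7: "Shake", 12: "Bread", 45: "Ice cream", 88: "Burger"}


def group_by_sale(rows):
    """Collect the distinct product IDs of every sale into one basket."""
    baskets = defaultdict(set)
    for sale_no, product_id in rows:
        baskets[sale_no].add(product_id)
    return {sale_no: sorted(items) for sale_no, items in baskets.items()}


baskets = group_by_sale(sales_rows)
named_baskets = {
    sale_no: [product_names[product_id] for product_id in items]
    for sale_no, items in baskets.items()
}

for sale_no, items in sorted(named_baskets.items()):
    print(sale_no, items)
```

Using a set per sale drops repeated lines for the same product, so each basket records whether a product was bought rather than how many units were sold.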

Implementation of CNN-Bi-LSTM
The rules extracted by FP-Growth must be converted into vector form, as discussed in section 4.1.4, before they can be fed to the deep learning methods. These rules are converted into vector form and transposed. The data is split into (70,30) and (80,20) ratios to feed into the DL method.
The vectorized rules are then predicted and classified using CNN-BiLSTM. The structure of CNN-BiLSTM consists of a Conv1D layer with the 'relu' activation function, a MaxPool1D layer, a Dropout layer with a rate of 0.2, a Bi-LSTM layer, and a final Dense layer with the 'softmax' activation function. The experiment used the 'adam' optimizer for 50 epochs with a batch size of 10 to classify the 170 AR and predict the next customer purchase category, as in Table 3. The highest accuracy rate achieved by CNN-BiLSTM across both splits is 0.77. The prediction label shows that Bites fast food, Bites Ice Bar ice cream, and the Bites shake will be the next items to be purchased. More variety in these items can therefore be introduced when proposing a new business strategy to increase profit.
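The role of the final softmax layer and of the accuracy metric can be illustrated in plain Python. The sketch assumes that the network produces one raw score per product category, that softmax turns the scores into probabilities, and that the most probable category is the prediction; the scores and labels below are made up for the example and are not results from the experiments.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (shifted for stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_index):
    """Categorical cross-entropy loss for a single example."""
    return -math.log(probabilities[true_index])


def accuracy(predicted, actual):
    """Fraction of examples whose predicted class matches the label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical raw scores for four rules over three product categories.
scores_batch = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 0.3],
    [0.1, 0.2, 3.0],
    [1.0, 1.2, 0.1],
]
labels = [0, 1, 2, 0]

probabilities = [softmax(scores) for scores in scores_batch]
predictions = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
mean_loss = sum(cross_entropy(p, y) for p, y in zip(probabilities, labels)) / len(labels)

print("predictions:", predictions)
print("accuracy   :", accuracy(predictions, labels))
print("mean loss  :", round(mean_loss, 3))
```

Accuracy counts only whether the top category is right, while the loss also reflects how confident the model was, which is why both are reported for each split.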

Discussion
The simulation shows that Bi-LSTM performs best on InstaCart, with a high accuracy rate across the performance metrics at both splits, (80,20) and (70,30). The simulation of the proposed framework shows that AR can be used as feature selection to find customer buying habits, and that these rules can then be predicted and classified using deep learning methods. Market basket analysis can therefore be conducted using both AR and DL methods. The research objective is to extract AR by implementing the Eclat and FP-Growth algorithms on two datasets, InstaCart and the real-life Bites Bakers data. The extracted rules are fed into the CNN, Bi-LSTM, and CNN-BiLSTM methods for classification and prediction. The results of the simulation at both splits, (80,20) and (70,30), show the accuracy and loss on both datasets. A comparative analysis of the proposed framework against related work was also conducted. The authors of the related work used an ensemble approach to predict the purchase probability of retail items. Table 4 shows that the performance of the proposed framework is better than that of the related work.

Conclusion and Future Work
The relationships among buyers' preferences can be deduced using a data mining technique called market basket analysis. The analysis of these associations assists the retailer in proposing a business strategy by assessing frequently bought items. Comparing the customer buying pattern with the company's production pattern helps in stocking the right products. Analyzing the consumer buying pattern is crucial, and many data mining methods can achieve this. AR is a data mining approach to understanding the frequent activities of purchasers. In previous research, many authors used this method only to detect customers' frequent purchase patterns.
Predicting and classifying the AR to propose new business strategies for the growing needs of consumers has been lacking. This work introduced a framework using AR and DL methods to overcome the problem. AR is used as feature selection to find customer purchase behavior from transactional datasets. These rules are then classified and predicted using deep learning methods.
The experiments are simulated in the Python and R programming languages on the InstaCart and Bites Bakers datasets. To conclude, the best accuracy is achieved using Bi-LSTM, with a 0.92 accuracy rate on InstaCart. The performance metrics on Bites Bakers show that CNN-BiLSTM classifies AR at a high accuracy rate of 0.77.
Market Basket Analysis helps the manager understand which items customers purchase during their visits to the store. The results can be used to propose marketing strategies and design different store layouts to attract customers. MBA on the Bites Bakers dataset predicts that Bites fast food, Bites ice cream, and Bites shake will be the next product categories to be purchased. These products can therefore be placed in the areas of the store where new products are displayed, to capture customer intentions. New items in the predicted categories can be launched to increase revenue or put on sale.
The proposed model slightly overfits on the Bites Bakers dataset. However, the model performed well on Bites Bakers. This happened due to the inconsistency of the real-time dataset. In the future, the model will be improved to overcome the inconsistency of the real-time dataset for MBA by introducing new preprocessing methods. Furthermore, the proposed model will be implemented on a large grocery store's transactional dataset in future work. Dimension reduction methods should be used to reduce the dimensions of the baskets. More AR can be used to evaluate the performance of the proposed framework. The accuracy on the real dataset will be improved using multiple preprocessing techniques.
The InstaCart dataset contains 131209 transactions and 39123 different items. Bites Bakers (https://www.facebook.com/bitesbakersvhr/) is an offline grocery store with three branches in Hasilpur, Vehari, and Bahawalpur, Punjab. The sales receipts were collected from the Bites Bakers branch in Jalandhar Colony, Hasilpur, Punjab. The receipt dataset, covering November 2020 to February 2021, consists of 2233 products and 32665 transactions. Of these products, 140 are Bites special products, as shown in Figure 1. The proposed model is shown in Figure 2.
Examples of association rules are:
{Bread} → {Jam}: customers who buy bread also buy jam. The left-hand side item is known as the antecedent, and the right-hand side item is the consequent.
{Bread, Jam} → {Diaper}: customers who buy bread and jam also buy diapers.
Association rules come in different dimensions. Single-dimension association rules consist of one antecedent and one consequent, for example {Bread} → {Jam}, while multidimensional association rules consist of more than one antecedent and consequent, for example {Bread, Milk} → {Jam, Diaper}. Association rules are generated using different algorithms when conducting MBA. These algorithms include FP-Growth, Apriori, Eclat, Enhanced Apriori with MapReduce, Rapid Association Rule Mining, CLOSET, CHARM, CARMA, Sequence, and Apriori with Hashing. In this work, multidimensional association rules were identified using the Eclat and FP-Growth algorithms on the InstaCart and Bites Bakers datasets, respectively.

The top 15 rules with confidence, support, and lift values are shown in Figure 4.

Figure 3: Dimensions of Extracted Association Rules

Figure 6: Names and IDs of Bites Products

Figure 8: Transaction Group based on the Product ID

Figure 10: Top Twenty AR with High Lift