Customer segmentation and clustering using SAS Enterprise Miner
Many small online retailers and new entrants to the online retail sector are keen to practice data mining and consumer-centric marketing in their businesses yet technically lack the necessary knowledge and expertise to do so.
In this article a case study of using data mining techniques in customer-centric business intelligence for an online retailer is presented. The main purpose of this analysis is to help the business better understand its customers and therefore conduct customer-centric marketing more effectively.
On the basis of the Recency, Frequency, and Monetary (RFM) model, customers of the business have been segmented into various meaningful groups using the k-means clustering algorithm and decision tree induction, and the main characteristics of the consumers in each segment have been clearly identified.
Accordingly, a set of recommendations is further provided to the business on consumer-centric marketing.
For the past 10 years, we have witnessed a steady and strong increase in online retail sales.
Compared with traditional shopping in retail stores, online shopping has some unique characteristics: each customer's shopping process and activities can be tracked instantaneously and accurately, each customer's order is usually associated with a delivery address and a billing address, and each customer has an online store account with essential contact and payment information. These desirable, special online shopping characteristics have enabled online retailers to treat each customer as an individual with personalized understanding of each customer and to build upon customer-centric business intelligence.
In relation to customer-centric business intelligence, online retailers usually share a set of common business concerns. What are the distinct characteristics of their customers? In which sequence have the products been purchased?
Although many famous online retail brands are embracing data mining techniques as crucial tools to gain competitive advantage in the market, there are still many smaller retailers and new entrants that are keen to practise consumer-centric marketing yet lack the technical knowledge and expertise to do so.
The online retailer considered here is a typical one: a small business and a relatively new entrant to the online retail sector that recognizes the growing importance of analytics and data mining techniques in today's online businesses but lacks technical awareness and resources. On the basis of the RFM model, customers of the business have been segmented into various meaningful groups using the k-means clustering algorithm and decision tree induction, and the main characteristics of the consumers in each segment have been clearly identified.
Accordingly, a set of recommendations is provided to the business on customer-centric marketing and further data analysis tasks. The analysis is developed in a step-by-step way. The rest of this article is organized as follows. The next section provides the background information about the online retailer studied in the article along with the associated dataset to be explored.
The section after that discusses in detail the main steps and tasks of data pre-processing needed to create an appropriate target dataset for the further analyses. In the subsequent section the k-means clustering analysis is performed and a set of meaningful clusters and segments of the target dataset is identified.
A detailed discussion on each of the clusters is given, and the segmentation is further refined by using decision tree induction. The penultimate section summarizes the essential consumer-centric business intelligence based on the analysis results, and provides some concrete recommendations to the online retailer aiming at maximizing profits for the business.
Finally the concluding remarks are given in the last section. The online retailer under consideration in this article is a UK-based and registered non-store business with some 80 members of staff.
The company was established in mainly selling unique all-occasion gifts. For years in the past, the merchant relied heavily on direct mailing catalogues, and orders were taken over phone calls. It was only 2 years ago that the company launched its own web site and shifted completely to the Web. Since then the company has maintained a steady and healthy number of customers from all parts of the United Kingdom and Europe, and has accumulated a huge amount of data about many customers.
The company also uses Amazon. The customer transaction dataset held by the merchant has 11 variables as shown in Table 1 , and it contains all the transactions occurring in years and It should be noted that the variable PostCode is essential for the business as it provides vital information that makes each individual consumer recognizable and trackable, and therefore it makes some in-depth analyses possible in the present study.
As the first ever pilot study for the business to generate sensible customer intelligence, only the transactions created from 1 January to 31 December are explored in this article. On average, each postcode is associated with five transactions, that is, each customer has purchased a product from the online retailer about once every 2 months.
In addition, only consumers from the United Kingdom are analysed. It is interesting to note that the average number of distinct product items contained in each transaction seems to suggest that many of the consumers of the business were organizational customers rather than individual customers.
In order to conduct the required RFM model-based clustering analysis, the original dataset needs to be pre-processed. The main steps and relevant tasks involved in the data preparation are as follows:
1. Select appropriate variables of interest from the given dataset.
2. Separate the variable InvoiceDate into two variables, Date and Time. This allows different transactions created by the same consumer on the same day but at different times to be treated separately.
3. Filter out any transactions that have no associated postcode. This resolves any missing-value issues in relation to the variable PostCode. In addition, filter out any transactions that are not associated with a United Kingdom postcode.
4. Sort the dataset by PostCode and create three essential aggregated variables: Recency, Frequency and Monetary.
5. Calculate the values of these variables per postcode.
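The filtering and aggregation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAS code used in the study: the record layout, the sample values, the reference date, and the choice to measure recency in days are all hypothetical, and each row is assumed to represent one transaction.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction rows: (postcode, invoice date, amount spent).
# Values are illustrative and not taken from the retailer's dataset.
transactions = [
    ("AB1 2CD", date(2020, 1, 15), 120.0),
    ("AB1 2CD", date(2020, 11, 3), 80.0),
    ("EF3 4GH", date(2020, 3, 9), 35.5),
    ("", date(2020, 5, 1), 10.0),  # missing postcode: filtered out
]

def rfm_by_postcode(rows, reference_date):
    """Aggregate transactions into Recency, Frequency and Monetary per postcode."""
    groups = defaultdict(list)
    for postcode, day, amount in rows:
        if postcode:  # drop rows without a postcode
            groups[postcode].append((day, amount))
    table = {}
    for postcode, items in groups.items():
        last_purchase = max(day for day, _ in items)
        table[postcode] = {
            # Assumption: recency = days since the most recent purchase.
            "recency": (reference_date - last_purchase).days,
            # Assumption: frequency = number of transactions.
            "frequency": len(items),
            "monetary": sum(amount for _, amount in items),
        }
    return table

rfm = rfm_by_postcode(transactions, reference_date=date(2020, 12, 31))
for postcode, values in sorted(rfm.items()):
    print(postcode, values)
```

Restricting the rows to United Kingdom postcodes would be one more condition in the first loop; it is omitted here because the source does not say how UK postcodes were identified.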
Following these steps, a target dataset for the analysis has been generated. Part of the target dataset is shown in Figure 1, and the variables in the target dataset and their statistics are described in Tables 2 and 3. The SAS procedures proc means and proc sql were used to transform the dataset and to calculate the values of the variables Recency, Frequency and Monetary for each given postcode. With the prepared target dataset we intended to identify whether consumers can be segmented meaningfully in terms of recency, frequency and monetary values.
The k-means clustering algorithm was employed for this purpose, and it can be easily performed by using the Cluster node in SAS Enterprise Miner.
As is well known, the k-means clustering algorithm is very sensitive to a dataset that contains outliers (anomalies) or variables that are of incomparable scales or magnitudes. Examining the histograms of the variables Recency, Frequency and Monetary of the target dataset in SAS Enterprise Miner, as illustrated in Figure 2, it is evident that a few instances have monetary and frequency values quite different from those of the majority of the instances in the dataset.
These instances are valid from the business point of view as they are genuine transaction records; however, they are outliers from the data analysis point of view. Therefore, these instances should be isolated from the majority and treated separately. In addition, because the three variables are measured on different scales, they should be normalized before the clustering analysis. On the basis of this initial insight into the dataset, a project diagram has been set up in SAS Enterprise Miner for the clustering analysis, as depicted in Figure 3.
There are four nodes in the diagram. In the Data Sources Target Dataset node, the three variables Recency, Frequency and Monetary were chosen as input for the clustering analysis. The Filter node was set to exclude from the analysis any instances having a rare value for any of the variables involved, and the minimum cutoff value for rare values was set to 1 per cent of the total number of instances under consideration.
In total, 73 instances were excluded by the Filter node, and a summary of the resultant filtered target dataset is given in Table 5. In the Cluster node, the standard range transformation was used for normalization, with the number of clusters specified as 3, 4 and 5 in turn. Finally, the Segment Profile node was used to assist in interpreting each cluster found. The clustering and segment results with five clusters are shown in Tables 6 and 7, and the distribution of the instances within each cluster is detailed in Figures 4 and 5.
This segmentation by five clusters seems to have a clearer interpretation of the target dataset than the ones by three and four clusters.
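To make the normalization and clustering steps concrete, the following is a minimal sketch in plain Python of range normalization followed by k-means (Lloyd's algorithm). It is not the implementation behind the Cluster node in SAS Enterprise Miner, whose initialization and stopping rules may differ. The function names, the toy data, the random seeding, and the use of k = 2 (chosen only because the toy data has six rows) are assumptions for illustration.

```python
import random

def range_normalize(rows):
    """Rescale every column to [0, 1] via (x - min) / (max - min)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # guard constant columns
    return [[(x - lo) / span for x, lo, span in zip(row, lows, spans)]
            for row in rows]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data rows as starting centroids
    labels = None
    for _ in range(max_iterations):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical (recency, frequency, monetary) rows, one per postcode.
rfm_rows = [
    [11, 1, 50.0], [10, 2, 80.0], [9, 1, 60.0],           # infrequent, low spend
    [1, 12, 2500.0], [0, 15, 3100.0], [2, 11, 2200.0],    # frequent, high spend
]

normalized = range_normalize(rfm_rows)
labels, centroids = k_means(normalized, k=2)
print(labels)
```

Without the normalization step, the monetary column would dominate the distance calculation simply because its values are numerically much larger than those of recency and frequency.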
Interpreting and understanding each cluster identified is crucial in generating customer-centric business intelligence. Examining Table 7 and Figures 4 and 5, it is interesting to see that each cluster indeed contains a group of consumers that have certain distinct and intrinsic features, as detailed below.
Cluster 1 seems to be the least profitable group, as none of the customers in this group purchased anything in the second half of the year.
Contrasted with the customers in cluster 1, the customers in cluster 5 mainly started shopping with the online retailer at the beginning of the year, and continued to the end of the year with an average value of recency 0.
They purchased quite often and as a result, spent a quite high amount of money. This group of consumers can be categorized as very high recency, very high frequency and very high monetary with a high spending per consumer.
This group, although the smallest at about 5 per cent of the consumers, seems to be the most profitable group. Cluster 4 contains consumers with very high values of frequency and monetary, although lower than those of cluster 5.
This group seems to be the second most profitable group. Compared with clusters 4 and 5, the customers in cluster 2 have a lower frequency throughout the year and a significantly smaller average value of monetary, indicating a much smaller amount of spending per consumer.
This group can be categorized as low recency, high frequency and medium monetary, with a medium spending per consumer. Cluster 3 is the largest group. Consumers in this group have a reasonable value of frequency.
Compared with clusters 2 and 4, this group has a lower but reasonable value of monetary, as the group includes many newly registered consumers who started shopping with the retailer very recently. This group seems to represent ordinary consumers and therefore has a certain level of uncertainty in terms of profitability.
In the long term, some of these consumers might turn out to be very highly profitable and others not profitable at all. We use Figure 6 to summarize the analysis made so far: of the whole population of consumers, 47 per cent were ordinary shoppers with reasonable spending and frequency, about 34 per cent were medium to high profit, 5 per cent were extremely high profit, and the remaining 14 per cent were extremely low profit.
About 22 per cent of the consumers contributed roughly 60 per cent of the total sales. Overall the business seems to be quite healthy in terms of profitability.
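Figures such as "22 per cent of the consumers contributed roughly 60 per cent of the total sales" come from comparing each cluster's share of customers with its share of sales. A minimal sketch of that calculation follows; the cluster labels and monetary values are hypothetical and do not reproduce the study's numbers.

```python
from collections import defaultdict

def cluster_shares(labels, monetary):
    """Return {cluster: (share of customers, share of total sales)}."""
    counts = defaultdict(int)
    sales = defaultdict(float)
    for label, amount in zip(labels, monetary):
        counts[label] += 1
        sales[label] += amount
    total_customers = len(labels)
    total_sales = sum(monetary)
    return {label: (counts[label] / total_customers, sales[label] / total_sales)
            for label in counts}

# Hypothetical cluster assignment and total spend for ten customers.
labels = [5, 5, 4, 4, 3, 3, 3, 3, 1, 1]
monetary = [3000.0, 2000.0, 1000.0, 1000.0,
            500.0, 500.0, 500.0, 500.0, 500.0, 500.0]

shares = cluster_shares(labels, monetary)
for label in sorted(shares):
    customers, revenue = shares[label]
    print(f"cluster {label}: {customers:.0%} of customers, {revenue:.0%} of sales")
```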
As discussed above, cluster 3 is the most diverse of the five identified clusters in the sense that it contains both newly registered and old customers. To refine the segmentation of the instances in this cluster, a decision tree has been used to create some nested segments inside the cluster, as shown in Figure 5. In other words, these nested segments form sub-clusters inside cluster 3, and make it possible to categorize the consumers concerned into some sensible sub-categories.
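The idea of carving nested segments out of a cluster can be illustrated with a single regression-tree split (a decision stump). The source does not state which target variable or splitting criterion the decision tree used, so the sketch below makes its own assumptions: the input is frequency, the target is monetary, and the split is chosen to minimise the total squared error of the two resulting groups. The data are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(feature, target):
    """Return (threshold, error) for the rule `feature <= threshold`
    that minimises the combined squared error of both sides."""
    pairs = sorted(zip(feature, target))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[1]:
            best = (threshold, error)
    return best

# Hypothetical customers of one cluster: purchase frequency and total spend.
frequency = [1, 2, 2, 3, 3, 4, 5, 6]
monetary = [100.0, 120.0, 110.0, 130.0, 125.0, 400.0, 450.0, 430.0]

threshold, error = best_split(frequency, monetary)
print(f"split at frequency <= {threshold}")
```

A full decision tree applies the same search recursively to each side of the split, which is what produces nested sub-segments.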
For example, as shown in Figure 7, the customers can be divided into categories according to their frequency values. Also, it is interesting to note that the relationship between frequency and monetary seems to be monotonic and linear.
The most valuable consumers of the business contributed more than 60 per cent of the total sales in the year studied, whereas the least valuable ones made up only 4 per cent of the total sales.
For each of these consumer groups, it is essential to further find out which products the customers in each group have purchased, which products have been purchased together most frequently and in which sequence the products have been purchased.
The business can gain a better understanding of the consumers by exploring the associations among consumer groups and the products they have purchased. Many of the consumers of the business were organizational consumers with a high quantity of a product per transaction.
Examining at which specific times (seasons) they have purchased, and which products and types of products they have purchased frequently, will be beneficial to the business. It will also be interesting to see whether there are any differences between different types of customers, that is, organizational and individual customers, in terms of their shopping patterns. Monitoring the diversity of the most diverse customer group, and predicting which customers will potentially become affiliated with the most or the least profitable group, is very useful for the business in the long term.
Identifying appropriate predictors or indicators for such predictions is invaluable.