Article Title [English]
Customer segmentation (CS) is one of the most important aspects of customer relationship management. Machine learning (ML) algorithms for pattern recognition problems often succeed only if the available data are preprocessed with appropriate feature selection. Feature selection can be viewed as a global combinatorial optimization problem in ML. It reduces the total number of features and removes irrelevant and redundant data, and it has received considerable attention in areas where thousands of features are available. Its main goal is to identify the subset of features that is most informative, and therefore most predictive, for a given response variable. Finding an optimal feature subset is usually intractable, and many problems related to feature selection have been shown to be NP-hard. Successful feature selection not only provides important information for
segmentation, but also reduces the computational and analytical effort required to analyze high-dimensional data. Optimal segmentation based on relevant features helps to develop marketing strategies more accurately by spending resources effectively. However, building a customer segmentation system (CSS) that combines low complexity with optimal segmentation ability is difficult because of the large number of possible features. Although segmentation methods are widely used, they are of little use unless irrelevant features are removed, because irrelevant features distort the segmentation and produce poor results. Moreover, appropriate feature selection is itself hard, and no generally applicable method is available. The purpose of this paper is therefore to present a hybrid intelligent CSS (HICSS) that is computationally efficient and optimal. First, a pruned regression tree (PRT) is designed for optimal feature selection. Then, a self-organizing map (SOM) is developed to determine the optimal number of clusters based on the Davies-Bouldin Index. To evaluate the model, a dataset from an insurance company is used. The obtained results
show that the PRT removes 93% of the available features, considerably reducing computational cost. In addition, the validation results show that the SOM-based HICSS separates the clusters very accurately. Customers of the product under study were segmented into 24 clusters, so marketing resources can be directed at attracting customers who resemble those in the best cluster.
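
The clustering step can be illustrated with a minimal sketch in which the Davies-Bouldin Index picks the number of SOM units. This is not the authors' implementation: the one-dimensional SOM topology, the learning-rate and neighbourhood schedules, the synthetic data, and every function name (`davies_bouldin`, `train_som`, `assign`, `select_cluster_count`) are assumptions made for illustration. The PRT feature-selection step is omitted.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst (S_i + S_j) / M_ij.

    S_i is the mean distance of cluster i's points to its centroid and
    M_ij is the distance between centroids i and j. Lower is better.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return float("inf")  # the index is undefined for a single cluster
    cents = {k: [sum(c) / len(v) for c in zip(*v)] for k, v in clusters.items()}
    scatter = {k: sum(dist(p, cents[k]) for p in v) / len(v)
               for k, v in clusters.items()}
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                     for j in clusters if j != i)
    return total / len(clusters)


def train_som(points, n_units, epochs=30, seed=0):
    """Train a 1-D SOM (a chain of n_units) and return its weight vectors."""
    rng = random.Random(seed)
    weights = [list(p) for p in rng.sample(points, n_units)]
    order = list(points)
    total_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for p in order:
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac) + 0.01                  # decaying learning rate
            sigma = max(n_units / 2.0 * (1.0 - frac), 0.3)  # shrinking neighbourhood
            bmu = min(range(n_units), key=lambda u: dist(p, weights[u]))
            for u in range(n_units):
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma ** 2))
                weights[u] = [w + lr * h * (x - w) for w, x in zip(weights[u], p)]
            step += 1
    return weights


def assign(points, weights):
    """Label each point with the index of its best-matching SOM unit."""
    return [min(range(len(weights)), key=lambda u: dist(p, weights[u]))
            for p in points]


def select_cluster_count(points, candidates, seed=0):
    """Train one SOM per candidate size and keep the size with the lowest index."""
    scores = {}
    for k in candidates:
        labels = assign(points, train_som(points, k, seed=seed))
        scores[k] = davies_bouldin(points, labels)
    return min(scores, key=scores.get), scores


# Synthetic stand-in for the (already feature-selected) customer data.
rng = random.Random(1)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in centers for _ in range(40)]
best_k, scores = select_cluster_count(data, candidates=range(2, 6))
print(best_k, scores)
```

On real data the candidate range would be wider, and each point would contain only the features retained by the selection step. A two-dimensional SOM grid is also common; the chain used here only keeps the sketch short.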