Article type: Research
Authors
Department of Industrial Engineering, Faculty of Engineering, Yazd University
Abstract
Today, data mining techniques such as classification, clustering, frequent pattern discovery and outlier detection are increasingly used in domains including production, medicine, social science, meteorology, the stock exchange, sales and customer service. These techniques were designed for static data, so applying them to time series data requires some modifications to their algorithms. One of these modifications is the choice of an appropriate similarity measure, because similarity measures are used in all data mining techniques. In this research, we therefore evaluate and compare two commonly used and efficient time series similarity measures, the Longest Common Subsequence (LCSS) method and the Dynamic Time Warping (DTW) method, in terms of how much each improves data mining results. The data mining techniques used in this research are the nearest-neighbor technique and the k-medoids clustering algorithm. The performance evaluation process, which is described in the text, uses the nearest-neighbor technique to calculate the accuracy with which the correct time series class is detected, and uses the k-medoids technique to calculate the clustering accuracy, the ability to correctly determine the number of clusters, and the ability to determine the better cluster representative. For this purpose, we use 63 time series data sets selected at random from the widely used UCR collection. The results show that LCSS performs significantly better than DTW in class detection accuracy and in clustering accuracy, at the 99% and 92.5% confidence levels, respectively, but there is no significant difference between the two methods in determining the number of clusters or the cluster representatives. These results help practitioners choose the appropriate similarity measure for each data mining technique in applications such as customer segmentation, workshop scheduling and the like.
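To make the two similarity measures concrete, the following is a minimal Python sketch of a DTW distance, an LCSS-based distance, and a nearest-neighbor classifier that accepts either one. It illustrates the general techniques only and is not the implementation used in this research. The function names, the matching threshold `epsilon`, the absolute-difference point cost, the normalization `1 - LCSS / min(n, m)`, and the toy series are all assumptions made for the example; the temporal window that LCSS variants often include is omitted for brevity.

```python
from typing import Callable, List, Sequence

Series = Sequence[float]


def dtw_distance(a: Series, b: Series) -> float:
    """Dynamic Time Warping cost with absolute difference as the point cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # prev[j] holds the best warping cost of a[:i-1] against b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def lcss_distance(a: Series, b: Series, epsilon: float = 0.5) -> float:
    """Distance derived from the Longest Common Subsequence length.

    Two points match when they differ by at most `epsilon` (an assumed
    threshold). The result lies in [0, 1]; 0 means every point of the
    shorter series was matched.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 1.0
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= epsilon:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return 1.0 - prev[m] / min(n, m)


def nearest_neighbor_label(
    query: Series,
    train: List[Series],
    labels: List[str],
    distance: Callable[[Series, Series], float],
) -> str:
    """Return the label of the training series closest to `query`."""
    best = min(range(len(train)), key=lambda i: distance(query, train[i]))
    return labels[best]


# Toy data (made up for illustration, not taken from the UCR collection).
train = [[0, 0, 1, 2, 1, 0], [5, 5, 6, 7, 6, 5]]
labels = ["low", "high"]
query = [0, 1, 2, 1, 0, 0]

label_dtw = nearest_neighbor_label(query, train, labels, dtw_distance)
label_lcss = nearest_neighbor_label(query, train, labels, lcss_distance)
```

Because the classifier takes the distance as a parameter, the same evaluation loop can be run once per measure and the resulting accuracies compared, which mirrors the kind of comparison described above.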
Keywords