Article type: Research
Authors
School of Industrial Engineering, Iran University of Science and Technology
Abstract
Keywords
Article title [English]
Authors [English]
Amid the vast number of web pages, users face two issues in reaching the resources they want: speed and accuracy. Both are important factors in user satisfaction with web services, and both call for an information retrieval tool that returns suitable responses. Developing an efficient search engine can therefore help attract customers and increase their satisfaction. However, Web search engines often face a crucial problem: for vague queries, their results include highly diverse pages. This diversity makes it harder for a search engine to choose the most relevant pages, and the results may be unsatisfactory from the user's perspective. In such a situation, discovering the natural grouping of pages and finding representatives of each group helps the engine cover all admissible meanings of the user's query. Clustering is a well-known approach to this reduction task, i.e., finding a few representatives among highly diverse Web pages. In this paper, we focus on a pioneering algorithm and aim to improve both the quality of its responses and its execution speed. To do so, we propose generating the initial clusters with the well-known K-means algorithm, which can serve as a suitable starting point. We also reformulate a time-consuming formula of the main algorithm by taking advantage of the properties of the link network. Furthermore, to increase the quality of the clustering, we formulate a set of significant variables that the main algorithm treats as constants. Experimental results on ground-truth datasets indicate that our algorithm outperforms the main algorithm by about 30% in both clustering quality and execution speed. Moreover, as a case study, we run our algorithm on a dataset of Persian blogs, which we built by collecting the links and texts contained in a number of blogs. On this dataset, our algorithm yields notably good results for the extracted clusters.
Keywords [English]