Article type: Research
Authors
1 Faculty of Textile Engineering, Amirkabir University of Technology
2 Assistant Professor, Faculty of Textile Engineering, Amirkabir University of Technology
Abstract
Keywords
Subjects
Article title [English]
Authors [English]
Because modeling and predicting customer behavior with data science gives companies a better understanding of their customers, this research analyzes customer reviews of women’s clothing in e-commerce using machine learning and natural language processing (NLP). The machine learning models used are Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, Multinomial Naive Bayes, Complement Naive Bayes, XGBoost, and LightGBM. Text features are extracted from the reviews and vectorized with the TF-IDF and Word2Vec algorithms. Topic modeling is performed with Latent Dirichlet Allocation (LDA), and clustering with k-means. The dataset consists of women’s clothing reviews, and the target variable is the customer rating given in each review. The study covers binary, three-class, and five-class scenarios. The target variable, which originally has five classes (scores 1 to 5), is regrouped for the two-class and three-class modes. In the two-class mode, scores below 3 are class zero, while scores of 3 and above are class one. In the three-class mode, scores below 3 are class zero, scores equal to 3 are class one, and scores above 3 are class two. In all three cases, the Random Forest model performs best, achieving an accuracy of 0.98 in the binary case, 0.95 in the three-class case, and 0.91 in the five-class case. After the required preprocessing and feature engineering, principal component analysis (PCA) and t-SNE are applied. A scatter plot of the data is then drawn, and the optimal number of clusters is estimated to be 3 using the elbow method. Next, the data are cleaned by removing punctuation marks, stop words, and words with fewer than three letters, converting the first letter of each word to lowercase, and lemmatizing. Topic modeling is then carried out, and each topic and its associated words are examined.
In the next step, the topics are examined within the different clusters. These analyses provide a comprehensive understanding of the key themes and concerns customers have when considering womenswear items in each of the four topics.
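The regrouping of the five-point rating into the two-class and three-class targets described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' code: the function names `to_binary` and `to_three_class` and the sample ratings are assumptions introduced here.

```python
from typing import List


def _check(score: int) -> None:
    # Ratings in the dataset are described as integer scores from 1 to 5.
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {score!r}")


def to_binary(score: int) -> int:
    """Two-class mode: scores below 3 -> class 0, scores of 3 and above -> class 1."""
    _check(score)
    return 0 if score < 3 else 1


def to_three_class(score: int) -> int:
    """Three-class mode: below 3 -> class 0, equal to 3 -> class 1, above 3 -> class 2."""
    _check(score)
    if score < 3:
        return 0
    if score == 3:
        return 1
    return 2


# Hypothetical sample ratings, used only to show the mapping.
ratings: List[int] = [1, 2, 3, 4, 5]
binary_labels = [to_binary(r) for r in ratings]
three_class_labels = [to_three_class(r) for r in ratings]
```

In the five-class scenario the original scores would be used as the target unchanged, so no mapping is needed there.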
Keywords [English]