Text classification of customer inquiries via e-mail

Weigold, Armin Pascal

dc.contributor.advisor	Fischer, Andreas
dc.contributor.author	Weigold, Armin Pascal
dc.date.accessioned	2026-01-06T11:55:34Z
dc.date.available	2026-01-06T11:55:34Z
dc.date.issued	2023
dc.date.submitted	2023-09-06
dc.identifier.uri	https://dspace.jcu.cz/handle/20.500.14390/48686
dc.format	p. vi, p. 47
dc.language.iso	eng
dc.publisher	Jihočeská univerzita	cze
dc.rights	No restrictions
dc.subject	Text classification	eng
dc.subject	e-mail	eng
dc.subject	Word2vec	eng
dc.subject	Random Forest	eng
dc.subject	Support Vector Machine	eng
dc.subject	Neural Network	eng
dc.title	Text classification of customer inquiries via e-mail	cze
dc.title.alternative	Text classification of customer inquiries via e-mail	eng
dc.title.alternative	Text klassifizierung von Kundenanfragen via E-Mail	cze
dc.type	master's thesis	eng
dc.identifier.stag	73296
dc.description.abstract-translated	This work addresses a text classification problem. The research question focuses on identifying the optimal workflow for solving it. The dataset consists of roughly 430,000 labeled e-mails. The problem is tackled in two steps: vectorizing the text and applying a classification algorithm. Several algorithms were evaluated: Word2vec and Tf-idf for vectorization, and Random Forest, Support Vector Machine, Graph Neural Network, and Feed-Forward Neural Network for classification. The results show that Word2vec performed well, while Tf-idf required too much memory. In terms of classification, the Feed-Forward Neural Network achieved the highest F1 scores of 0.89-0.90 depending on the trial, followed by Random Forest and Support Vector Machine with F1 scores of 0.87-0.89, while the Graph Neural Network achieved F1 scores of 0.80-0.87.	eng
dc.date.accepted	2023-09-20
dc.description.department	Přírodovědecká fakulta	cze
dc.thesis.degree-discipline	Artificial Intelligence and Data Science	cze
dc.thesis.degree-grantor	Jihočeská univerzita. Přírodovědecká fakulta	cze
dc.thesis.degree-name	Mgr.
dc.thesis.degree-program	Artificial Intelligence and Data Science	cze
dc.description.grade	Completed thesis with successful defence	eng
dc.contributor.referee	Bodenschatz, Nicki
dc.contributor.referee	Torkler, Phillipp
dc.description.defence	<p>Committee: Valdman (chairman), Předota, Bukovský, Berl, Torkler, Prokýšek, Budík, Geyer</p> <p>The student presented his thesis in 13 minutes.</p> <ul> <li>Why didn’t you communicate more with your supervisor?</li> <li>Who labeled the dataset (e-mails)?</li> <li>Could splitting the dataset into multiple subsets allow the use of Tf-idf? What are the benefits of Word2vec over Tf-idf?</li> <li>How did you overcome the problem of highly imbalanced groups in the dataset?</li> <li>What was the difference between the results on the whole dataset and on the subset excluding the “other” group?</li> </ul>	eng

Files in this record

Name: WeigoldMaTh.pdf
Size: 1.186Mb
Format: PDF
Description: Full text of the thesis

Name: WEIGOLD_Armin_supervisor_state ...
Size: 397.4Kb
Format: PDF
Description: Supervisor's review

Name: WEIGOLD_Armin_opponent_review_1.pdf
Size: 362.7Kb
Format: PDF
Description: Opponent's review

Name: WEIGOLD_Armin_opponent_review_2.pdf
Size: 139.8Kb
Format: PDF
Description: Opponent's review

This record appears in

Přírodovědecká fakulta
