| dc.contributor.advisor | Fischer, Andreas | |
| dc.contributor.author | Weigold, Armin Pascal | |
| dc.date.accessioned | 2026-01-06T11:55:34Z | |
| dc.date.available | 2026-01-06T11:55:34Z | |
| dc.date.issued | 2023 | |
| dc.date.submitted | 2023-09-06 | |
| dc.identifier.uri | https://dspace.jcu.cz/handle/20.500.14390/48686 | |
| dc.format | p. vi, p. 47 | |
| dc.language.iso | eng | |
| dc.publisher | Jihočeská univerzita | cze |
| dc.rights | No restrictions | |
| dc.subject | Text classification | eng |
| dc.subject | e-mail | eng |
| dc.subject | Word2vec | eng |
| dc.subject | Random Forest | eng |
| dc.subject | Support Vector Machine | eng |
| dc.subject | Neural Network | eng |
| dc.title | Text classification of customer inquiries via e-mail | cze |
| dc.title.alternative | Text classification of customer inquiries via e-mail | eng |
| dc.title.alternative | Textklassifizierung von Kundenanfragen via E-Mail | cze |
| dc.type | Master's thesis | eng |
| dc.identifier.stag | 73296 | |
| dc.description.abstract-translated | This work addresses the problem of text classification. The research question focuses on
identifying the optimal workflow for solving the given problem. The dataset consists of
roughly 430,000 labeled e-mails. The problem is tackled in two steps: vectorizing
the text and applying a classification algorithm. Several methods were evaluated: Word2vec
and Tf-idf for vectorization, and Random Forest, Support Vector Machine, Graph Neural
Network, and Feed-Forward Neural Network for classification. The results
show that Word2vec performed well, whereas Tf-idf required too much memory. For
classification, the Feed-Forward Neural Network achieved the highest F1 scores (0.89-0.90,
depending on the trial), followed by Random Forest and Support Vector Machine with
F1 scores of 0.87-0.89, while the Graph Neural Network achieved F1 scores of 0.80-0.87. | eng |
| dc.date.accepted | 2023-09-20 | |
| dc.description.department | Přírodovědecká fakulta | cze |
| dc.thesis.degree-discipline | Artificial Intelligence and Data Science | cze |
| dc.thesis.degree-grantor | Jihočeská univerzita. Přírodovědecká fakulta | cze |
| dc.thesis.degree-name | Mgr. | |
| dc.thesis.degree-program | Artificial Intelligence and Data Science | cze |
| dc.description.grade | Completed thesis with a successful defence | eng |
| dc.contributor.referee | Bodenschatz, Nicki | |
| dc.contributor.referee | Torkler, Phillipp | |
| dc.description.defence | <p>Committee: Valdman (chairman), Předota, Bukovský, Berl, Torkler, Prokýšek, Budík, Geyer</p>
<p>The student presented his thesis in 13 minutes.</p>
<ul>
<li>Why didn’t you communicate more with your supervisor?</li>
<li>Who labeled the dataset (e-mails)?</li>
<li>Could splitting the dataset into multiple subsets allow the use of Tf-idf? What are the benefits of Word2vec over Tf-idf?</li>
<li>How did you overcome the problem of highly imbalanced groups in the dataset?</li>
<li>What was the difference between the results on the whole dataset and on the subset excluding the “other” group?</li>
</ul> | cze |