DC field | Value | Language
dc.contributor.advisor | Fischer, Andreas
dc.contributor.author | Weigold, Armin Pascal
dc.date.accessioned | 2026-01-06T11:55:34Z
dc.date.available | 2026-01-06T11:55:34Z
dc.date.issued | 2023
dc.date.submitted | 2023-09-06
dc.identifier.uri | https://dspace.jcu.cz/handle/20.500.14390/48686
dc.format | p. vi, p. 47
dc.language.iso | eng
dc.publisher | Jihočeská univerzita (University of South Bohemia) | cze
dc.rights | Bez omezení (no restrictions) | cze
dc.subject | Text classification | eng
dc.subject | e-mail | eng
dc.subject | Word2vec | eng
dc.subject | Random Forest | eng
dc.subject | Support Vector Machine | eng
dc.subject | Neural Network | eng
dc.title | Text classification of customer inquiries via e-mail | cze
dc.title.alternative | Text classification of customer inquiries via e-mail | eng
dc.title.alternative | Textklassifizierung von Kundenanfragen via E-Mail | cze
dc.type | diplomová práce (Master's thesis) | cze
dc.identifier.stag | 73296
dc.description.abstract-translated | This thesis addresses a text classification problem. The research question is which workflow best solves it. The dataset consists of roughly 430,000 labeled e-mails. The problem is tackled in two steps: vectorizing the text and then applying a classification algorithm. Word2Vec and Tf-idf were evaluated for vectorization, and Random Forest, Support Vector Machine, Graph Neural Network, and Feed-Forward Neural Network were evaluated for classification. Word2Vec performed well, whereas the memory demands of Tf-idf were too high. For classification, the Feed-Forward Neural Network achieved the highest F1 scores (0.89 to 0.90, depending on the trial), followed by Random Forest and Support Vector Machine (0.87 to 0.89), while the Graph Neural Network achieved F1 scores of 0.80 to 0.87. | eng
dc.date.accepted | 2023-09-20
dc.description.department | Přírodovědecká fakulta (Faculty of Science) | cze
dc.thesis.degree-discipline | Artificial Intelligence and Data Science | cze
dc.thesis.degree-grantor | Jihočeská univerzita. Přírodovědecká fakulta (University of South Bohemia, Faculty of Science) | cze
dc.thesis.degree-name | Mgr.
dc.thesis.degree-program | Artificial Intelligence and Data Science | cze
dc.description.grade | Dokončená práce s úspěšnou obhajobou (completed thesis, successfully defended) | cze
dc.contributor.referee | Bodenschatz, Nicki
dc.contributor.referee | Torkler, Phillipp
dc.description.defence | <p>Committee: Valdman (chairman), Předota, Bukovský, Berl, Torkler, Prokýšek, Budík, Geyer</p> <p>The student presented his thesis in 13 minutes.</p> <ul> <li>Why didn&rsquo;t you communicate more with your supervisor?</li> <li>Who labeled the dataset (e-mails)?</li> <li>Could splitting the dataset into multiple subsets make Tf-idf usable? What are the benefits of Word2Vec over Tf-idf?</li> <li>How did you overcome the problem of highly imbalanced groups in the dataset?</li> <li>What was the difference between the results on the whole dataset and on the subset excluding the &ldquo;other&rdquo; group?</li> </ul> | cze
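The two-step workflow named in the abstract (vectorize each e-mail, then classify the resulting vector) can be sketched as below. This is a minimal, hypothetical illustration, not the thesis's code. The toy 3-dimensional word vectors in `EMBEDDINGS` stand in for a trained Word2Vec model. Document vectors are formed by averaging word vectors, which is one common choice; the abstract does not say how the thesis aggregated them. A nearest-centroid rule stands in for the Random Forest, SVM, and neural classifiers that were actually evaluated. The abstract also does not state which F1 averaging was used. `macro_f1` shows the unweighted per-class mean, which gives small classes equal weight and is one way to make the class imbalance raised at the defence visible.

```python
"""Minimal sketch of a two-step text classification workflow.

Assumptions (not from the thesis): toy word vectors replace a trained
Word2Vec model, and a nearest-centroid rule replaces the classifiers
that were actually evaluated. All names and data are hypothetical.
"""
import math
from collections import defaultdict

# Hypothetical toy embeddings; a real pipeline would load trained Word2Vec vectors.
EMBEDDINGS = {
    "invoice": [0.9, 0.1, 0.0],
    "payment": [0.8, 0.2, 0.1],
    "refund": [0.7, 0.3, 0.0],
    "password": [0.0, 0.9, 0.1],
    "login": [0.1, 0.8, 0.2],
    "account": [0.2, 0.7, 0.1],
}
DIM = 3


def embed(text):
    """Step 1: average the vectors of known words (zero vector if none are known)."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fit_centroids(texts, labels):
    """Step 2 (stand-in classifier): mean document vector per class."""
    sums = defaultdict(lambda: [0.0] * DIM)
    counts = defaultdict(int)
    for text, label in zip(texts, labels):
        v = embed(text)
        sums[label] = [s + x for s, x in zip(sums[label], v)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, text):
    """Assign the class whose centroid is closest in Euclidean distance."""
    v = embed(text)
    return min(centroids, key=lambda c: math.dist(v, centroids[c]))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2PR / (P + R)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


train_texts = ["invoice payment", "refund payment", "password login", "login account"]
train_labels = ["billing", "billing", "access", "access"]
centroids = fit_centroids(train_texts, train_labels)

test_texts = ["refund invoice", "account password"]
test_labels = ["billing", "access"]
preds = [predict(centroids, t) for t in test_texts]
print(preds, macro_f1(test_labels, preds))
```

On the toy data this prints `['billing', 'access'] 1.0`. Replacing `fit_centroids`/`predict` with a Random Forest or SVM changes only step 2. The averaged vectors from step 1 keep a fixed dimension regardless of vocabulary size, whereas a Tf-idf matrix grows one column per vocabulary term. This is one commonly cited reason dense embeddings can be lighter on memory.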

