Topic Modeling Using the Latent Dirichlet Allocation Method for Grouping Headline News Texts

Ilham Fadhilah, Akbar (2022) Pemodelan Topik Menggunakan Metode Latent Dirichlet Allocation Dalam Pengelompokan Teks Berita Utama. Undergraduate thesis, Institut Teknologi Telkom Purwokerto.

Abstract

According to a 2021 survey by the Katadata Insight Center (KIC) and the Ministry of Communication and Information (Kominfo), the majority of Indonesians access information through social media, with 26.7% using online news. The volume of news published by online mass media is inversely proportional to how much the public accesses it. To make this information easier to navigate, this study applies topic modeling to help the public and online mass media companies obtain a general overview of sports news and its categories, with the aim of increasing readership and making news easier for the public to understand. The Latent Dirichlet Allocation (LDA) method is used because it improves on the mixture models of the older PLSA (Probabilistic Latent Semantic Analysis) and LSA (Latent Semantic Analysis) methods by capturing the exchangeability of words and documents. The research begins by collecting 4303 records from online mass media (detik.com, okezone.com, kompas.com, liputan6.com, idntimes.com, Suara.com, and coil.com) with the keyword "sport" using a data scraper tool. The data are then preprocessed to convert unstructured data into structured data, and TF-IDF weighting is applied. Processing with LDA begins by determining the number of topics using topic coherence, based on limit4, limit7, limit10, and limit13. The number of topics with the highest topic coherence value is then used to fit the LDA model, whose keywords serve as category references for grouping the headlines. The highest topic coherence value obtained is 0.57522, at 5 topics, so the resulting model has 5 topics.
The topic model yields the following keywords: topic 1 (players, united, manchester united, serie, clubs), topic 2 (players, defending, league, history, premiere, premier league), topic 3 (portrait, dramatic, nba, players, footballers), topic 4 (players, league, liverpool, season, champions league), and topic 5 (country, munchen, ball, midfielder, germany, bavarian), so the resulting categories are football and basketball. These results show that the main news texts were grouped successfully, because the topics obtained are consistent with the categories of the data collected.
Keywords: Online News, Topic Modeling, Latent Dirichlet Allocation, Topic Coherence
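
The TF-IDF weighting and coherence-based choice of the number of topics described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not say which library or which coherence measure was used, so the sketch swaps in the UMass coherence measure because it can be computed from document co-occurrence counts alone. The toy headlines, the hand-written candidate topics, and the names `tfidf`, `umass_coherence`, and `candidates` are all assumptions made for illustration. No LDA model is fitted here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def umass_coherence(topic_words, docs):
    """UMass coherence of one topic (assumed measure, not confirmed by the thesis).

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over ordered word pairs, where D
    counts the documents containing all given words. Every topic word is
    assumed to occur in at least one document.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_count(*words):
        return sum(all(w in s for w in words) for s in doc_sets)

    score = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            joint = doc_count(topic_words[i], topic_words[j])
            score += math.log((joint + 1) / doc_count(topic_words[j]))
    return score


# Toy tokenized headlines (hypothetical, not taken from the thesis data).
docs = [
    ["liverpool", "league", "season"],
    ["liverpool", "champions", "league"],
    ["nba", "players", "dramatic"],
    ["nba", "players", "season"],
]

# Hypothetical top words per topic for two candidate topic counts.
# In practice these would come from LDA models fitted with each count.
candidates = {
    2: [["league", "liverpool"], ["players", "nba"]],
    3: [["league", "nba"], ["liverpool", "players"], ["season", "dramatic"]],
}

weights = tfidf(docs)

# Model-level coherence: mean coherence over the model's topics.
scores = {
    k: sum(umass_coherence(t, docs) for t in topics) / len(topics)
    for k, topics in candidates.items()
}

# Keep the candidate topic count with the highest coherence.
best_num_topics = max(scores, key=scores.get)
print(best_num_topics, scores)
```

In this toy setup the two-topic candidate scores higher because its words co-occur in the same headlines, which mirrors the selection rule in the abstract: the topic count with the highest coherence is the one kept.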

Item Type: Thesis (Undergraduate Thesis)
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Informatics > Informatics Engineering
Depositing User: pustakawan ittp
Date Deposited: 15 Jul 2022 05:24
Last Modified: 15 Jul 2022 05:24
URI: http://repository.ittelkom-pwt.ac.id/id/eprint/7465