APPLYING THE WORD2VEC METHOD TO DETECT DOCUMENT SIMILARITY

ADE, RIYANI (2019) PENERAPAN METODE WORD2VEC UNTUK MENDETEKSI KEMIRIPAN DOKUMEN. Undergraduate Thesis, Institut Telkom Purwokerto.

Abstract

Plagiarism is the act of taking part or all of another person's ideas, in the form of documents or texts, without citing the source. Plagiarism detection is therefore necessary to reduce plagiarism and preserve the originality of people's work. This research aims to detect the similarity of text documents using the Word2vec method and TF-IDF feature extraction, and to determine the difference between the similarity values the two produce. The comparison corpus consists of 116 Indonesian abstracts. Preprocessing consists of case folding, tokenizing, stopword removal, and stemming. After preprocessing, the documents are weighted with TF-IDF and Word2vec, and Cosine Similarity is then used to obtain the similarity value as a percentage. With stemming applied, the similarity value was on average 5% higher than without stemming. Documents with a high level of similarity produce a similarity value above 50%, while documents with a low level of similarity, or that are not plagiarized, produce a similarity value below 30%. Based on the experimental results, Word2vec produces similarity values that are on average 28% higher than those from TF-IDF weighting.

Keywords: Cosine Similarity, document, plagiarism, preprocessing, TF-IDF, Word2vec
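
The pipeline named in the abstract (preprocessing, weighting, Cosine Similarity) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the stopword list, the sample documents, the TF-IDF variant, and the toy two-dimensional embeddings are all assumptions made for the example, stemming is omitted, and averaging word vectors is only one common way to turn Word2vec output into a document vector. A trained Word2vec model would supply the real embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration; the thesis's actual list is not given.
STOPWORDS = {"yang", "dan", "di", "untuk", "dengan"}


def preprocess(text):
    """Case folding, tokenizing, and stopword removal (stemming is omitted)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    Uses tf * (log(N / df) + 1), one common TF-IDF variant (an assumption here).
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append(
            {t: c * (math.log(n_docs / doc_freq[t]) + 1.0) for t, c in term_freq.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two vectors stored as {key: value} dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(tokens, embeddings):
    """Average the embeddings of known tokens into a {dimension: value} dict."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return {}
    dim = len(known[0])
    return {i: sum(v[i] for v in known) / len(known) for i in range(dim)}


# Hypothetical sample documents, not taken from the thesis corpus.
docs = [
    "Deteksi plagiarisme dokumen teks",
    "Deteksi kemiripan dokumen teks",
    "Resep masakan ayam goreng",
]
vectors = tfidf_vectors(docs)
sim_related = cosine(vectors[0], vectors[1])    # documents sharing several terms
sim_unrelated = cosine(vectors[0], vectors[2])  # documents sharing no terms

# Toy embeddings standing in for a trained Word2vec model (hypothetical values).
toy_embeddings = {"deteksi": [1.0, 0.0], "dokumen": [0.0, 1.0]}
doc_embedding = mean_vector(preprocess(docs[0]), toy_embeddings)
```

Multiplying a cosine value by 100 gives a percentage comparable in form to the similarity values reported in the abstract, although the numbers from this toy data say nothing about the thesis's results.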

Item Type: Thesis (Undergraduate Thesis)
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Industrial Engineering and Informatics > Informatics Engineering
Depositing User: Ade Rais Hambali
Date Deposited: 26 Jun 2020 01:42
Last Modified: 26 Jun 2020 01:42
URI: http://repository.ittelkom-pwt.ac.id/id/eprint/5700
