Application of Optical Character Recognition (OCR) with Text-to-Speech (TTS) for Image-to-Speech Conversion

Prabowo Budi Utomo(1), Ibnu Mas'ud Luthfi(2), M. Nur Fu'ad(3), M. Mujiono(4)
(1) Akademi Komunitas Negeri Putra Sang Fajar Blitar, Indonesia
(2) STAI At-Tahdzib Jombang, Indonesia
(3) Akademi Komunitas Negeri Putra Sang Fajar Blitar, Indonesia
(4) Akademi Komunitas Negeri Putra Sang Fajar Blitar, Indonesia


DOI : https://doi.org/10.24036/voteteknika.v11i4.125218


Abstract


Information accessibility is a central concern in ensuring that everyone can access and understand content fully. Visual impairment is one of the more common disabilities in Indonesia, and it creates a range of problems for those affected, one of which is limited access to information. This study combines Optical Character Recognition (OCR) with a Vector Quantized Variational Autoencoder (VQ-VAE) representation conversion and Google's Text-to-Speech engine (gTTS), with the aim of producing better and more natural-sounding speech while preserving the original information. Testing yielded a conversion accuracy of 83.33%, with 10 test samples converted to speech correctly; the approach is reasonably effective at preserving the original information and producing natural-sounding speech.

Keywords: Information Access; Visual Impairment; OCR; VQ-VAE; gTTS; Machine Learning
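
The OCR-to-speech flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the third-party packages `pytesseract`, `Pillow`, and `gTTS` are installed, that the Tesseract binary has Indonesian (`ind`) language data, and that a network connection is available for gTTS. The VQ-VAE stage mentioned in the abstract is not included, and the function names `clean_ocr_text` and `image_to_speech` are hypothetical.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy raw OCR output before it is passed to a speech engine."""
    # Join words that were hyphenated across a line break.
    joined = re.sub(r"-\s*\n\s*", "", raw)
    # Collapse remaining runs of whitespace (including newlines) to one space.
    return re.sub(r"\s+", " ", joined).strip()


def image_to_speech(image_path: str, mp3_path: str) -> str:
    """Read text from an image and save it as spoken audio (MP3).

    Assumption: Indonesian text, so Tesseract uses "ind" and gTTS uses "id".
    """
    # Third-party imports are kept local so clean_ocr_text works without them.
    from PIL import Image
    import pytesseract
    from gtts import gTTS

    raw = pytesseract.image_to_string(Image.open(image_path), lang="ind")
    text = clean_ocr_text(raw)
    if not text:
        raise ValueError("OCR did not recognise any text in the image")

    gTTS(text=text, lang="id").save(mp3_path)
    return text
```

A call such as `image_to_speech("page.png", "page.mp3")` (both file names are placeholders) would return the recognised text and write the audio file. Cleaning the OCR output first matters because stray line breaks and split words tend to produce unnatural pauses in synthesized speech.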




Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.