Automatic Speech Recognition on Non-Pathological Dataset of Urdu Language

Keywords

Voice dataset, Urdu language, SVM, MFCC, Pitch

How to Cite

Imtiaz, A., Rashid, M., Abid Syed, S., Zahid, H., Iqbal, M., & Khan, A. A. (2022). Automatic Speech Recognition on Non-Pathological Dataset of Urdu Language. KIET Journal of Computing and Information Sciences, 5(2). https://doi.org/10.51153/kjcis.v5i2.87

Abstract

Speech, one of the subsystems of human communication, has strong underlying characteristics, and every speaker has a distinct voice. Voice disorders are abnormal conditions that affect voice quality. Several protocols, including acoustic analysis, can detect clinical voice pathology. Machine learning algorithms and non-invasive systems based on computerized acoustic analysis may play a vital part in the initial detection and tracking of voice pathology, and in the development of effective pathological speech analysis. The aim of this paper is to collect a non-pathological dataset, i.e. a dataset of healthy voices. Two key features, (1) MFCC and (2) pitch, are used to generate a final audio clip. An SVM is used as the classifier to train and test the model on the dataset, and the model achieved reasonably high training and testing accuracy, i.e. 85.886%, which is a milestone for an Urdu-language dataset.
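The abstract names pitch as one of the two features but does not say how it is extracted. As a rough illustration only, the sketch below estimates pitch for a single voiced frame by autocorrelation, which is one common approach and not necessarily the authors' method. The function name `estimate_pitch_hz`, the 50 to 500 Hz search range, and the synthetic test tone are all assumptions made for this example.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    Hypothetical helper for illustration; fmin/fmax are assumed bounds
    for the search range, not values taken from the paper.
    """
    # Candidate lags correspond to periods between 1/fmax and 1/fmin seconds.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    if max_lag < min_lag:
        raise ValueError("frame is too short for the requested pitch range")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    # The lag with the strongest self-similarity is taken as the pitch period.
    return sample_rate / best_lag


# Synthetic stand-in for a voiced frame: a 200 Hz sine, 0.1 s at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
pitch = estimate_pitch_hz(frame, sample_rate)
```

On this clean test tone the estimate should land close to 200 Hz. Real recordings would need framing, voicing detection, and some protection against octave errors, and the resulting pitch values would then be combined with MFCCs before being passed to the classifier.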

https://doi.org/10.51153/kjcis.v5i2.87
