Vol. 26 Issue 1 2023

1Meesala Sudhir Kumar, 2M. Chitra, 3Anubhav Sharma, 4S D Prabu Ragavendiran, 5Sudharani B Banappagoudar, 6A Nirmal Kumar

1Associate Professor, Department of Computer Science & Engineering, MSEIT, MATS University, Raipur (CG), India.

2Rajalakshmi Institute of Technology, Chennai, Tamil Nadu

3Assistant Professor, Computer Science and Engineering, IMS Engineering College, Ghaziabad, Uttar Pradesh 201015

4Professor, Department of Computer Science and Engineering, RVS Technical Campus, Coimbatore.

5Professor, School of Nursing Science, ITM University, Gwalior (MP).

6Associate Professor, Department of Computer Science and Engineering, CMR Institute of Technology, Hyderabad.


Speech emotion recognition is one area where AI can be applied, and putting such systems to practical use helps make this type of technology more widely accessible. Customer service can be personalized, with bots estimating the customer’s mood during an interaction and redirecting the call to a human agent if slurred speech is detected. Call centers for emergency and insurance services, in particular, can benefit from emotion recognition. This work presents a Recurrent Neural Network (RNN) with Gated Recurrent Units (GRU) and a Convolutional Neural Network (CNN) for speech emotion classification, with excellent performance under experimental conditions. The models were trained on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), which also made it possible to build an evaluation and testing environment. Evaluation of a model trained in English yielded an accuracy of approximately 42%, which was considered unsatisfactory for the classifier. The main factors identified as responsible for this result are the characteristics of the sample group, classification bias in the absence of cross-validation, and the lack of noise processing. The network with the best accuracy was the RNN-GRU, which reached 79.69% using a technique that enlarges the dataset by stretching the audio samples.

Keywords: neural networks, artificial intelligence, emotion recognition, speech feature extraction, machine learning, emotions, speech.
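
The abstract reports that the best result relied on enlarging the dataset through a stretching process but does not specify the method. As a minimal sketch, assuming the stretch is a simple resampling by linear interpolation (which changes pitch along with duration, unlike a phase-vocoder stretch), the idea could look like the following. The function names `stretch` and `augment`, the rates 0.8 and 1.2, and the toy sine signal are hypothetical and chosen only for illustration.

```python
import math


def stretch(samples, rate):
    """Resample a signal by linear interpolation.

    rate < 1 lengthens the signal, rate > 1 shortens it.
    Assumption: this naive resampling stands in for whatever
    stretching method the authors actually used.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, round(len(samples) / rate))
    out = []
    for i in range(n_out):
        pos = min(i * rate, last)   # fractional position in the input
        lo = int(pos)               # sample at or before pos
        hi = min(lo + 1, last)      # next sample, clamped at the end
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(dataset, rates=(0.8, 1.2)):
    """Return the original (samples, label) pairs plus one stretched
    copy per rate, so each recording contributes 1 + len(rates) items."""
    augmented = list(dataset)
    for samples, label in dataset:
        for rate in rates:
            augmented.append((stretch(samples, rate), label))
    return augmented


# Toy usage: one 100-sample sine wave labeled "calm" (hypothetical data).
tone = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
data = augment([(tone, "calm")])
print([(len(s), label) for s, label in data])
```

Each stretched copy keeps the label of its source recording, so the training set grows without new annotation. In practice the augmented copies should be generated only from the training split, so that stretched versions of a test recording do not leak into training.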