Alberto Bertone, Annalisa Letizia, Vincenza Tufano
The use of the voice as a biometric parameter has been the subject of many studies in recent years, so much so that it has prompted a growing number of large technology companies to develop their first implementations, such as the various voice assistants in our smartphones or , in the specific case of speaker recognition. The reasons behind the development of these technologies are many: the desire to perform daily operations more simply, the need to make many actions possible for people with motor or cognitive deficits, and the growing investment in new and more robust security measures. The goal of this thesis is the development of a text-independent speaker recognition system capable of identifying users even in the open-set case. A speaker recognition system can be used for verification (confirming a person's claimed identity) or for identification (determining who is speaking). The developed model makes it possible to evaluate the performance of the system in the identification case. The scores returned by the system turned out to depend strongly on the speaker: the test scores of one speaker can reach maximum values around 90%, while those of others reach only 50-60%, with a general degradation of performance for female voices. This means that, in an open-set situation, there are cases where a single global threshold does not give the best performance. In such cases, an adaptive threshold determined specifically for each speaker model could lead to a significant increase in performance.
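The idea of a per-model adaptive threshold can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the method used in the thesis: the rule "mean minus k standard deviations of a speaker's genuine validation scores", the value of `k`, the function names, and the example scores.

```python
from statistics import mean, pstdev


def speaker_thresholds(genuine_scores, k=2.0):
    """Per-speaker threshold (hypothetical rule): mean minus k standard
    deviations of the scores that speaker's own utterances obtained."""
    return {spk: mean(s) - k * pstdev(s) for spk, s in genuine_scores.items()}


def identify_open_set(scores, thresholds):
    """Return the best-scoring enrolled speaker, or None (unknown speaker)
    when the best score is below that speaker's own threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= thresholds[best] else None


# Hypothetical validation scores: one speaker scores high, the other low.
genuine_scores = {
    "spk_a": [0.88, 0.90, 0.92],
    "spk_b": [0.52, 0.55, 0.58],
}
thresholds = speaker_thresholds(genuine_scores)

# A test utterance whose best match is spk_b with a score of 0.54.
print(identify_open_set({"spk_a": 0.40, "spk_b": 0.54}, thresholds))
# A test utterance whose best match is spk_a, but with an unusually low score.
print(identify_open_set({"spk_a": 0.60, "spk_b": 0.45}, thresholds))
```

With a single global threshold set high enough to suit `spk_a`, every genuine utterance from `spk_b` would be rejected as unknown; a threshold derived from each speaker's own score distribution avoids that.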
Development of an artificial intelligence model for voice identification
Extraction of features in the frequency domain (MFCC), training and testing of 3 Machine Learning models (SVM, KNN, GMM), and implementation of majority voting.
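The majority-voting step can be sketched as follows, assuming each of the three models has already produced one speaker label for the same utterance. The function name, the example labels, and the choice to return None when no label wins outright are assumptions made for illustration; the source does not state how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers, or None when the
    top two labels are tied (tie handling is an assumption of this sketch)."""
    counts = Counter(predictions).most_common()
    top_label, top_count = counts[0]
    if len(counts) > 1 and counts[1][1] == top_count:
        return None
    return top_label


# Hypothetical outputs of the SVM, KNN and GMM models for one utterance.
print(majority_vote(["alice", "alice", "bob"]))
# All three models disagree, so no majority exists.
print(majority_vote(["alice", "bob", "carol"]))
```

With three voters, a decision is reached whenever at least two models agree; a complete disagreement could alternatively be resolved by falling back to the model with the best validation accuracy.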
The system developed in this thesis made it possible to evaluate the performance of a speaker recognition system built with currently available technology.
Tests conducted on datasets acquired in real conditions (possible background noise, varying distances from the microphone).