Speech To Text


During my education, I completed an internship at QuickSearch in Halmstad, where I was tasked with developing an API for processing recorded phone calls. The API received an audio file, identified the different speakers, and transcribed their dialogue into text, which was then converted into a JSON file. The solution was designed to make conversations easier to analyze, so that insights and improvements could be drawn from the transcribed data.
The API employed two AI models. A transcription model converted the spoken content into text, and a voice analysis model then determined which speaker each part of the dialogue belonged to.
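To illustrate how the output of two such models can be combined into one JSON document, here is a minimal sketch. It is not the code used at QuickSearch: the `TranscriptSegment` and `SpeakerTurn` types, the field names, the speaker labels, and the rule of picking the speaker with the largest time overlap are all assumptions made for this example, and the model outputs are hard-coded stand-ins.

```python
import json
from dataclasses import dataclass


@dataclass
class TranscriptSegment:
    """Hypothetical output of a transcription model (times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class SpeakerTurn:
    """Hypothetical output of a voice analysis model (times in seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labelled = []
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg.start, seg.end, t.start, t.end),
            default=None,
        )
        # Fall back to "unknown" when no speaker turn overlaps the segment.
        if best is None or overlap(seg.start, seg.end, best.start, best.end) == 0.0:
            speaker = "unknown"
        else:
            speaker = best.speaker
        labelled.append(
            {"speaker": speaker, "start": seg.start, "end": seg.end, "text": seg.text}
        )
    return labelled


def to_json(labelled):
    """Serialize the labelled dialogue as a JSON string."""
    return json.dumps({"dialogue": labelled}, ensure_ascii=False, indent=2)


# Stand-in model outputs for a short two-person call.
segments = [
    TranscriptSegment(0.0, 2.5, "Hello, how can I help you?"),
    TranscriptSegment(2.8, 5.0, "I have a question about my invoice."),
]
turns = [
    SpeakerTurn(0.0, 2.6, "SPEAKER_00"),
    SpeakerTurn(2.6, 5.2, "SPEAKER_01"),
]

print(to_json(assign_speakers(segments, turns)))
```

Matching on time overlap keeps the two models independent of each other: either one could be swapped out as long as it still reports start and end times.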