11 October 2024
World Design Embassies
- Design
- Regio Deal Brainport Eindhoven
Worldwide 300 million people suffer from a voice disability or stutter severely. It has a major impact on their daily lives, work, and sense of happiness. They also can’t make intelligible and relaxed phone and video calls. With Whispp they can.
The startup’s real-time assistive voice technology and calling app converts whispered speech (people who stutter severely then speak relaxed) and affected speech (throat cancer, vocal cord paralysis, spasmodic dysphonia) into a person’s clear, natural voice.
During his childhood, Joris Castermans – founder and CEO of Whispp – had a moderate severe stutter, and mainly at high school, he really felt the pain of not being able to express himself. The idea for Whispp was born, based on two main insights. When they whisper, people who stutter severely talk very fluidly and relaxed. They also really dislike making phone calls. So Whispp developed speech technology and a calling app that converts this whispered speech into a person’s clear and natural voice, in real-time, so without any noticeable delay. With LUMO Labs as one of their main investors, the Brainport innovation ecosystem opened up for the startup. We sat down with Castermans for a talk about ambitions, success, and challenges.
'As our lives get more digital, there is a prevalence of voice interfaces. Typically this is designed with healthy voices in mind and hence not inclusive. For a large group of people (with voice disorders), communication is not always accessible. With our assistive technology, we want to further the goal of “no voice left behind”. Our big audacious dream is to have Whispp Assistive Voice Technology worldwide available on each smartphone and laptop to create a more inclusive world'.
'Whispp operates in the domain of assistive technology and we position ourselves as “Assistive Voice Technology”. Big tech and assistive speech tech companies predominantly focus on Automatic Speech Recognition (ASR), also known as speech-to-text (STT) for non-standard speech. The disadvantage of this approach is the high latency of 2 to 5 seconds. This creates barriers to natural conversation because each time, a sentence needs to be spoken, and then text has to be recognized. If the STT model makes mistakes, then the wrong sentence is generated. Another downside of the TTS model is that the intended intonation, pause, emphasis of the words, and emotion are not controllable from just text.
In this scenario, the current AI speech technology solutions are not able to cater an adequate solution for people with voice disabilities who lost their voice but still have good articulation. With our real-time audio-to-audio based voice technology, Whispp has created a new product category and fills this gap to improve the lives of a currently underserved group of 300 million people worldwide'.
'Speech AI technology, in general, is enhancing communication, it is bridging the gap by making speech more accessible and understandable. Advances in speech recognition (and translation) will be improved for low-resource languages and dialects (by low-resource, we mean languages that have relatively less data available for training AI models). This will enhance inclusion for people with different accents and various speech disabilities. Advances in smaller, more optimized models will allow integration of these systems into the smartphones and edge devices, which will make speech technology more accessible than ever before'.
'The fact that we have a positive impact on people’s lives. In May, we participated in a symposium in Florida of the US patient organization Dysphonia International. There we met a group of 150 people who suffer from a variety of voice disabilities. In an open mic session, many of them shared their personal stories and several people already used Whispp in their daily lives'!
'We started with our AI developments in early 2019 so we were ahead of the huge recent AI developments. Whispp uses a different approach for its real-time audio-to-audio-based AI; where audio is first decomposed into various (deep) components and then combined to create audio with different properties. This approach is inspired by the Source-Filter theory of human speech production.
While this approach is novel, for many years, we have been unsure if this research approach leads to fruitful results. Because of the perseverance of our creative AI team and keeping the faith, we managed to develop revolutionary AI technology with a very low conversion latency'.
The Gerard & Anton Awards are supported by EY, Rabobank, V.O. Patents & Trademarks, TWICE, Kadans Science Partner, Braventure, Lumo Labs, Gemeente Eindhoven, High Tech Campus, Philips, Goevaers & Znn. B.V. and DeepTechXL.
23 September 2024
A second life: Lightyear raises €10 mln19 September 2024
For Robin: A brother’s efforts to revolutionize mental healthcare17 September 2024
Joint approach for energy hubs