We can all agree that assistive technology is great. It gets even better when the solution involves not just a software or hardware innovation but an exchange between people who need assistance and people who can provide it, like the app that lets you “lend” your eyes to blind individuals over video chat. In a similar vein, VocaliD has created technology that gives speech-impaired individuals a synthesized voice that sounds similar to their own.
VocaliD founder and speech scientist Rupal Patel explains in her TED talk that a person’s voice is as unique as their fingerprint. Unfortunately, people who lose the ability to speak through injury or illness are forced to choose from a limited range of synthesized voices. This one-size-fits-all approach is convenient and scalable, but it isn’t ideal, primarily because a person’s voice is closely linked with their personality. Giving a large group of people who vary in gender, age, and body type the same voice takes away from their sense of individuality.
To address this, Patel, along with speech synthesis expert Timothy Bunnell, came up with a way for each individual to receive a bespoke voice that closely mimics their own. Using VocaliD as a platform, the team crowdsources voices from people around the world. To obtain samples, the software guides voice donors through recording sessions in which they read prepared texts aloud.
The process doesn’t require donors to record every sound or word a recipient might need in daily speech. Instead, the software uses a couple of hours’ worth of recordings to build a profile for each voice. This is similar to how Siri can say practically anything in English without the person who provided its voice having recorded every phrase or sentence. From a limited set of recorded sentences, VocaliD extrapolates what other sentences would sound like based on what it learns from the available data.
All the voices collected from around the world are pooled into what’s called the Voicebank. More than 18,000 people from 110 countries currently contribute to the Voicebank, so there’s a wide variety of speech types to choose from. Whenever a recipient needs a voice, a donor whose voice is similar to the recipient’s is handpicked. The voice itself is delivered through devices made by manufacturers such as Tobii Dynavox, with which VocaliD has partnered.
But wait a second. If recipients are speech impaired, how does VocaliD figure out what their voice would sound like? To understand that, we first need a short lesson in speech theory. Speech is produced in two stages. In the ‘source’ stage, the raw sound is generated, mainly by the vibrating vocal folds. In the ‘filter’ stage, that sound is shaped by the resonant properties of the vocal tract.
Many individuals with speech-related disabilities retain the ability to modulate the ‘source’ stage of their voice, even when they can no longer articulate clear speech. Using these residual vocalizations, VocaliD can work out what the person’s own voice would sound like, and that understanding guides the choice of a donor from the Voicebank.
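To make the two-stage model a little more concrete, here is a minimal Python sketch of textbook source-filter synthesis. It illustrates the general theory only and is not VocaliD’s actual method. The source is assumed to be a simple impulse train at a chosen pitch (`f0_hz`), and the vocal tract filter is assumed to be a cascade of two-pole digital resonators (Klatt-style coefficients), one per formant. All function names and the `VOWEL_AH` formant values are hypothetical; the values only roughly approximate an “ah”-like vowel.

```python
import math

# Hypothetical, illustrative (frequency Hz, bandwidth Hz) pairs for an "ah"-like vowel.
VOWEL_AH = [(700, 130), (1220, 70), (2600, 160)]


def impulse_train(f0_hz, duration_s, sample_rate=16000):
    """Source stage: a crude glottal source, one unit pulse per pitch period."""
    n_samples = int(duration_s * sample_rate)
    period = sample_rate / f0_hz  # samples between glottal pulses
    signal = [0.0] * n_samples
    t = 0.0
    while t < n_samples:
        signal[int(t)] = 1.0
        t += period
    return signal


def resonator(signal, freq_hz, bandwidth_hz, sample_rate=16000):
    """One formant: a two-pole resonator, y[n] = a*x[n] + b*y[n-1] + c*y[n-2].

    The coefficients give unity gain at 0 Hz, so a constant input settles to itself.
    """
    T = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * T)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * T) * math.cos(2.0 * math.pi * freq_hz * T)
    a = 1.0 - b - c
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vocal_tract_filter(signal, formants, sample_rate=16000):
    """Filter stage: pass the source through one resonator per formant, in cascade."""
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


def synthesize(f0_hz, formants, duration_s, sample_rate=16000):
    """Source-filter synthesis: generate the source, then shape it with the filter."""
    source = impulse_train(f0_hz, duration_s, sample_rate)
    return vocal_tract_filter(source, formants, sample_rate)


if __name__ == "__main__":
    # Same filter (vowel), two different sources (pitches).
    low = synthesize(100, VOWEL_AH, 0.5)
    high = synthesize(220, VOWEL_AH, 0.5)
    print(len(low), len(high))  # 8000 8000
```

Because the source and the filter are separate arguments, the sketch can keep the same vowel filter while changing only the pitch of the source, which is the separation the two-stage model describes.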
VocaliD’s effort to turn crowdsourced voices into custom voices is currently underway. You can sign up to donate your voice on the VocaliD website.
Prateek Jose is a writer and engineering undergrad from India with an unhealthy obsession with obscure historical trivia. Conversations about absurdist fiction and the technological singularity make his day. He’s already uploading parts of his brain to servers by writing for websites such as this one.