Brain-to-voice neuroprosthesis: woman paralyzed by a stroke "speaks" again in near real time thanks to a new BCI
An American woman, left paralyzed by a stroke, is able to communicate thanks to a brain-computer interface (BCI) that reproduces her voice, reducing the delay between the attempted phrase and the spoken word to just 80 milliseconds. The study is published in Nature Neuroscience.
At the age of 30, Ann, a mathematics teacher, suffered a brainstem stroke that left her severely paralyzed. In 2023 she was able to communicate with the help of a digital avatar that reproduced her voice and even her facial expressions, thanks to a new brain-computer interface (BCI, a class of systems that create a bridge between the human brain and machines) capable of translating signals of brain activity into words and expressions, allowing faster and more natural communication.
Now the researchers at UC San Francisco and UC Berkeley who developed Ann's BCI have taken a further step forward, synthesizing natural-sounding speech from brain activity in near real time. The new results have been published in Nature Neuroscience.
"Certainly, technically speaking, they have gone beyond the state of the art in the field of BCIs," comments Donatella Mattia, neurologist and director of the neuroelectrical imaging and BCI laboratory of the Santa Lucia IRCCS Foundation in Rome. "All the work done so far has been carried out on patients or healthy subjects, people who had not yet lost functionally useful speech. In this study, by contrast, the possibility of decoding speech that is only attempted or imagined has been demonstrated. So this technology can be useful to those who have completely lost the ability to speak. But, having been tested on a single patient, it still needs to be shown that it generalizes."
The challenge of latency in speech neuroprostheses
According to the authors, this work addresses the long-standing challenge of latency in speech neuroprostheses: the time delay between the moment a subject tries to speak and the moment sound is produced. Drawing on recent advances in artificial-intelligence-based modeling, the researchers developed a streaming method that synthesizes brain signals into audible speech in near real time.
This technology could be a fundamental step toward restoring communication for people who have lost the ability to speak. The study is supported by the National Institute on Deafness and Other Communication Disorders (NIDCD) of the National Institutes of Health.
Professor Mattia explains: "The speech-decoding technology available until now needed to accumulate a certain amount of data and only then produced synthetic speech. This means the fluidity of speech is practically absent. In this study, however, the voice decoder works together with the AI in such a way as to decode speech from data accumulated in just 80 milliseconds. The delay between the attempted phrase and its spoken translation is thus reduced roughly eightfold, approaching near-natural communication."
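The core of the streaming idea described above can be illustrated with a minimal sketch, assuming hypothetical names and a made-up neural feature rate (this is not the authors' code): instead of buffering a whole sentence, the decoder consumes short 80-millisecond windows of neural data and emits audio as soon as each window is available.

```python
# Minimal sketch of streaming decoding over short windows (hypothetical
# names and rates; the real decoder architecture is not shown here).
import numpy as np

WINDOW_MS = 80            # decoding window reported in the study
FEATURE_RATE_HZ = 200     # assumed rate of neural feature frames (hypothetical)
FRAMES_PER_WINDOW = WINDOW_MS * FEATURE_RATE_HZ // 1000


def decode_window(neural_window: np.ndarray) -> np.ndarray:
    """Placeholder for the AI decoder: maps one short window of neural
    features to a chunk of synthesized audio samples."""
    return np.zeros(1280)  # stand-in for real model output


def stream_speech(neural_stream):
    """Yield audio chunks as each 80 ms window of neural data arrives,
    instead of waiting for the full sentence to be recorded."""
    buffer = []
    for frame in neural_stream:
        buffer.append(frame)
        if len(buffer) == FRAMES_PER_WINDOW:
            yield decode_window(np.asarray(buffer))
            buffer.clear()
```

In a non-streaming design the same decoder would run only once, after the entire sentence's worth of neural data had been collected, which is what produced the multi-second delays of earlier systems.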
Near-synchronous voice streaming
"Our streaming approach brings the same rapid speech-decoding capability of devices like Alexa and Siri to neuroprostheses," says Gopala Anumanchipalli, Robert E. and Beverly A. Brooks Assistant Professor of Electrical Engineering and Computer Sciences at UC Berkeley and co-principal investigator of the study. "Using a similar type of algorithm, we found that we could decode neural data and, for the first time, enable near-synchronous voice streaming. The result is more naturalistic, fluent speech synthesis."
Enormous potential
"This new technology has enormous potential to improve the quality of life of people living with severe paralysis affecting speech," stresses neurosurgeon Edward Chang of UCSF, senior co-principal investigator of the study. Chang leads a clinical trial at UCSF that aims to develop speech neuroprosthesis technology using high-density electrode arrays that record neural activity directly from the brain surface. "It is exciting that the latest advances in AI are greatly accelerating BCIs for practical, real-world use in the near future," he adds.
The BCI system. Photo: Ken Probst / UC San Francisco
Technique applicable to different devices
The researchers also demonstrated that their approach can work well with a variety of other brain-sensing interfaces, including microelectrode arrays (MEAs), in which the electrodes penetrate the brain's surface, and non-invasive recordings (sEMG) that use sensors on the face to measure muscle activity.
"By demonstrating accurate brain-to-voice synthesis on other silent-speech datasets, we showed that this technique is not limited to one specific type of device," said Kaylo Littlejohn, PhD student in the Department of Electrical Engineering and Computer Sciences at UC Berkeley and co-lead author of the study. "The same algorithm can be used across different modalities, provided there is a good signal."
How the neuroprosthesis works
According to Cheol Jun Cho, co-lead author of the study and PhD student in electrical engineering and computer sciences at UC Berkeley, the neuroprosthesis works by sampling neural data from the motor cortex, the part of the brain that controls speech production, and then using artificial intelligence to decode that brain activity into speech. "We are essentially intercepting the signals where thought is translated into articulation, in the middle of that motor control," he said. "So what we are decoding comes after a thought has occurred, after we have decided what to say, after we have decided what words to use and how to move our vocal-tract muscles."
Training the algorithm
To collect the data needed to train their algorithm, the researchers first asked Ann, their subject, to look at a prompt on a screen, such as the phrase "Hey, how are you?", and then to try to silently pronounce that sentence. "This gave us a mapping between the chunked windows of neural activity she generates and the target sentence she is trying to say, without her ever needing to vocalize," said Littlejohn.
Since Ann has no residual vocalization, the researchers had no target audio, or output, onto which to map the neural data, the input. They solved this challenge by using AI to fill in the missing details. "We used a pre-trained text-to-speech model to generate audio and simulate a target," said Cho. "We also used Ann's voice from before her injury, so when we decode the output, it sounds more like her."
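The training setup described here can be sketched schematically, under the assumption that a generic pre-trained text-to-speech model and a reference for the subject's pre-injury voice are available (all names below are hypothetical, not the study's actual pipeline): the prompt text is turned into simulated target audio, which is then paired with the neural recording of the silent speech attempt.

```python
# Illustrative sketch of building (input, target) training pairs when no
# ground-truth audio exists. Hypothetical interfaces throughout.

def build_training_pair(prompt_text, neural_recording, tts_model, voice_reference):
    """Create one training pair for the decoder.

    prompt_text      -- sentence shown on screen, e.g. "Hey, how are you?"
    neural_recording -- neural activity recorded while the subject silently
                        attempts to say the sentence (the decoder's input)
    tts_model        -- pre-trained text-to-speech model (assumption)
    voice_reference  -- sample of the subject's pre-injury voice (assumption)
    """
    # Simulate the missing target audio from the prompt text, conditioned on
    # the subject's own voice so the decoded output sounds like her.
    target_audio = tts_model.synthesize(prompt_text, speaker=voice_reference)
    return neural_recording, target_audio
```

The decoder is then trained to map windows of the neural recording to the corresponding stretches of this simulated target audio.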
How an 8-second delay per sentence was overcome
In their previous BCI study, decoding had a long latency: a delay of roughly 8 seconds for a single sentence. With the new streaming approach, audible output can be generated in near real time, while the subject is attempting to speak.
To measure latency, the researchers used speech-detection methods, which allowed them to identify the brain signals marking the beginning of an attempt to speak.
"Relative to that intent signal, we can see that within one second we are getting the first sound out," explains Anumanchipalli. "And the device can continuously decode speech, so Ann can keep speaking without interruption." This greater speed did not come at the expense of accuracy: the faster interface delivered the same high level of decoding accuracy as their previous, non-streaming approach. "That is promising to see," says Littlejohn. "Previously it was not known whether intelligible speech could be streamed from the brain in real time."
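As a rough illustration of the latency measurement described above (a minimal sketch with hypothetical names, not the study's analysis code), the delay is simply the time between the detected neural "speech intent" onset and the first synthesized audio sample:

```python
# Hedged sketch: latency = time from detected speech-onset intent to the
# first audible output produced by the decoder.

def measure_latency(intent_onset_s: float, audio_onsets_s: list[float]) -> float:
    """Return the delay (in seconds) between the detected attempt to speak
    and the first synthesized sound that follows it."""
    first_audio = min(t for t in audio_onsets_s if t >= intent_onset_s)
    return first_audio - intent_onset_s


# Example: intent detected at t = 2.10 s, first audio emitted at t = 2.95 s
print(measure_latency(2.10, [2.95, 3.40, 3.90]))  # ~0.85 s, under one second
```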
Testing on NATO-alphabet words
Anumanchipalli added that researchers do not always know whether large-scale artificial intelligence systems are genuinely learning and adapting, or simply recombining and repeating parts of their training data. So the researchers also tested the model's ability to synthesize, in real time, words that were not part of the training vocabulary, in this case 26 uncommon words taken from the phonetic alphabet of NATO (the Atlantic Alliance), such as "Alpha", "Bravo", "Charlie" and so on. "We wanted to see whether we could generalize to unseen words and really decode Ann's speaking patterns," he said.
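The spirit of this held-out-word test can be captured in a short sketch, assuming a trained decoder object and a hypothetical way to transcribe its synthesized output (none of these names come from the study):

```python
# Minimal sketch of evaluating generalization to words excluded from training.

NATO_WORDS = ["Alpha", "Bravo", "Charlie", "Delta", "Echo"]  # first 5 of the 26


def evaluate_unseen_words(decoder, recordings_by_word):
    """Run the trained decoder on neural recordings of words it never saw
    during training and report the fraction synthesized correctly."""
    correct = 0
    for word in NATO_WORDS:
        transcript = decoder.synthesize_and_transcribe(recordings_by_word[word])
        correct += int(transcript.strip().lower() == word.lower())
    return correct / len(NATO_WORDS)
```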
"We found that our model does this well, which shows that it is really learning the building blocks of sound and voice." Ann, who also took part in the 2023 study, shared with the researchers her experience of the new streaming synthesis approach compared with the text-to-speech decoding method of the earlier study.
"She conveyed that streaming synthesis was a modality she could control more volitionally," said Anumanchipalli. "Hearing her own voice in near real time increased her sense of embodiment."
Future developments
This latest work brings researchers closer to achieving naturalistic speech through BCI devices, while laying the groundwork for future progress. "This proof-of-concept framework is a real breakthrough," says Cho. "We are optimistic that we can now make progress at every level. On the engineering side, for example, we will keep pushing the algorithm to see how we can generate speech better and faster."
Improving the expressiveness of the voice
The researchers are also focusing on increasing the expressiveness of the output voice, to reflect the changes in tone, pitch or loudness that occur during speech, for example when someone is excited.
"It is ongoing work to try to see how well we can actually decode these paralinguistic features from brain activity," concludes Littlejohn. "This is a long-standing problem even in classical audio synthesis, and solving it would close the gap toward full naturalism."