Tokyo [Japan], November 5 (HBTV): A scientist in Japan has developed a technique that uses brain scans and artificial intelligence to turn a person’s mental images into accurate, descriptive sentences.
Tomoyasu Horikawa, the author of a study published on November 5 in the journal Science Advances, said that while earlier research has made progress in translating imagined words into text, converting complex mental images into language has remained a major challenge.
Horikawa’s new method, known as ‘mind-captioning’, uses artificial intelligence to generate descriptive text that reflects information from the brain about visual elements such as objects, places, actions and events, as well as the relationships between them.
Horikawa, a researcher at NTT Communication Science Laboratories near Tokyo, analysed the brain activity of four men and two women, all native Japanese speakers aged between 22 and 37. Their brains were scanned as they watched short, soundless video clips. The participants viewed 2,180 varied clips showing objects, scenes and actions.
Large language models first converted the captions of these videos into numerical representations of their meaning. Horikawa then trained simpler AI models, called ‘decoders’, to link the scanned brain activity to those representations. The decoders were later used to interpret the participants’ brain activity while they watched or recalled new videos that the AI had not encountered before. Another algorithm then progressively generated word sequences whose representations best matched the decoded brain activity.
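The pipeline described above can be sketched roughly in code. The sketch below is an illustrative simplification, not the study’s actual method: it uses synthetic data in place of fMRI scans and caption embeddings, ridge regression as a simple stand-in for the trained ‘decoders’, and a nearest-candidate match in place of the study’s word-by-word text generation. All dimensions and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the study's actual values):
# 500 training clips, 300 brain-activity features, 64-dim caption embeddings
n_train, n_voxels, n_dims = 500, 300, 64

# Synthetic stand-ins: caption embeddings from a language model, and
# brain activity simulated as a noisy linear function of those embeddings
embeddings = rng.standard_normal((n_train, n_dims))
mixing = rng.standard_normal((n_dims, n_voxels))
brain = embeddings @ mixing + 0.1 * rng.standard_normal((n_train, n_voxels))

# 'Decoder': ridge regression mapping brain activity into embedding space
lam = 1.0
W = np.linalg.solve(brain.T @ brain + lam * np.eye(n_voxels),
                    brain.T @ embeddings)

# Decode a new, unseen scan and pick the candidate description whose
# embedding best matches the decoded vector (a crude stand-in for the
# progressive word-sequence generation in the study)
true_embedding = rng.standard_normal(n_dims)
new_scan = true_embedding @ mixing + 0.1 * rng.standard_normal(n_voxels)
decoded = new_scan @ W

# Candidate 0 is the embedding of the correct description; the rest are random
candidates = np.vstack([true_embedding, rng.standard_normal((9, n_dims))])
cosine = candidates @ decoded / (
    np.linalg.norm(candidates, axis=1) * np.linalg.norm(decoded))
best = int(np.argmax(cosine))
print(best)
```

Because the simulated noise is small, the decoder recovers the embedding well and the correct candidate (index 0) scores highest; the real study faces far noisier signals and generates free-form text rather than selecting from a candidate list.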
As the AI system learned, its ability to produce descriptive text from brain scans improved significantly. Notably, the tool generated text in English, even though none of the participants were native English speakers.
According to Horikawa, the method can generate detailed descriptions of visual content without relying on brain activity from language-related regions. He said this indicates that the technology may also work for individuals who have damage around the brain’s language network.
The study noted that the technology could support people with aphasia, who experience difficulty in language expression due to neurological damage, as well as those with amyotrophic lateral sclerosis, a progressive neurodegenerative disease that affects speech.