Imagine yourself walking on a busy city street. Noises from the environment converge and diverge as you walk: a jet flies overhead, a jackhammer makes repeated bursts of sound, cars drive by, and people talk as they walk past you.
The sound waves reach your ears in a mixture of all the sources, overlapping and dynamically changing over time. Thus, the ability to listen to your friend talking while walking down a noisy city street requires brain mechanisms that disentangle the sound mixture, separating your friend’s voice from the sounds of the cars and other passing conversations, providing neural representations that maintain the integrity of the individual and distinct sound sources.
Most of us don’t give this complex, vital process a second thought. We take for granted our ability to navigate our surroundings in scenarios that range from relative quiet to cacophony. A better understanding of this process is critical, because when our auditory systems are not working properly, learning, social interactions and overall quality of life can suffer. That’s why I’m passionate about my research into this area.
How Does the Brain Make Sense of the Din?
Even with the advent of state-of-the-art imaging techniques, we still don’t fully understand what the brain does to solve this problem, which has been called the “cocktail party problem”: the ability to select and listen to one voice in the middle of the din of voices that are typically heard at a cocktail party. You can’t tell from looking at a picture (spectrogram) of a sound wave how many sources are overlapping. But when you listen (see the mixed waveform example below), you can fairly easily determine the identity of the sound sources.
Listen to the waveform and identify how many sources there are, and what they are. (See answer at bottom of blog post).
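To see why the picture alone is ambiguous, here is a minimal sketch in Python. It assumes each source is a simple pure tone (a stand-in, not the sounds in the clip) and that mixing is plain sample-by-sample addition. The names `tone`, `mix` and `SAMPLE_RATE` are made up for this illustration. The point is that the ear receives only the sum, and very different sets of "sources" can add up to exactly the same waveform.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary value chosen for this sketch


def tone(freq_hz, n_samples, amplitude=1.0):
    """Return a pure sine tone as a list of samples (a stand-in for a real source)."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
        for n in range(n_samples)
    ]


def mix(*sources):
    """Mix sources the way air does: add them sample by sample."""
    return [sum(samples) for samples in zip(*sources)]


n = 80
source_a = tone(220.0, n)         # hypothetical source 1
source_b = tone(440.0, n, 0.5)    # hypothetical source 2
source_c = tone(1000.0, n, 0.25)  # hypothetical source 3

mixture = mix(source_a, source_b, source_c)

# A completely different "explanation" of the same mixture: two sources,
# each carrying half of every sample. Nothing in the mixture rules it out.
alt_a = [0.5 * m for m in mixture]
alt_b = [m - a for m, a in zip(mixture, alt_a)]
same_mixture = all(
    abs(x - y) < 1e-12 for x, y in zip(mix(alt_a, alt_b), mixture)
)
print(same_mixture)
```

Three tones and two made-up half-signals produce the same waveform, so the number and identity of the sources cannot be read off the mixture by arithmetic alone. The brain has to bring in additional cues to settle on one interpretation.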
Musical Instruments Hold the Key
Composers have been taking advantage of this remarkable ability of the auditory system for centuries. When a single sound source plays sounds sequentially across a range of frequencies, you can experience multiple sound streams that seem to occur simultaneously and converge harmonically.
Listen to the Francisco Tarrega piece to hear auditory stream segregation in a musical example. In this recording of Recuerdos de la Alhambra (Memories of the Alhambra), a guitar technique called a tremolo is played in the higher notes, alternating with a countermelody played with the thumb in the bass line. The resulting perception is of two separate melodies played simultaneously. (In a tremolo, one note is plucked repeatedly by the three middle fingers in rapid succession, giving the illusion of a sustained long note.)
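The idea that one interleaved sequence of notes can fall apart into two melodies can be sketched with a toy rule in Python. This is only an illustration under stated assumptions, not a model of what the brain does: each note joins the stream whose most recent note is closest in pitch, and a jump larger than `max_jump` semitones starts a new stream. The function name, the threshold and the pitches (given as MIDI note numbers) are hypothetical and are not taken from the actual score.

```python
def segregate_by_pitch(notes, max_jump=7):
    """Toy grouping rule: attach each note to the nearest-in-pitch stream.

    A note joins the stream whose last note is closest in pitch, provided the
    gap is at most max_jump semitones; otherwise it starts a new stream.
    """
    streams = []
    for pitch in notes:
        best = None
        for stream in streams:
            gap = abs(stream[-1] - pitch)
            if gap <= max_jump and (best is None or gap < abs(best[-1] - pitch)):
                best = stream
        if best is None:
            streams.append([pitch])  # too far from every stream: start a new one
        else:
            best.append(pitch)
    return streams


# Hypothetical pattern: one low thumb note, then three repeated high notes.
sequence = [45, 76, 76, 76, 52, 76, 76, 76, 57, 77, 77, 77]
streams = segregate_by_pitch(sequence)
print(streams)  # [[45, 52, 57], [76, 76, 76, 76, 76, 76, 77, 77, 77]]
```

Even though the notes arrive one at a time from a single instrument, grouping by pitch proximity yields a slow bass line and a fast, nearly constant high line, which loosely mirrors the two melodies you hear in the recording.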
Impact of Developmental Disorders, Aging and Hearing Loss on the Auditory System
Most people are not aware of the remarkable skills of the auditory system. However, when this capacity is disrupted by impairments to peripheral or central auditory mechanisms, as can happen with hearing loss, aging and some neurodevelopmental disorders, selecting and listening to a single voice among competing sounds becomes a daunting task.
Although technology for prosthetic devices has greatly improved, there is currently no computer algorithm or prosthetic device that can mimic what the brain does with sound when there is competing background noise. Part of the reason is that scientists have still not developed a model of what the brain does in such complex listening situations to account for the flexibility of the auditory system, or the full complexity of the neural network interactions involved in selective listening, which includes a balance of automatic processes (those that are induced by the stimulus properties themselves) and attentive processes (those that are under volitional control).
When I was a graduate student I became interested in this problem, reading Dr. Albert S. Bregman’s influential book Auditory Scene Analysis. In particular, I wanted to determine whether there were automatic brain processes that helped sort the sounds to provide useful information to guide behavior. Prior to this time, auditory stream segregation was largely thought to occur only with attention, by active selection of a subset of the mixture of sounds. However, in the auditory system, the physical sound input is transient and the listener can rely only on a neural memory trace of the previous input (e.g., the successive words in a speech stream). It seemed that because attention is a limited resource, having automatic systems to segregate sounds and hold them in memory would be needed so that attention could be used to focus on making meaning from the pattern of sounds (e.g., understanding the content of a sentence).
How the Memory Organizes Sounds
In my dissertation, I asked whether sounds are segregated automatically, without attention being focused on them. The methodology that allowed us to explore this question, and to determine how sounds were organized in memory without asking subjects to report their perception of the sounds, was human electrophysiological recording of event-related brain potentials. The evoked potentials indexed when the sounds were segregated, so we did not have to ask participants to perform a task with the sounds; instead, they read a book and ignored the sounds presented to their ears.
We found evidence that stream segregation occurred automatically, and that multiple sound streams could be maintained simultaneously. Recently, there has been an explosion of studies investigating the cocktail party problem, employing sophisticated imaging and computational techniques in human participants. One such study recorded the brain’s response to two simultaneous speakers from electrodes placed directly on the surface of the brain in patients with epilepsy. Dr. Chang and colleagues found that the neural responses in the auditory cortex reflected only the words spoken by the attended speaker and not those from the unattended speaker.
Our recent studies focus on how multiple organizations are represented in the brain when we switch attention from one sound object in the environment to another. This is the problem the brain is “solving” for music perception. Listening to an orchestra, for example, involves hearing the global harmony (how all the instruments blend together) without losing access to the individual melodies of the various instruments. We have found evidence that multiple representations are maintained in memory simultaneously in a way that allows rapid and flexible switching from one sound stream to another. Although we have learned quite a bit more in recent years, we still have a long way to go in understanding how the auditory system interacts with other systems to let you follow the melody of the flute while enjoying the harmony of the orchestra.
On a larger scale, obtaining a more detailed understanding of how the brain segregates and integrates sounds holds important implications for the continued development of medical technologies (e.g., hearing aids and cochlear implants) and computer models of speech perception that deal with competing background sounds.
(Answer to sound clip: a person speaking, dolphin song, a creaking door and a violin playing a Bach gigue.)