Published: April 25, 2022

Author: Olivia Walt
Nominator: Rebecca Scarborough
Course: LING 3100 Language Sound Structures, Fall 2021
LURA 2022

Anyone who takes a language class is practically guaranteed to use music to help facilitate their learning at some point, so common a practice has it become in language classrooms across the globe. Language learners everywhere can be found learning songs to help them memorize specific, often unique linguistic features, from alphabets and sound structures to eccentric colloquial phrases. This is possible because, like understanding language, listening to music involves carefully interpreting complex patterns of sound.

Considering this evident overlap of music and spoken language, it may seem reasonable to assume that having an affinity for one automatically grants an advantage in the other. However, it turns out that this is not always the case.

To discover how much (if at all) an affinity for music affects a person's ability to recognize and process English vowels, I conducted an independent study in which I asked a variety of individuals, from both musical and non-musical backgrounds, to listen to, identify, and then replicate three synthesized vowel sounds made to emulate real spoken vowels. Vowels are a lot like musical chords in that each consists of a complex combination of multiple different frequencies, or pitches, produced at the same time. I therefore began my research expecting a generally positive relationship between the extent of a person's musicality and the accuracy with which they were able to perceive and reproduce the study sounds.

Created using Praat software, the sounds I provided for my study participants to hear were combinations of "pure" tones (i.e., sounds with a single frequency) layered on top of one another, resulting in highly digital-sounding composite tones when played together. For example, to create a synthesized vowel /i/ (as in "beet"), I combined three tones with measured frequencies of 280 Hz, 2207 Hz, and 2254 Hz. You can hear the resulting sound, as well as see a visual representation of it, in the recording and spectrogram below:

Link to recording of Synthesized Vowel 1 (/i/):


Spectrogram of Synthesized Vowel 1 (/i/)
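The study stimuli were made in Praat, but the same tone layering is easy to sketch in code. Here is a minimal Python illustration (my own sketch, not part of the study's materials; it assumes numpy and scipy, and the equal amplitudes and one-second duration are arbitrary choices) that sums the three pure tones listed above and writes the result to a WAV file:

```python
import numpy as np
from scipy.io import wavfile

RATE = 44100                # samples per second
DURATION = 1.0              # seconds of audio
FREQS = [280, 2207, 2254]   # the three "pure" tone frequencies (Hz)

t = np.arange(int(RATE * DURATION)) / RATE

# Layer one equal-amplitude sine wave per pure tone
signal = sum(np.sin(2 * np.pi * f * t) for f in FREQS)

# Scale to the 16-bit integer range and write to disk
pcm = np.int16(signal / np.abs(signal).max() * 32767)
wavfile.write("synthesized_i.wav", RATE, pcm)
```

Playing synthesized_i.wav should produce a digital-sounding composite tone comparable to the recording above.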

The lines of red dots shown on the spectrogram above represent the individual "pure" tone frequencies (also called "formants" in natural speech) that, when played simultaneously, produce a synthesized version of the corresponding vowel (EdUHK, 2021; Piché, 1997; "Using Formants", 2019). In addition to /i/, I also created synthesized versions of the vowels /o/ (as in "orchard") and /æ/ (as in "hat"). All were fairly simple to create and were intended to be relatively easy to distinguish from one another, regardless of whether they proved easy to identify.
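Because each "pure" tone occupies a single frequency, the component frequencies marked by the red dots can be recovered directly from the composite signal with a Fourier transform. A quick check, again my own sketch building on the synthesis code above and assuming numpy:

```python
import numpy as np

RATE = 44100
t = np.arange(RATE) / RATE   # one second of samples
signal = sum(np.sin(2 * np.pi * f * t) for f in (280, 2207, 2254))

spectrum = np.abs(np.fft.rfft(signal))            # magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), d=1 / RATE)  # bin index -> Hz

# The three strongest bins fall at the component frequencies
print(np.sort(freqs[np.argsort(spectrum)[-3:]]))  # [ 280. 2207. 2254.]
```

With one second of audio at 44.1 kHz, the FFT bins are spaced 1 Hz apart, so the three strongest bins land exactly on 280 Hz, 2207 Hz, and 2254 Hz.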

While "easy" might not be the best descriptor for the participants' actual responses to the study activity, the overall feedback was nonetheless significant. Interestingly, of all the individuals who participated, those who identified as "non-musicians" appeared the most likely to recreate each study sound as they heard it (i.e., their perceptions of the sounds matched their own reproductions of them). By the same token, the participants whose perceptions didn't match their productions often added more acoustic features to their reproductions than they described hearing, for example giving the sound a more nasalized quality or combining two sounds together. Many of these individuals seemed to do this unconsciously, perhaps further suggesting an inclination to decode sounds in terms of musical qualities rather than linguistic meaning.

In terms of correctness of identification, /i/ was accurately identified most often of the three synthesized vowels. It also had the fewest instances where perception did not align with production. This may indicate an inverse relationship between how often a synthesized vowel is accurately identified and how often it is perceived one way but produced another.
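To make that comparison concrete, here is a toy tabulation. The counts are invented purely for illustration (the study's raw numbers are not reproduced here); they simply show the shape of the inverse pattern described, where the vowel identified most accurately also shows the fewest perception/production mismatches:

```python
# Invented tallies for illustration only, NOT the study's actual data.
tallies = {
    "/i/": {"identified": 9, "mismatched": 2, "participants": 12},
    "/o/": {"identified": 5, "mismatched": 6, "participants": 12},
    "/æ/": {"identified": 4, "mismatched": 7, "participants": 12},
}

for vowel, t in tallies.items():
    n = t["participants"]
    print(f"{vowel}: identified {t['identified'] / n:.0%}, "
          f"perception/production mismatch {t['mismatched'] / n:.0%}")
```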

Considering all the above findings, it appears more likely that a person with a greater degree of musical background may experience more difficulty in attempting to identify a vowel from a set of raw, pure tone frequencies alone. This runs counter to my initial prediction: as musicians become quite skilled at interpreting tones as individual "musical" notes, that very skill might get in the way of their ability to interpret the tones together as a spoken vowel sound.

How, then, do we best support language learners who may find it especially difficult to learn a language's sound system because of a stronger inclination to process individual speech sounds in terms of perceived "musical" attributes rather than phonemic meaning? Using my initial research as a starting point, I hope to dive further into this question, as well as the countless others that have arisen from it, with the intent of continuing to bridge the gap between language and music within the scope of education and beyond.


Header image credit:

Boersma, P., & Weenink, D. (2016). Praat (Version 6.1.16). University of Amsterdam.

EdUHK. (2021). 2.2 Formants of vowels. Phonetics and phonology.

Piché, J. (Ed.). (1997). Table III: Formant values. The Csound manual (Version 3.48): A manual for the audio processing system and supporting programs with tutorials. Analog Devices Incorporated.

Using formants to synthesize vowel sounds. (2019, July 17). SoundBridge.