arXiv:1202.4212v1 [cs.SD] 20 Feb 2012
Harmony Explained: The Major Scale, The Standard Chord Dictionary, and The Difference of Feeling Between The Major and Minor Triads Explained from the First Principles of Physics and Computation; The Theory of Helmholtz Shown To Be Incomplete and The Theory of Terhardt and Some Others Considered
Progress Towards A Scientific Theory of Music
Daniel Shawcross Wilkerson
Begun 23 September 2006; this version 19 February 2012.
Abstract and Introduction
Most music theory books are like medieval medical textbooks: they contain unjustified superstition, non-reasoning, and funny symbols glorified by Latin phrases. How does music, and in particular harmony, actually work when we explain it with a real, scientific theory?
In particular we derive from first principles of Physics and Computation the following three fundamental phenomena of music:
- the Major Scale,
- the Standard Chord Dictionary, and
- the difference in feeling between the Major and Minor Triads.
While the Major Scale has been independently derived before by others in a manner similar to ours [Helmholtz1863, p. 300], [Birkhoff1933, p. 92], I believe the derivation of the Standard Chord Dictionary, as well as of the difference in feeling between the Major and Minor Triads, to be an original contribution to science and art. Further, we think our observations should convert straightforwardly into an algorithm for classifying the basic aspects of tonal music in a manner similar to the way a human would.
Further, we examine the theory of the heretofore agreed-upon authority on this subject, 19th-century German physicist Hermann Helmholtz [Helmholtz1863], and show that his theory, while making correct observations, and while qualifying as scientific, fails to actually explain the three observed phenomena listed above; Helmholtz isn't really wrong, he just fails to be really right, and he considers only physical and not computational phenomena. We also consider the more recent and more computational theory of Terhardt [Terhardt1974-PCH] (and others) and show that, while his approach (and, it seems, that of others following in his thread) also attempts a computational explanation and arrives at some observations resembling those of the initial part of our analysis, we seem to go further.
I intend this article to be satisfying to scientists as an original contribution to science (as a set of testable conjectures that explain observed phenomena), yet I also intend it to be approachable by musicians and other curious members of the general public who may have long wondered at the curious properties of tonal music and been frustrated by the lack of satisfying, readable exposition on the subject. Therefore I have written in a deliberately plain and conversational style, avoiding unnecessarily formal language; Benjamin Franklin and Richard Feynman often wrote in a plain and conversational style, so if you don't like it, to quote Richard Feynman, "Don't bug me man!"
Table of Contents
- 1 The Problem of Music
- 1.1 Modern "Music Theory" Reads Like a Medieval Medical Textbook
- 1.2 What is a Satisfactory, Scientific Theory?
- 1.3 Music "Theory" is Not a Scientific Theory of Anything
- 1.4 Can we Make a Satisfactory Theory of Music?
- 1.5 Physical Science: Harmonics Everywhere
- 1.5.1 Timbre: Systematic Distortions from the Ideal Harmonic Series
- 1.6 Computational Science: as Fundamental as Physical Science
- 1.6.1 Algorithms are Universal
- 2 Living in a Computational Cartoon
- 2.1 Searching for Harmonics
- 2.1.1 Virtual Pitch: Hearing the Harmonic Series Even When it is Not There
- 2.1.2 Using Greatest Common Divisor as the Missing Fundamental
- 2.1.3 Even Animals Seem to Compute the Ideal Harmonic Series
- 2.2 Artifacts of Optimization
- 2.2.1 Relative Pitch: Differences Between Sounds
- 2.2.2 Octaves: Sounds Normalized to a Factor of Two
- 2.3 Harmony: Sweetness is the Ideal
- 2.3.1 Recreating an Ideal Harmonic Series using Instruments having Systematically-Distorted Timbre
- 2.3.2 Harmony Induces Two Kinds of Intervals: Horizontal Within the Note and Vertical Across the Notes
- 2.3.3 Vertical Intervals Have Pure Ratios
- 2.3.4 Vertical Intervals Have Balanced Amplitudes
- 2.3.5 Vertical Intervals Are All The Same Ratio
- 2.3.6 Harmony is Sweeter Than Sweet
- 2.4 Interestingness: Just Enough Complexity
- 2.4.1 The Simplicity of Theme
- 2.4.2 The Complexity of Ambiguity
- 2.5 Recognition: Feature Vectors
- 2.5.1 Soft Computing
- 2.5.2 False Recognition
- 2.5.3 Cubism: Partial Recognition Due to Redundant, Over-Determined Feature Vectors
- 3 Harmonic Music Explained
- 3.1 The Major Triad
- 3.2 The Major Scale
- 3.2.1 Interlocking Triads
- 3.2.2 Using Logarithms to Visualize Distances Between Tones/Notes
- 3.2.3 The Keyboard Revealed
- 3.3 Scales and Keys
- 3.3.1 Changing Key: Playing Other Groups of Triads
- 3.3.2 Key Changes Break Harmony
- 3.3.3 Just versus Equal Tuning
- 3.4 The Minor
- 3.4.1 The Minor Triad
- 3.4.2 The Minor as Auditory Cubism
- 3.4.3 Minor Scales
- 3.5 Chords
- 3.5.1 The Standard Chord Dictionary
- 3.5.2 How to Turn Sweetness into Mud: Over-Using Octaves
- 3.5.3 Chords from the Harmonic Series
- 3.5.4 Chords Inducing Ambiguity
- 3.5.5 Chords Using the Minor Triad
- 3.5.6 Chords Preserving Intervals but not Harmonics
- 4 Miscellaneous Objections
- 4.1 But what about the Circle of Fifths!
- 4.1.1 Fifths make a Circle
- 4.1.2 The Circle of Fifths is Just a Combinatorial Coincidence
- 4.1.3 The Circle of Fifths Allows for Cool Chord Transitions
- 4.1.4 The Symmetries of the Circle of Fifths are a Terrible Red Herring
- 4.2 But Other Cultures Have Different Musical Scales!
- 4.2.1 A Culture May Simply not be Fully Exploiting All of the Universal Harmonic Features
- 4.2.2 But The Nasca People Of Peru Use A Linear, Not A Logarithmic, Scale!
- 4.3 But You Can Make a Piece of Music Based Entirely on That Utterly Un-Harmonic Interval, the Augmented Fourth!
- 4.4 But I've Been a Musician All My Life / Studied Music In College and I've Never Heard Any of This Before!
- 5 Helmholtz Fails to Fully Explain Harmony
- 5.1 Helmholtz's Theory Relies Only On Interfering Overtones, But Harmony Is Something More
- 5.2 Helmholtz's Theory Doesn't Imply Virtual Pitch
- 5.3 Helmholtz's Theory is that Pleasure is Only the Absence of Pain
- 5.3.1 Harmony is Rapture
- 5.4 Helmholtz's Theory Fails to Fully Explain the Qualitative Difference Between the Major and Minor Triads
- 5.5 Helmholtz Isn't Really Wrong, He Just Fails To Be Really Right
- 6 Other Modern Theories, such as Terhardt and 'Fusion or pattern matching' Theory
- 6.1 Terhardt Recognizes that the Brain is Listening For Something
- 6.2 Terhardt Does Not Explain Sustained and Minor Chords
- 7 Future Work: Towards A Unifying Theory of Music
- 7.1 Melody as Arpeggio
- 7.1.1 Scale As Theme: Melodic Association From Harmonic Association
- 7.1.2 Streaming: Multiple Similar Phenomenon Occurring Consecutively Are Explained By The Brain As One Thing Moving
- 7.1.3 Melody can Easily Create Interesting Ambiguities
- 7.2 The Role of Narrative Generally
- 7.3 Embodiment and Emotion
- 7.4 A Proposal For A Unifying Physical and Computational Theory of Music
- 8 Acknowledgements
- 9 References
1 The Problem of Music
People push different keys on a piano; some combinations and patterns sound good; others do not. How does that work? Looking at a piano, we see its keys laid out in the following pattern (w=white, b=black):
... wbwbw wbwbwbw wbwbw wbwbwbw ...
Hmm, the white and black keys mostly just alternate, yet these alternating regions last for 5 and then 7 keys and then that 5/7 region-pair repeats, and where these regions meet there are two adjacent white keys. There seems to be a pattern, but it is quite an odd one.
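As a quick check of this description, the following Python sketch rebuilds the layout from the positions of the five black keys within each 12-key repeat (counting the white key C as position 0, the usual convention; the function names are our own) and splits the layout wherever two white keys touch. It is only an illustration of the pattern, not an explanation of it.

```python
# Positions of the black keys within one 12-key repeat, counting C as 0.
BLACK_KEYS = {1, 3, 6, 8, 10}

def keyboard_pattern(octaves: int = 1) -> str:
    """Return the white/black layout as a string of 'w' and 'b'."""
    return "".join(
        "b" if i % 12 in BLACK_KEYS else "w" for i in range(12 * octaves)
    )

def regions(pattern: str) -> list[str]:
    """Split the layout wherever two white keys are adjacent."""
    parts, start = [], 0
    for i in range(1, len(pattern)):
        if pattern[i - 1] == "w" and pattern[i] == "w":
            parts.append(pattern[start:i])
            start = i
    parts.append(pattern[start:])
    return parts

layout = keyboard_pattern(2)
print(layout)                             # wbwbwwbwbwbwwbwbwwbwbwbw
print(regions(layout))                    # ['wbwbw', 'wbwbwbw', 'wbwbw', 'wbwbwbw']
print([len(r) for r in regions(layout)])  # [5, 7, 5, 7]
```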
The piano keyboard seems really weird and ad hoc. Doesn't it seem that something as simple as sound should have a simple device for producing it?
Further, this weirdness is not specific just to the piano: the key layout reflects the Major Scale [maj], which is the basis of all Western music. Is that black-white pattern somehow fundamental to sound and music itself? Or is it really just a cultural coincidence, a set of combinations of sounds that we have heard over and over since infancy and been trained to associate with different emotions? Is something fundamental to the ear and to sound itself going on here, or not?
1.1 Modern "Music Theory" Reads Like a Medieval Medical Textbook
These questions have bothered me literally for decades (starting when I was about ten, looking at our piano keyboard and asking "what?!"; I basically wrote the above Section 1 "The Problem of Music" at that time). Consulting "music theory" never helped me either, as
Reading a music theory book is like reading a medieval medical textbook: such books are full of unjustified superstition, non-reasoning, and funny symbols glorified by Latin phrases.
For example, here is the first page from a famous book on Jazz Theory, "Jazz Improvisation 1: Tonal and Rhythmic Principles" by John Mehegan [Mehegan1959]. Recall, this is the first page of Lesson 1 of Section 1 of Book 1, the very first thing the student reads!
"Each of the twelve scales is a frame forming the harmonic system."
What is a "scale"? Where do scales come from? Why are there exactly twelve of them, and how does that number arise? What is a "harmonic system" and what does it mean to say a scale "frames" it?
"Diatonic harmony moves in two directions: Horizontal and Vertical."
Really?! They both look pretty diagonal to me. Oh, but it's Diatonic! That sounds Latin so I guess these people are smart.
"By combining these two movements... we derive the scale-tone seventh chords in the key of C."
What is a "chord"? What is a "key"? WHAT THE HECK ARE THEY TALKING ABOUT!
You can't start a science textbook like that. You have to start with simple observations humans can make. You have to build up complex structures from simple ones. You have to motivate your distinctions.
Even if you say "A chord is 3 or more notes played together," that is also almost the definition of a "key"; for what purpose do we have this distinction? You could say "well, the notes of a key are played together but not at the same time," but that is also true of an arpeggiated chord; again, what is the distinction? Even if you say "a C major chord is C-E-G," there is no motivation as to why C-E-G sounds good together and other combinations of notes do not.
This "music theory" reminds me a bit of Richard Feynman's description of a science textbook he reviewed for the California school board as told in '"Surely You're Joking, Mr. Feynman!": Adventures of a Curious Character', [Feynman1985, p. 270-271], (emphasis in the original):
For example, there was a book that started out with four pictures: first there was a wind-up toy; then there was an automobile; then there was a boy riding a bicycle; then there was something else. And underneath each picture it said, "What makes it go?"
I thought, "I know what it is: They're going to talk about mechanics, how the springs work inside the toy; about chemistry, how the engine of the automobile works; and biology, about how the muscles work."
It was the kind of thing my father would have talked about: "What makes it go? Everything goes because the sun is shining." And then we would have fun discussing it:
"No, the toy goes because the spring is wound up," I would say.
"How did the spring get wound up?" he would ask.
"I wound it up."
"And how did you get moving?"
"And food grows only because the sun is shining. So it's because the sun is shining that all these things are moving." That would get the concept across that motion is simply the transformation of the sun's power.
I turned the page. The answer was, for the wind-up toy, "Energy makes it go." And for the boy on the bicycle, "Energy makes it go." For everything, "Energy makes it go."
Now that doesn't mean anything. Suppose it's "Wakalixes." That's the general principle: "Wakalixes makes it go." There's no knowledge coming in. The child doesn't learn anything; it's just a word!
1.2 What is a Satisfactory, Scientific Theory?
Further, a scientific theory of something is expected to have a certain "explanatory power". But what is "explanatory power"? Is it just whatever we like? Consider the old explanations of disease; here is one: evil spirits inhabit you [dem]. Well, did anyone ever see these spirits? Were the experiences of these spirits universal across humankind? Were there some general rules of how the spirits behaved? How many there were? What would appease them?
Another theory was Humorism [hum]: that there were four different fluids in the body: blood, black bile, yellow bile, and phlegm; when they got out of balance, you had a disease. Ok, this is better than arbitrary spirits, but did anyone measure the relative levels of these fluids? Could someone predict sickness by observing these fluids get out of balance? Could you make someone better by, say, draining blood from them? "Treatment" based on this theory seems to have been long practiced, but did anyone measure to see if draining blood really made people better versus a control group that did not have their blood drained?
Now we have a new theory called modern medicine. It is much more complex, but let's take a subset of it: there are little creatures called bacteria that live everywhere. Certain kinds can live in your body and the results of their activity, such as their excretions, get your body out of normal working order, and thus you become sick. If you give a person chemicals that are more toxic to the bacteria than to the person, does the person get better? Yes [Mobley-antibiotics]! Even when compared to a control group? Yes! Can we see these little bacteria in a microscope? Yes! Ok, this is much more satisfactory as a scientific theory.
Now, let us step back and consider what makes us more satisfied with this theory. What is going on such that it is a better theory?
- For one thing, the theory is mechanical: we have some mechanism, consistent with our understanding of inanimate matter today (physics and chemistry) such that the operation of the mechanism corresponds with what we observe (Scientific Method) [sci].
- Further, this mechanism is deterministic and precise: there isn't much arbitrariness in the mechanism: we can compute rather well how sick someone will get and how much toxin we have to give them to kill the bacteria and not the person.
- This mechanism is universal: there is no appeal to beliefs or cultural norms: people throughout the world get sick in the same way and the medicines work on them, with but small differences that can be further explained by another mechanism called genetics.
- This mechanical explanation is simple and minimal (Occam's razor) [occ]. We can see the parts working.
- Lastly, the mechanism is factored -- made up of independent parts -- and the complexity of the observed phenomena is emergent -- arising naturally from the operation of the parts. That is, these parts of the explanation of disease all operate independently: (1) how the body works such that the bacterial excretions disrupt it, (2) how bacteria work such that the toxin kills them, (3) how the toxicity to the human depends on the size of the human, etc.
Physicist Richard Feynman gave a series of lectures where he attempted to encapsulate the basic nature of how science is done and the kind of results it produces; these were published as "The Character of Physical Law" [Feynman1965]. Here is a brilliant paragraph on how to know when you have finally found the truth. [Feynman1965, p. 171] (underlining added, not in the original):
One of the most important things in this 'guess -- compute consequences -- compare with experiment' business is to know when you are right. It is possible to know when you are right way ahead of checking all the consequences. You can recognize truth by its beauty and simplicity. It is always easy when you have made a guess, and done two or three little calculations to make sure that it is not obviously wrong, to know that it is right. When you get it right, it is obvious that it is right -- at least if you have any experience -- because usually what happens is that more comes out than goes in. Your guess is, in fact, that something is very simple. If you cannot see immediately that it is wrong, and it is simpler than it was before, then it is right. The inexperienced, and crackpots, and people like that, make guesses that are simple, but you can immediately see that they are wrong, so that does not count. Others, the inexperienced students, make guesses that are very complicated and it sort of looks as if it is all right, but I know it is not true because the truth always turns out to be simpler than you thought.
Using Computer Science terminology, I summarize Feynman's point as follows.
The more factored a theory and the more emergent the observed phenomena from the theory, the more satisfying the theory.
The Ptolemaic [ptol] model of the solar system puts the earth at the center. This explanation really does explain the movements, especially when epicycles [epi] are added, but it is rather complex and ad hoc: how does it emerge that we need epicycles? The Copernican [cop] system is another explanation of the solar system, one that puts the sun at the center. This second explanation only requires Newton's laws of motion plus gravity. The consequences of Newton's laws are complex and hard to simulate, even on a modern computer, but the laws themselves are quite simple and independent and mechanical and factored and observable. Even further, the notation used in this theory easily reflects the underlying understanding in the theory: it allows for easy calculations when making predictions of the theory. All in all, the Copernican system is quite a satisfying explanation, or theory, of the motions of planets in the solar system because not only does it explain the observed phenomena, it is factored into simple parts and the observed phenomena are emergent from the interactions of those parts. Consequently, we use the Copernican system today (adjusted for relativity and other more recent observations).
1.3 Music "Theory" is Not a Scientific Theory of Anything
Music "theory" as we find in books today contains none of the properties of a modern theory that we find satisfying. At the start we are presented the odd white-black-white-WHITE-black keyboard or Major Scale as a given. We are sometimes told for example that the Major Scale comes from the Ancient Greeks. We are sometimes told it is arbitrary and it only sounds good because we have heard it since childhood.
Nothing in music "theory" counts as a scientific theory of anything.
We are told that certain combinations of notes sound good; these combinations are called "chords" and the fact that these combinations sound good is also arbitrary. We are told lots of strange names for intervals between notes and these names make no sense. The Standard Chord Dictionary of common chords simply consists of a list of note combinations we are told are good to play together and will feel a certain way when heard. Nowhere is there any notion of how we would predict the feeling each chord engenders from the construction of the chord.
Sometimes I have encountered vague explanations offering "pairs of notes having low whole-number ratios" as the reason some notes sound good together and then told no one really knows how that works. In Section 5 "Helmholtz Fails to Fully Explain Harmony" we address a well-known theory of Helmholtz where he attempts an explanation of how it is that notes with frequencies that are in low whole-number ratios to one another should sound good together. We will show that his theory has problems.
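To make "low whole-number ratios" concrete, here is a small Python sketch (our own illustration, not part of Helmholtz's argument) that lists a few common intervals as whole-number frequency ratios next to the ratio 2^(k/12) that a modern equal-tempered piano uses for an interval of k half-steps. The interval names and ratios are the conventional ones; the selection is ours and is not meant to be complete.

```python
# (interval name, number of half-steps, numerator, denominator) for a few
# common intervals, using their conventional small whole-number ratios.
INTERVALS = [
    ("octave", 12, 2, 1),
    ("perfect fifth", 7, 3, 2),
    ("perfect fourth", 5, 4, 3),
    ("major third", 4, 5, 4),
    ("minor third", 3, 6, 5),
]

def compare_ratios(intervals):
    """Return (name, whole-number ratio, its value, equal-tempered value 2**(k/12))."""
    rows = []
    for name, half_steps, num, den in intervals:
        rows.append((name, f"{num}:{den}", num / den, 2 ** (half_steps / 12)))
    return rows

for name, ratio, just, tempered in compare_ratios(INTERVALS):
    print(f"{name:15} {ratio:>4}  {just:.4f}  {tempered:.4f}")
```

The two columns agree closely but, apart from the octave, not exactly.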
If we make any attempt to actually compute note ratios, the notation gets in the way of our understanding: the notation for the notes and their distances does not convey the actual frequency ratios of the notes very well. For example, in the Major Scale, moving up to the next staff position (from a space to the line above it, or from a line to the space above it) sometimes goes up one whole "step", a ratio of 2^(1/6) = 1.122 (the sixth root of 2), and sometimes only a "half-step" (or "semi-tone"), a ratio of 2^(1/12) = 1.059 (the twelfth root of 2), which is half as large a step when measured logarithmically: two half-steps make one whole step. (For more on logarithms and exponentials, see Section 3.2.2 "Using Logarithms to Visualize Distances Between Tones/Notes".) (To those unfamiliar with musical notation, we will explain the numbers later.) The difference between these whole and half steps can only be discerned by looking way over to the left of the page of music and doing complex computations with sharps and flats in order to compute the "key" of the music; and that whole process is designed to defeat the sometimes-half/sometimes-whole steps (for the arbitrary key of C) that are baked into the notation itself. This notation may make music easy to play, but it does not make it easy to understand.
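The step arithmetic can be checked with a few lines of Python. This sketch assumes the conventional whole/half pattern of the Major Scale, W W H W W W H (the string encoding and function name are our own), and confirms that a half-step squared gives a whole step and that the seven steps of the scale climb by exactly a factor of 2, one octave.

```python
import math

WHOLE = 2 ** (1 / 6)   # whole step: the sixth root of 2
HALF = 2 ** (1 / 12)   # half-step: the twelfth root of 2

# Steps between consecutive notes of the Major Scale (W = whole, H = half).
MAJOR_STEPS = "WWHWWWH"

def scale_ratios(steps: str) -> list[float]:
    """Frequency of each note of the scale relative to its first note."""
    ratios = [1.0]
    for step in steps:
        ratios.append(ratios[-1] * (WHOLE if step == "W" else HALF))
    return ratios

print(round(WHOLE, 3), round(HALF, 3))                   # 1.122 1.059
print(math.isclose(HALF * HALF, WHOLE))                  # True
print(math.isclose(scale_ratios(MAJOR_STEPS)[-1], 2.0))  # True
```

Multiplying ratios step by step is exactly what the logarithmic view turns into adding distances.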
This music "theory" has all the properties of preventing understanding, not promoting it. It fits the description of pseudo-science pretty well. Let's try to do better.
1.4 Can we Make a Satisfactory Theory of Music?
I simply refuse to believe that something so fundamental to human life and so satisfying to so many people is so arbitrary and so un-explainable. I have attempted to come up with something better and I think I have succeeded.
As we build up this theory, we want to make sure that we make as few assumptions as possible, and that these assumptions are founded upon actual experimentally-derived facts -- just as we now demand of the rest of science. In particular we would like a real, scientific theory of music to be universal and not appeal to cultural relativism that says "it's all just arbitrary"; no explanation that says such things is a real scientific theory of anything.
Since
- sound and instruments exist in reality and
- music only sounds like something because a human brain is computing while listening to it,
our theory must draw on both
- physics and
- computation.
The brain is central to our theory. Not knowing how the brain really works, we therefore have a hole to fill in our explanation. We proceed by telling a story to explain the known properties of music; along the way we assume certain conjectures about the structure of the brain where we need them. We make these conjectures as reasonable as possible, given the assumption that
The brain is a machine optimized by evolution to compute human survival.
That is, being a machine, the brain is likely to be subject to properties that computer scientists and engineers have observed across many computational systems and that these properties will be driven by evolutionary optimization. In the end, the test of our theory will depend on (1) how well it explains the observed phenomenon called music, and (2) how well the conjectures hold up under testing. In this essay we do (1) and we leave (2) for future work by cognitive/brain scientists.
1.5 Physical Science: Harmonics Everywhere
Physical science is about as rock-solid a theory of the world as anything. This is a good place to start. Catherine Schmidt-Jones [Schmidt-waves]:
For the purposes of understanding music theory, however, the important thing about standing waves in winds is this: the harmonic series they produce is essentially the same as the harmonic series on a string. In other words, the second harmonic is still half the length of the fundamental, the third harmonic is one third the length, and so on.
We can either compute or observe (using, say, high-speed cameras) the properties of the stable vibrations that occur when a string or a column of air is excited:
- There is a lowest frequency (the "fundamental") at which the string or air will vibrate;
- there are also other vibrations (the "harmonics" or "overtones") at which the string or air will also vibrate, having higher frequencies that are 2, 3, 4, 5, 6, 7, etc. times the fundamental.
These harmonics can be demonstrated by two people holding a long jump-rope: (1) If they swing the rope slowly, the whole rope makes a single wave. (2) However, if they go twice as fast and out of phase (one goes up while the other goes down), then half of the rope will be up and the other half down, and the positions of up and down will switch twice as fast; further, the very middle of the rope will not move at all (a "node"). (3) A similar effect happens with three waves if they go even faster. For a picture, see [Schmidt-waves, Figure 2]. When a string is plucked, all of these waves are happening at the same time. That is, plucking generates waves of all frequencies, but only those whose half-wavelength fits into the length of the string a whole number of times will bounce back and forth, reinforce each other, and persist; other frequencies will die out. From [Schmidt-waves]:
In order to get the necessary constant reinforcement, the container has to be the perfect size (length) for a certain wavelength, so that waves bouncing back or being produced at each end reinforce each other, instead of interfering with each other and cancelling each other out. And it really helps to keep the container very narrow, so that you don't have to worry about waves bouncing off the sides and complicating things. So you have a bunch of regularly-spaced waves that are trapped, bouncing back and forth in a container that fits their wavelength perfectly. If you could watch these waves, it would not even look as if they are traveling back and forth. Instead, waves would seem to be appearing and disappearing regularly at exactly the same spots, so these trapped waves are called standing waves.
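The standing-wave condition can be sketched in Python for an idealized string fixed at both ends (the jump-rope above has its ends moved by hand, so this is a simplification). Harmonic n fits n half-wavelengths into the string, giving n + 1 evenly spaced nodes and the standard ideal-string frequencies f_n = n * v / (2L). The length and wave speed below are hypothetical, chosen so the numbers come out round; the function names are our own.

```python
from fractions import Fraction

def node_positions(n: int) -> list[Fraction]:
    """Points (as fractions of the string's length) that stay still in harmonic n.

    An ideal string fixed at both ends sustains only waves that fit a whole
    number n of half-wavelengths into its length, so harmonic n has n + 1
    evenly spaced nodes, counting the two fixed ends.
    """
    return [Fraction(k, n) for k in range(n + 1)]

def string_harmonics(length_m: float, wave_speed_m_s: float, count: int) -> list[float]:
    """Frequencies f_n = n * v / (2 * L) of the first `count` standing waves."""
    return [n * wave_speed_m_s / (2 * length_m) for n in range(1, count + 1)]

for n in (1, 2, 3):
    print(n, [str(p) for p in node_positions(n)])
# 1 ['0', '1']
# 2 ['0', '1/2', '1']
# 3 ['0', '1/3', '2/3', '1']

# Hypothetical string: 0.5 m long, wave speed 220 m/s.
print(string_harmonics(0.5, 220.0, 4))  # [220.0, 440.0, 660.0, 880.0]
```

The node at 1/2 for harmonic 2 is the motionless middle of the rope described above.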
We will call each single sine-wave at a single frequency a "tone", whereas the collection of frequencies that occur together due to a single physical process (such as a vocal utterance or the striking of a piano key) we will call a "note". (A tone can be expressed simply as (1) a wave "frequency" in Hertz (Hz), the number of cycles per second, (2) a wave "amplitude", the wave peak height, and (3) a wave "phase", where the wave is in its cycle compared to other waves; we won't discuss amplitude and phase much.)
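The tone/note distinction maps naturally onto a small data structure. A minimal Python sketch, assuming (as a simplification) that a note's waveform is just the sum of its tones; the class names `Tone` and `Note` are our own, hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Tone:
    """A single sine wave: frequency (Hz), amplitude, and phase (radians)."""
    frequency_hz: float
    amplitude: float = 1.0
    phase_rad: float = 0.0

    def value(self, t: float) -> float:
        """Displacement of this sine wave at time t (seconds)."""
        return self.amplitude * math.sin(
            2 * math.pi * self.frequency_hz * t + self.phase_rad
        )

@dataclass
class Note:
    """The tones produced together by one physical event, such as a key strike."""
    tones: list[Tone]

    def value(self, t: float) -> float:
        # Assumed linear superposition: the note is the sum of its tones.
        return sum(tone.value(t) for tone in self.tones)

note = Note([Tone(220.0), Tone(440.0, 0.5), Tone(660.0, 0.25)])
samples = [note.value(i / 44100) for i in range(5)]  # first five samples at 44.1 kHz
```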
This sequence of tones forming a note is called the "Harmonic Series" [har] or "Overtone Series" of the fundamental. Herein we speak of "the (ideal) Harmonic Series" when we mean an abstract computational ideal and speak of "an overtone series" when we mean what is actually produced in reality by a particular actual instrument (which may be quite different from the ideal); note that others quoted here may not follow this same convention. (Further, throughout we pluralize "series" as "series-es" because in a technical discussion it is very important to avoid the ambiguity between a single series of multiple tones and multiple series-es of multiple tones.)
There are two conventions for numbering overtones/harmonics; we use the convention where the fundamental or "Root" tone is called "harmonic 1", the tone vibrating twice as fast is called "harmonic 2", the tone vibrating three times as fast is called "harmonic 3", etc.
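Under this numbering, harmonic n is simply n times the fundamental. A minimal Python sketch of the ideal Harmonic Series (the function name is our own):

```python
def harmonic_series(fundamental_hz: float, count: int) -> dict[int, float]:
    """Ideal Harmonic Series: harmonic n vibrates n times as fast as harmonic 1."""
    return {n: n * fundamental_hz for n in range(1, count + 1)}

print(harmonic_series(220.0, 7))
# {1: 220.0, 2: 440.0, 3: 660.0, 4: 880.0, 5: 1100.0, 6: 1320.0, 7: 1540.0}
```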
1.5.1 Timbre: Systematic Distortions from the Ideal Harmonic Series
From "This is Your Brain on Music" by Daniel J. Levitin [Levitin2006, p. 43-44]:
The timbre of a sound is the principal feature that distinguishes the growl of a lion from the purr of a cat, the crack of thunder from the crash of ocean waves,.... Timbral discrimination is so acute in humans that most of us can recognize hundreds of different voices. We can even tell whether someone close to us -- our mother, our spouse -- is happy or sad, healthy or coming down with a cold, based on the timbre of that voice.
Timbre is a consequence of the overtones.... When you hear a saxophone playing a tone with a fundamental frequency of 220 Hz, you are actually hearing many tones, not just one. The other tones you hear are integer multiples of the fundamental: 440, 660, 880, 1100, 1320, 1540, etc. The different tones -- the overtones -- have different intensities, and so we hear them as having different loudnesses. The particular pattern of loudnesses for these tones is distinctive of the saxophone, and they are what give rise to its unique tonal color, its unique sound -- its timbre. A violin playing the same written note (220 Hz) will have overtones at the same frequencies, but the pattern of how loud each one is with respect to the others will be different. Indeed, for each instrument, there exists a unique pattern of overtones. For one instrument, the second overtone might be louder than in another, while the fifth overtone might be softer. Virtually all of the tonal variation we hear -- the quality that gives a trumpet its trumpetiness and that gives a piano its pianoness -- comes from the unique way in which the loudnesses of the overtones are distributed.
Each instrument has its own overtone profile, which is like a fingerprint. It is a complicated pattern that we can use to identify the instrument. Clarinets, for example, are characterized by having relatively high amounts of energy in the odd harmonics -- three times, five times, and seven times the multiples of the fundamental frequency, etc. (This is a consequence of their being a tube that is closed at one end and open at the other.) Trumpets are characterized by having relatively even amounts of energy in both the odd and the even harmonics (like the clarinet, the trumpet is also closed at one end and open at the other, but the mouthpiece and bell are designed to smooth out the harmonic series). A violin that is bowed in the center will yield mostly odd harmonics and accordingly can sound similar to a clarinet. But bowing one third of the way down the instrument emphasizes the third harmonic and its multiples: the sixth, the ninth, the twelfth, etc.
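To see timbre as data, here is a Python sketch in which an instrument is modeled as nothing more than a list of relative amplitudes for harmonics 1 through 8. The profile values are invented and only imitate the qualitative descriptions in the quote (strong odd harmonics for the clarinet, similar odd and even harmonics for the trumpet); they are not measurements. Both instruments share the same frequencies, the integer multiples of 220 Hz; only the loudness pattern differs. Energy is taken as amplitude squared.

```python
# Hypothetical amplitude profiles for harmonics 1-8, for illustration only.
PROFILES = {
    "clarinet-like": [1.0, 0.1, 0.6, 0.1, 0.4, 0.1, 0.3, 0.1],  # odd harmonics strong
    "trumpet-like": [1.0, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3],  # odd and even alike
}

def spectrum(fundamental_hz: float, profile: list[float]) -> list[tuple[float, float]]:
    """(frequency, amplitude) per harmonic: shared frequencies, instrument-specific loudness."""
    return [(n * fundamental_hz, amp) for n, amp in enumerate(profile, start=1)]

def odd_even_energy(profile: list[float]) -> tuple[float, float]:
    """Total energy (amplitude squared) in the odd and in the even harmonics."""
    odd = sum(a * a for n, a in enumerate(profile, start=1) if n % 2 == 1)
    even = sum(a * a for n, a in enumerate(profile, start=1) if n % 2 == 0)
    return odd, even

for name, profile in PROFILES.items():
    odd, even = odd_even_energy(profile)
    freqs = [f for f, _ in spectrum(220.0, profile)]
    print(name, freqs[:4], "odd/even energy:", round(odd / even, 1))
```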
Besides introducing us to timbre, Levitin points out:
Most real instruments systematically produce tones having amplitudes distinct from that of the ideal Harmonic Series.
Michael O'Donnell points out that the effects of timbre on the overtone series go even further [O'Donnell, 14 January 2009]:
I suggest that you check into the importance of approximate harmonic series. E.g., the overtones on a piano string are measurably and audibly higher in frequency than the harmonics that they approximate. Both the nearness to harmonics, and the perceptible difference, appear to be important....
You mentioned the way that the harmonic series of frequencies occurs naturally in air columns, as in strings. But, on soft strings (such as guitar, violin---little resistance to bending) the natural series of resonant frequencies is very accurately harmonic. In wind instruments, the natural resonances of the air column approximate the harmonic series rather poorly. In the brass, the approximation is so poor that the numbers of the harmonics don't even match between the natural resonances and the notes as played. While the conical shape of many reeds is designed to improve the harmonicity of the resonances, the bell on the brass is actually designed to increase the inharmonicity of the natural resonances, which produces a better match in the misaligned overtones. It is phase locking between vibrational modes, caused by the highly nonlinear feedback in the excitation mechanisms (reeds, lips, bow scraping) that makes the overtone series so accurately harmonic, not the natural resonances.
That is, O'Donnell points out:
Most real instruments systematically produce tones having frequencies distinct from that of the ideal Harmonic Series.
Therefore, whatever our theory of harmony, it should work for sounds where the overtone series differs from the ideal Harmonic Series by (1) altered amplitudes and (2) altered frequencies. However, notice that both of these distortions of the ideal Harmonic Series have one important property:
The distortions made by the overtone series of a given instrument to the ideal Harmonic Series are a predictable, systematic function of the instrument kind.
That is, two notes (series-es of overtones) made by the same (kind of) instrument will be distorted from the ideal Harmonic Series in the same (or similar) way. This must be the case in order for an instrument or instrument kind to have a uniform, recognizable timbre. We will use this below.
1.6 Computational Science: as Fundamental as Physical Science
I think part of the reason the theory we develop here might not have been described before is that there aren't many people who think about both the physical and the computational understanding needed to derive it.
The properties, or laws, of computation are just as fundamental as the physical laws.
Computation is everywhere -- you live in a sea of it.
- You may see a cup, but computational engineers see an idiom for managing liquids by getting them stuck in a local optimum.
- You may think of ownership as a basic human right, but engineers think of it as a distributed decision-making algorithm.
- You may enjoy a field full of bumblebees pollinating flowers, but engineers enjoy it as an information distribution network.
- You may think it is polite to not talk on top of other people at dinner, but engineers think it is optimal to use a back-off algorithm to resolve a network packet collision.
I wrote that list off the top of my head as fast as I could type and edit text: the examples are myriad.
Consider for a moment that perhaps you are computation: that you are the computational activity of your brain. Some people say that this reduces the wonder of life to simple mechanism; I say it simply elevates mechanism to the wonder of life. While you need not adopt this All-Is-Computation point of view as your personal understanding of life or of yourself, a computational understanding of the brain has amazing explanatory power, so please consider it at least for the rest of this essay.
1.6.1 Algorithms are Universal
Finding good ways to solve a problem with fewer resources is a basic pursuit of those who study computation. A general method for solving a problem is called an "algorithm" [alg]. New algorithms that solve common problems well are rare and highly valued. When a solution is "reduced to the simplest and most significant form possible without loss of generality" we say it is "canonical" [canon]. An algorithm is a canonical method.
Many tricks in engineering seem not to be merely the artifacts of human cleverness, but instead the result of fundamental properties of the medium of computing. Algorithms invented by different species to solve the problem called staying alive often resemble each other in ways that cannot be explained by any other means than "that's the only way to do it" (or one of only a few ways). From [cutt]:
The organogenesis of cephalopod eyes differs fundamentally from that of vertebrates like humans. Superficial similarities between cephalopod and vertebrate eyes are thought to be examples of convergent evolution.
The human eye and the cuttlefish eye both address the problem of extracting information at a distance from light. Both evolved separately, and yet they both end up at a very similar solution. Biologists call this phenomenon "convergent evolution" [conv]; architects call it "timeless pattern" [Alexander1979]; storytellers call it "archetype" [archetype]; clothiers call it "classical style"; computer scientists call it "algorithm". When humans tried to find a mechanical solution to the same problem, they invented the camera, which is just an eye again. We should therefore not be surprised if
Conjecture One: Computational laws/idioms/patterns/algorithms are universal: The brain works using a combination of simple computational algorithms of which we are likely already aware.
2 Living in a Computational Cartoon
"I'm not bad, I'm just drawn that way." -- Jessica Rabbit [Jessica-bad]
Jessica Rabbit [Jessica-pout] is one of the sexiest characters in Hollywood, elected 88th of The 100 Greatest Movie Characters of All Time by Empire Magazine [Jessica-great]. Sadly, she is just a drawing and a voice. Despite the powerful illusion to the contrary, we do not see or hear the world; we see and hear the world that our brains compute. Like the characters in "Who Framed Roger Rabbit?" [WFRR-1988], we live in a cartoon. Music is not what the world does; it is what we do with the world.
A friend of mine, Joel Auslander, used to intern at Pixar; his job was to make physics simulator tools for the animators. He wanted to make simulators that were accurate to the real physics, but he said that the animators told him that people don't want to watch real physics, people want to watch cartoon physics: even though not as accurate as real physics, cartoon physics is somehow more satisfying [Auslander, c. 1996].
Conjecture Two: The brain uses cartoon physics, that is, physics that is easy to compute, but not necessarily faithfully accurate to reality.
We suggest that both the use of cartoon physics and the inaccuracy of cartoon physics are due to the simple fact that the brain is computationally limited.
Here is a cartoon physics effect in vision. When I was taking a drawing class, our teacher pointed out some useful visual effects to us: (1) to make an object look round, shade it more darkly the farther its face bends away from the viewer, and (2) put highlights where the light source would reflect off of it. Now think what pantyhose do to women's legs. (1) When the mesh of the hose is seen straight on, it is not very dark, but as the leg bends away and the mesh is seen on edge, the threads line up and the grid rapidly appears to darken. (2) Pantyhose are shiny and so naturally produce reflection highlights. That is, pantyhose fire the recognizers in your brain for the features of roundness harder than a real round leg could: her leg looks rounder than round, impossibly round. See Section 2.5 "Recognition: Feature Vectors" for more on this phenomenon.
We suggest that the brain is using cartoon physics when processing sounds as well. That is, explanations of auditory effects based on the physical properties of actual overtones of different instruments (such as the piano or the trumpet) are beside the point (or at least beside the primary point) when it comes to the brain. As we will see in Section 5 "Helmholtz Fails to Fully Explain Harmony", this point of view is the essential point where our theory differs from that of Helmholtz. What primarily distinguishes this essay from previous attempts to explain music is that our whole approach is oriented primarily not from the external world of physics, but from the internal world of the computation by our brains that is us, from the computational cartoon in which we live and from which we think we experience the world, but which is not the world, but instead only ourselves.
2.1 Searching for Harmonics
As Levitin pointed out in Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series", finding the difference between what we hear and the ideal Harmonic Series is a valuable tool for recognizing people and determining their emotional state. Many sounds are made by vibrating strings or columns of air, but perhaps more importantly, the human voice is made up of vibrating "chords" and a "windpipe" of air. Given that sounds associated to a single source would tend to be arranged in a Harmonic Series, and especially given how important the voice is to humans, it would not be surprising if perhaps
Conjecture Three: Finding harmonics is a common and important problem, so the brain has hardware for recognizing the Harmonic Series.
You can hear a demonstration of this, and of many other interesting auditory phenomena, on the "Auditory Demonstrations" CD from the Institute for Perception Research, Eindhoven, The Netherlands and the Acoustical Society of America [acoustical-demo, Demo 1], "Cancelled Harmonics":
[Twenty tones in the same Harmonic Series are all played together.] When the relative amplitudes of all 20 harmonics remain steady (even if the total intensity changes), we tend to hear them holistically. However, when one of the harmonics is turned off and on, it stands out clearly. The same is true if one of the harmonics is given a "vibrato" (i.e. its frequency, its amplitude, or its phase is modulate at a slow rate).
I recall my voice teacher Andrea Fultz saying the goal was to get me to sing so that my voice resonated in my "mix": in both my head and chest voice at the same time [Fultz, c. 2006]. She was trying to get me to have a more ringing or sweeter voice by making sure all the overtones were present by ensuring that somewhere in my body some resonator of the right size was amplifying it (see Section 2.3 "Harmony: Sweetness is the Ideal" below).
2.1.1 Virtual Pitch: Hearing the Harmonic Series Even When it is Not There
There is a reliable acoustic phenomenon called "Virtual Pitch": if the Harmonic Series is processed to remove the Root or Fundamental tone and then played to a person, that person will hear the note, including the Root tone, even though it is not played [miss-fund]. The "Auditory Demonstrations" CD again [acoustical-demo, Demo 20], "Virtual pitch":
A complex tone consisting of 10 harmonics of 200 Hz having equal amplitude is presented, first with all harmonics, then without the fundamental, then without the two lowest harmonics, etc. Low-frequency noise (300-Hz lowpass, -10dB) is included to mask a 200-Hz difference tone that might be generated due to distortion in playback equipment.
As they say, in the demo overtones are subtracted one at a time, from the fundamental on up. Amazingly, the note being played seems to stay the same; however, it does get more buzzy or annoying, to the point where a fellow listener, Simon Goldsmith, thought that he would no longer call the last example the same note [Goldsmith, c. 2010].
Virtual pitch is what allows engineers to fake bass notes on small speakers: they don't play the low tones, as often the speaker is too physically small to make the fundamental frequency anyway; instead they play the overtones and rely on your brain to reconstruct the whole Harmonic Series. However, as we noted above, you will hear that small, cheap speakers sound, well, cheap or "tinny"; the bass just doesn't sound as good as it does when played on sub-woofers. That said, don't forget how remarkable it is that you can still "hear" the non-existent fundamental tone at all (which helpfully prevents the need for people to jog with sub-woofers attached to their ears). From [miss-fund]:
For example, when a note (that is not a pure tone) has a pitch of 100 Hz, it will consist of frequency components that are integer multiples of that value (e.g. 100, 200, 300, 400, 500.... Hz). However, smaller loudspeakers may not produce low frequencies, and so in our example, the 100 Hz component may be missing. Nevertheless, a pitch corresponding to the fundamental may still be heard.
(Note that virtual pitch is a special case of (1) the feature vector understanding that we give in Section 2.5 "Recognition: Feature Vectors" and (2) the concomitant effect of false recognition that we speak of in Section 2.5.2 "False Recognition", where here virtual pitch is the false recognition of the Harmonic Series.)
(See Section 6.2 "Terhardt Does Not Explain Sustained and Minor Chords" for an illustration by Coren [Coren1972] (as quoted by Terhardt [Terhardt1974-PCH]) which uses standard visual illusions as a metaphor for virtual pitch.)
(In "How to Play From a Fake Book" [Neely1999], Neely says that when playing a chord, you can drop not only the Root of the chord but also the Fifth, and the listener will still hear the chord; see Section 3.5.4 "Chords Inducing Ambiguity". We should point out that here we speak of omitting one note from a chord, a collection of multiple notes, or multiple series-es of tones, whereas virtual pitch is a phenomenon of omitting one tone from the single Harmonic Series of a single note. However, we argue later in Section 2.3.2 "Harmony Induces Two Kinds of Intervals: Horizontal Within the Note and Vertical Across the Notes" that these two situations are closely related, and therefore the fact that omitting the Root or Fifth of a chord works is actually the phenomenon of virtual pitch again, and is thus more evidence for our theory that the brain is listening for the Harmonic Series.)
2.1.2 Using Greatest Common Divisor as the Missing Fundamental
What is the means by which the brain determines the missing fundamental? From [acoustical-demo, Demo 21], "Shift of Virtual Pitch":
A tone having strong partials with frequencies of 800, 1000, and 1200 Hz will have a virtual pitch corresponding to the 200 Hz missing fundamental, as in Demonstration 20. If each of these partials is shifted upward by 20 Hz, however, they are no longer exact harmonics of any fundamental frequency around 200 Hz. The auditory system will accept them as being "nearly harmonic" and identify a virtual pitch slightly above 200 Hz (approximately 1/3 * (820/4 + 1020/5 + 1220/6) = 204 Hz in this case). The auditory system appears to search for a "nearly common factor" in the frequencies of the partials.
There is a simple algorithm for finding the Root of a partial overtone series:
Given a set of tones, hear the (approximate) Greatest Common Divisor (gcd) of the tones as the fundamental.
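As a rough illustration, here is a minimal Python sketch of an approximate-gcd search, assuming the brain behaves like the search described in Demo 21. The function name approximate_gcd and its tolerance and max_divisor parameters are hypothetical choices for this sketch, not part of the cited demo. It tries the largest candidate fundamentals first, accepts the first one of which every partial is a near-integer multiple, and then refines the estimate by averaging each partial divided by its harmonic number, which is the same averaging the demo quotes.

```python
def approximate_gcd(partials, tolerance=0.03, max_divisor=20):
    """Hypothetical sketch: approximate greatest common divisor of partials (Hz).

    Returns None if no candidate fundamental fits within the tolerance.
    """
    lowest = min(partials)
    # Larger candidates first, so the first fit is the *greatest* common divisor.
    for n in range(1, max_divisor + 1):
        candidate = lowest / n  # assume the lowest partial is harmonic number n
        harmonics = [round(p / candidate) for p in partials]
        # Relative mistuning of each partial from its nearest harmonic.
        errors = [abs(p / (k * candidate) - 1) for p, k in zip(partials, harmonics)]
        if max(errors) <= tolerance:
            # Refine: average the per-partial estimates p / k.
            return sum(p / k for p, k in zip(partials, harmonics)) / len(partials)
    return None


print(approximate_gcd([800, 1000, 1200]))  # 200.0
print(approximate_gcd([820, 1020, 1220]))  # about 204.1, as in Demo 21
```

With the shifted partials of Demo 21 the sketch settles on harmonic numbers 4, 5, and 6 and reports a fundamental slightly above 200 Hz, matching the roughly 204 Hz the demo describes.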
2.1.3 Even Animals Seem to Compute the Ideal Harmonic Series
This conjecture on the brain creating virtual pitch seems to hold even for non-humans, as pointed out in "This is Your Brain on Music" by Daniel J. Levitin [Levitin2006, p. 41] (emphasis in the original):
When I was in graduate school, my advisor, Mike Posner, told me about the work of a graduate student in biology, Petr Janata.... Peter [sic] placed electrodes in the inferior colliculus of the barn owl, part of its auditory system. Then, he played the owls a version of Strauss's "The Blue Danube Waltz" made up of tones [by "tones" here he means what we are calling "notes": each note is an entire series of overtones] from which the fundamental frequency [what we are calling the fundamental tone of the overtone series] had been removed. Petr hypothesized that if the missing fundamental is restored at the early levels of auditory processing, neurons in the owl's inferior colliculus should fire at the rate of the missing fundamental. This was exactly what he found. And because the electrodes put out a small electrical signal with each firing -- and because the firing rate is the same as a frequency of firing -- Petr sent the output of these electrodes to a small amplifier, and played back the sound of the owl's neurons through a loudspeaker. What he heard was astonishing; the melody of "The Blue Danube Waltz" sang clearly from the loudspeakers: ba da da da da, deet deet, deet deet. We were hearing the firing rates of the neurons and they were identical to the frequency of the missing fundamental. The harmonic series has an instantiation not just in the early levels of auditory processing, but in a completely different species.
Michael O'Donnell pointed out to me that there is an ambiguity here [O'Donnell, 14 February 2009]:
[The above story] doesn't allow one to distinguish whether the Owl, or the human listener, is experiencing the virtual pitch.
I passed this on to Daniel J. Levitin; his response [Levitin, 24 May 2010]:
You're absolutely right that these two possibilities need to be distinguished. The electrodes that were placed in the brain of the owl (in the inferior colliculus) were analyzed using specotrograms[sic] and fourier[sic] analysis. It was clear that the signal itself coming from the owl's brain had replaced the missing fudnamental[sic]. It was only after this analysis that Petr thought to hook it all up to play the signal over loudspeakers (so that humans could hear the output) as a cool demonstration.
Female mosquitoes only mate when the rate of the male's wing-beats harmonizes at a Perfect Fifth above the rate of hers (we start introducing musical terminology such as the Perfect Fifth in Section 3.1 "The Major Triad"). From "Mosquitoes make sweet love music" [Mosquito-harmony]:
The familiar buzz of a flying female mosquito may be irritating to humans, but for her male counterpart, it is an irresistible mating signal. Males and females each have their own characteristic flight tone - which they create by beating their wings.
But when scientists from Cornell University listened in on a male Aedes aegypti pursuing his mate, they were surprised to hear a new kind of "music" playing....
The amorous couple began to beat their wings together at a matching frequency - 1,200 hertz. This love song is a "harmonic", or multiple, of their individual frequencies - 400 Hz for the female and 600 Hz for the male....
"So we're trying to discover what makes a male more attractive. It's a mystery. It could be his odour[sic], or his bright black and white markings.
"But we think females are assessing the fitness of males based on how well they can sing."
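The arithmetic behind the quoted frequencies can be checked directly. This small Python sketch uses only the 400 Hz and 600 Hz figures from the article; the variable names are illustrative. It shows that the two flight tones stand in the ratio 3/2, a Perfect Fifth, and that 1,200 Hz is the lowest frequency that is a harmonic of both: the 3rd harmonic of the female and the 2nd of the male.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

female_hz, male_hz = 400, 600  # flight tones quoted in the article

# Interval between the two flight tones, as an exact ratio.
interval = Fraction(male_hz, female_hz)
print(interval)  # 3/2, a Perfect Fifth

# Lowest frequency that is an integer multiple of both flight tones.
shared_hz = lcm(female_hz, male_hz)
print(shared_hz, shared_hz // female_hz, shared_hz // male_hz)  # 1200 3 2
```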
2.2 Artifacts of Optimization
The brain has constrained resources. Evolution has no time to waste and therefore these resources are likely used in an optimal way -- or at the very least any easy optimizations will have been done for a given organization of a brain. (That is, evolution will drive a machine into a local optimum, even if it gets stuck there and does not reach a global optimum.)
Having separate hardware in the brain for recognizing each combination of tones that co-occur in nature is sub-optimal and it would just be an expensive way to use up neurons. The algorithm every engineer resorts to in this situation, and what I suspect the brain does also, is to find a way to "re-use code": to solve the problem by generalizing the hardware a little so the same "code" can be used in many more situations. Here, we want one Harmonic Series recognizer that works for all the different overtone series-es we may encounter.
Further, the problem that the brain is solving when listening to music is recognizing sounds that are important to it, such as perhaps the nuances of a human voice against a background of noise. In order to recognize something, it is ok to simplify the input or throw away information if it makes the problem easier, as long as enough information is retained to complete the task.
We now consider two different tricks for greatly simplifying the computation the brain must do in order to recognize the harmonic series. We will also conjecture some computational artifacts of the way the brain computes that should result from these optimizations, resulting in well-known universal features of music: relative pitch and octaves.
2.2.1 Relative Pitch: Differences Between Sounds
Again, most engineers would tell you that, given the problem of designing a brain to recognize the Harmonic Series, their intuition would tell them to build one, single Harmonic Series recognizer, not a different one for every possible note. The way to accomplish this would be to make the machine recognize only that which is the same (or mostly the same) in all overtone series-es and ignore that which changes. While the tones of different Harmonic Series-es differ, conveniently the ratio of their frequencies to their fundamental frequency does not. Therefore we consider it very likely that
Conjecture Four: The brain normalizes tones by dividing tones to get tone ratios.
Recognizing ratios of tones (and notes) more strongly than the absolute tones themselves is a phenomenon called "Relative Pitch" [rel]. A ratio of a pair of tones (or notes) is called an "interval".
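To make Conjecture Four concrete, here is a minimal Python sketch, assuming ideal Harmonic Series and assuming the lowest tone present is the fundamental. The helper names harmonic_series and normalize_by_ratio are hypothetical. Two different notes produce different absolute tones, but after dividing by the fundamental both reduce to the same list of ratios, so one recognizer for that list would serve both.

```python
def harmonic_series(fundamental_hz, count=6):
    """Ideal Harmonic Series: integer multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def normalize_by_ratio(tones):
    """Divide every tone by the lowest one (assumed to be the fundamental)."""
    base = min(tones)
    return [t / base for t in tones]


note_a = harmonic_series(220.0)
note_b = harmonic_series(330.0)

print(normalize_by_ratio(note_a))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(normalize_by_ratio(note_a) == normalize_by_ratio(note_b))  # True
```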
2.2.2 Octaves: Sounds Normalized to a Factor of Two
Processing sound requires operating on frequencies over several orders of magnitude. If these frequencies could be made to "wrap-around" then we have another opportunity for code re-use.
When the police take a mug shot of a criminal, their goal is to take the photo in such a way as to maximize the recognizability of the subject in the future given the photo. They employ a common trick used in the recognition problem: they photograph the subject in standard positions (front and profile), under standard lighting conditions, against a standard backdrop, and after removing any obscuring clothing. We say they normalize the photograph: they remove information irrelevant to the thing to be recognized and put it in a standard form; doing this helps recognize the thing later.
Consider the conceptually straightforward process of the brain halving or doubling the frequency of a wave until it is within a particular range. Now the brain only needs a Harmonic Series recognizer for tones within a frequency range of a single factor of two, not across the whole spectrum of sound. Breaking the problem into two parts like this, (1) normalization followed by (2) recognition, greatly simplifies the resulting frequency recognizer. We therefore consider it likely that
Conjecture Five: The brain normalizes tones by halving or doubling them until within a particular frequency range spanned by a factor of two.
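The normalization step of Conjecture Five can be sketched in a few lines of Python. This is only an illustration under an assumed reference range of [200 Hz, 400 Hz); the conjecture does not say which factor-of-two range the brain uses, and fold_into_octave and low_hz are hypothetical names. Every tone is halved or doubled until it lands in that range, so tones an Octave apart fold onto the same value.

```python
def fold_into_octave(freq_hz, low_hz=200.0):
    """Halve or double freq_hz until it lies in [low_hz, 2 * low_hz)."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    while freq_hz >= 2 * low_hz:
        freq_hz /= 2
    while freq_hz < low_hz:
        freq_hz *= 2
    return freq_hz


print([fold_into_octave(f) for f in (110.0, 220.0, 440.0, 880.0)])  # [220.0, 220.0, 220.0, 220.0]
print(fold_into_octave(660.0))  # 330.0
```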
The individual computational units of the brain are not as fast as those in modern electronics; however, those of the brain operate in "massive parallel": many operations may be computed at once, and all that is needed is that one of them find the answer. To anyone who has seen hardware designed, it seems very likely that the brain is halving/doubling frequencies by many different powers of two in parallel and then running all of the results through the frequency recognizer at once. If any one matches, the harmonic has been found.
If this were so, then tones (and notes) that differ from each other by a factor of two would sound very much alike. The range of notes that are all within one factor of two is called in music an "Octave" [oct]. (The Latin "octo" means eight, not two; the relationship to the number eight will become clear later.) Levitin again from "This is Your Brain on Music" [Levitin2006, p. 29]:
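The parallel variant can be sketched the same way, with the caveat that Python evaluates the candidates one after another, so the list comprehension only stands in for hardware that would scale them all at once. The one-octave recognizer here is a hypothetical stand-in (a simple tolerance check against a target inside [200 Hz, 400 Hz)), and the names recognize_in_window, matches_any_octave, max_shift, and tolerance are assumptions for illustration.

```python
def recognize_in_window(freq_hz, target_hz, low_hz=200.0, tolerance=0.01):
    """Hypothetical one-octave recognizer: is freq_hz in the window and near target_hz?"""
    in_window = low_hz <= freq_hz < 2 * low_hz
    return in_window and abs(freq_hz / target_hz - 1) <= tolerance


def matches_any_octave(freq_hz, target_hz, max_shift=6):
    """Scale freq_hz by 2**k for every k in [-max_shift, max_shift] and
    report a match if any scaled copy is recognized."""
    candidates = [freq_hz * 2.0 ** k for k in range(-max_shift, max_shift + 1)]
    return any(recognize_in_window(c, target_hz) for c in candidates)


print(matches_any_octave(1760.0, 220.0))  # True: 1760 / 2**3 = 220
print(matches_any_octave(330.0, 220.0))   # False: a Fifth, not an Octave
```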
Here is a fundamental quality of music. Note names repeat because of a perceptual phenomenon that corresponds to the doubling and halving of frequencies. When we double or halve a frequency, we end up with a note that sounds remarkably similar to the one we started out with. This relationship, a frequency ratio of 2:1 or 1:2, is called the octave. It is so important that, in spite of the large differences that exist between musical cultures -- between Indian, Balinese, European, Middle Eastern, Chinese, and so on -- every culture we know of has the octave as the basis for its music, even if it has little else in common with other musical traditions.
Again, according to Levitin, the Octave interval occurs in every musical tradition in the world. This observation is the first of many to suggest that the musicality of sound depends on something universal about human beings, rather than simply being learned from culture.
2.3 Harmony: Sweetness is the Ideal
Recall from Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series" that the brain uses differences from the ideal/cartoon model as a kind of "personality", or in this case "timbre". Recall from the same section that Levitin suggests that we use this timbre to solve the important problem of recognizing people and their emotional state. But being perfect makes this recognition hard; from "What Caricatures Can Teach Us About Facial Recognition" [Austen-caricature] (see Section 2.5.2 "False Recognition" for more):
[W]hen you talk to these artists about their process, you realize that the psychologists have gotten the basics down pretty well. When Court Jones, the 2005 Golden Nosey winner, describes how he teaches the craft to younger artists, he lays out exactly the algorithm that vision scientists believe humans use to identify faces. Students, he says, should imagine a generic face and then notice how the subject deviates from it: "That's what you can judge all other faces off of."
Also, just as a vision scientist would predict, symmetrical faces -- those close to our internal average -- are especially difficult to caricature. People at the convention mention struggles with Katy Perry and Brad Pitt; the animator Bill Plympton, a guest speaker at the convention, tells me that Michael Caine has long been a bête noire. The same principle explains why the person at the convention with maybe the least symmetrical of faces appears by week's end in no fewer than 33 works of art on the ballroom walls.
I don't think I need a citation to claim that Katy Perry and Brad Pitt are considered to be very beautiful people. This suggests another conjecture.
Conjecture Six: Absence of distortion (or personality or timbre) is sweetness.
2.3.1 Recreating an Ideal Harmonic Series using Instruments having Systematically-Distorted Timbre
In Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series" above we saw that the overtone series of a single instrument is easily distorted by myriad physical effects. However, recall that for the same (kind of) instrument, those distortions were systematic and reliable. Therefore by playing
- multiple notes,
- on instruments having the same (or similar) timbre,
- and relying on Relative Pitch to subtract the differences for us,
from distorted overtone series-es we can magically recreate parts of the ideal Harmonic Series!
Suppose we play two notes on the piano that are a Fifth (a factor of 3/2) apart. Per O'Donnell's comment in Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series" above, since piano strings are not the strings of ideal physics, they don't make an ideal Harmonic Series. Instead, each tone in the series is moved by being multiplied by some fudge factor. However notice that strings on the piano are made of the same stuff, at least nearby strings, and this fudge factor should therefore be somewhat consistent across strings. That is, two corresponding tones at the same point in the overtone series of two different notes should get multiplied by the same fudge.

    Tones of 1st note:   1    --->  (1 * 2 * fudge2)    --->  (1 * 3 * fudge3)    ...
                         |           |                         |
                         v           v                         v
    Tones of 2nd note:  3/2   --->  (3/2 * 2 * fudge2)  --->  (3/2 * 3 * fudge3)  ...
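2.3.2 Harmony Induces Two Kinds of Intervals: Horizontal Within the Note and Vertical Across the Notes
Now notice that there are two kinds of intervals of tone pairs: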
- "horizontal": intervals made by pairs of tones within the one series of tones generated by one note, and
- "vertical": intervals made by pairs of tones across the two series-es of tones generated by the two different notes, especially those of corresponding overtones.
2.3.3 Vertical Intervals Have Pure Ratios
As O'Donnell points out above in Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series", real instruments can systematically produce overtones at frequencies different from those of the ideal Harmonic Series; one such instrument is the piano which produces stretched overtones. However, these distortions from the ideal Harmonic Series affect these horizontal and vertical intervals differently:
- Horizontal intervals are fudged: the ratio of overtone 3 of the 2nd note to overtone 1 of the 2nd note has fudge in it: (3/2 * 3 * fudge3) / (3/2) = 3 * fudge3.
- Vertical intervals are pure: the ratio of overtone 3 of the 2nd note to overtone 3 of the 1st note is pure: (3/2 * 3 * fudge3) / (1 * 3 * fudge3) = 3/2.
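The cancellation can be checked with exact arithmetic. In this Python sketch the fudge factors are invented values chosen only for illustration, not measurements of any piano; the one assumption that matters is the one made in the text, namely that both notes share the same fudge factor for each overtone number. The horizontal ratio keeps its fudge, while every vertical ratio comes out as exactly 3/2.

```python
from fractions import Fraction

# Invented stretch factors, one per overtone number, assumed to be shared by
# both notes because they come from the same kind of instrument.
FUDGE = {
    1: Fraction(1),
    2: Fraction(1002, 1000),
    3: Fraction(1005, 1000),
    4: Fraction(1009, 1000),
}


def overtones(fundamental):
    """Overtone n sits at fundamental * n * FUDGE[n]."""
    return {n: fundamental * n * f for n, f in FUDGE.items()}


first = overtones(Fraction(1))
second = overtones(Fraction(3, 2))  # a Fifth above

horizontal = second[3] / second[1]  # within one note: 3 * fudge3
vertical = second[3] / first[3]     # across the notes: the fudge cancels

print(horizontal)  # 603/200, i.e. 3.015 instead of 3
print(vertical)    # 3/2
```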
However, I would be remiss if I did not point out here [acoustical-demo, Demo 31], "Tones and Tuning with Stretched Partials" from the "Auditory Demonstrations" CD, quoted in Section 5.1 "Helmholtz's Theory Relies Only On Interfering Overtones, But Harmony Is Something More". In Demo 31, a piece by Bach is played on a computer-generated piano, in part 1 with normal overtones and in part 4 with overtones where an Octave is stretched from a factor of 2 to a factor of 2.1. Taken naively, our theory that the purity of vertical intervals matters to the brain suggests that both should harmonize; however, the normal one (part 1) certainly sounds better. We suggest therefore that if the horizontal intervals are distorted grossly enough, then the fact that the vertical intervals are pure cannot save the harmony from being destroyed by the dissonance of the horizontal intervals.
2.3.4 Vertical Intervals Have Balanced Amplitudes
As Levitin points out above in Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series", real instruments can systematically produce overtones at amplitudes different from those of the ideal Harmonic Series; one such instrument is the clarinet which emphasizes the odd overtones. Again however, these distortions of the ideal Harmonic Series affect these horizontal and vertical intervals differently:
- Horizontal intervals are sometimes made by a pair of tones having unbalanced amplitudes: for example, with the clarinet the ratio of an odd overtone to an even overtone will be an interval between a loud tone and a soft tone.
- Vertical intervals are always made by a pair of tones having balanced amplitudes: again, the amplitude variations are systematic, so the tones that are paired up vertically will have the same amplitude variations.
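The same bookkeeping works for amplitudes. The amplitude profile in this Python sketch is a made-up stand-in for a clarinet-like timbre (odd overtones at 1/n, even overtones ten times weaker), not measured data; the names clarinet_like_amplitude and note are hypothetical. Because the profile depends only on the overtone number and not on the note, a horizontal pair can be badly unbalanced while every vertical pair is exactly balanced.

```python
def clarinet_like_amplitude(n):
    """Made-up profile: odd overtones strong, even overtones ten times weaker."""
    return 1.0 / n if n % 2 == 1 else 0.1 / n


def note(fundamental_hz, count=6):
    """Map overtone number n to (frequency_hz, amplitude) for one note."""
    return {n: (fundamental_hz * n, clarinet_like_amplitude(n)) for n in range(1, count + 1)}


first = note(200.0)
second = note(300.0)  # a Fifth above

# Horizontal pair within one note: overtone 3 (odd) against overtone 2 (even).
horizontal_balance = second[3][1] / second[2][1]
print(horizontal_balance)  # about 6.7: a loud tone against a soft one

# Vertical pairs: the same overtone number in each note has the same amplitude.
vertical_balances = [second[n][1] / first[n][1] for n in first]
print(vertical_balances)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```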
2.3.5 Vertical Intervals Are All The Same Ratio
Further, these two kinds of intervals are going to show up very differently to the relative pitch detector:
- Horizontal intervals are only one of each kind, a Whitman's Sampler: while there is sweetness in one voice, especially that of a trained singer, among the horizontal intervals of that voice there is only one instance of each interval of the Harmonic Series (albeit with the fudge we mentioned above for horizontal intervals).
- Vertical intervals are all of the same kind, an entire box of chocolate almond cherry: on the other hand when two voices are sung, say, a Fifth apart, there is an entire wall of the same kind of sweetness, a wall of many Fifths coming at you, namely the vertical intervals above, each of which is a Fifth.
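Counting the intervals makes the contrast explicit. This Python sketch assumes ideal Harmonic Series of eight overtones and, for the horizontal case, measures each overtone only against its own note's fundamental (comparing every possible pair would repeat some ratios, such as 4/2 = 2/1). One voice yields one instance each of 2, 3, ..., 8, while two voices a Fifth apart yield eight copies of 3/2.

```python
from collections import Counter
from fractions import Fraction


def ideal_series(fundamental, count=8):
    """Ideal Harmonic Series as exact fractions."""
    return [fundamental * n for n in range(1, count + 1)]


low = ideal_series(Fraction(1))
high = ideal_series(Fraction(3, 2))  # a second voice a Fifth above

# Horizontal: each overtone of one voice against that voice's fundamental.
horizontal = Counter(t / high[0] for t in high[1:])
# Vertical: each overtone of the high voice against the same overtone of the low voice.
vertical = Counter(h / l for h, l in zip(high, low))

print(max(horizontal.values()))  # 1: one instance of each interval
print(vertical)                  # Counter({Fraction(3, 2): 8})
```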
(Again, for an introduction to musical intervals such as the Fifth, see Section 3.1 "The Major Triad".)
2.3.6 Harmony is Sweeter Than Sweet
Therefore we see that note ratios induce a set of the same tone ratios. Further these tone ratios are pure, have balanced amplitudes, and are all of the same interval.
This harmonic effect works best if the two notes of an interval are played on the same instrument having therefore the same distortions from the ideal Harmonic Series. My Men's Chorale teacher Bill Ganz told us that to have our voices harmonize, we should sing the same vowels, which supports this theory as the same vowels will have closer timbres [Ganz, c. fall 1991] (Bill says this is a known effect, not something he independently observed; a cursory search does not produce a better reference, so I cite him). Notice that this effect allows instruments making tones that are not anywhere near the Harmonic Series to still harmonize with each other (at least up to a point where the horizontal intervals interfere too much; see the point about [acoustical-demo, Demo 31] in Section 2.3.3 "Vertical Intervals Have Pure Ratios").
The wall of vertical intervals hammers the same relative-pitch sensor with many copies of one pure interval, one of the features of the cartoon-physics ideal Harmonic Series that your brain is looking for. Recall from the introduction to Section 2 "Living in a Computational Cartoon" the effect of pantyhose making a leg look rounder than round; again, more on this effect in Section 2.5 "Recognition: Feature Vectors". Harmony is sweeter than sweet. It's impossibly sweet (impossible for one voice, anyway), which is just what the theory predicts.
2.4 Interestingness: Just Enough Complexity
Anticipation and prediction are among the fundamental operations of the brain. We suggest that there is an art to balancing simplicity and complexity: if understanding and predicting a storyline are too easy, then it is boring, and if too hard, then it is noise, but if just right, then it is interesting. As we discuss below, (1) simplicity comes from data having a "theme" and (2) ambiguity is the absence of a single explanation or theme and therefore a good way to rapidly produce complexity. See Section 7.2 "The Role of Narrative Generally" for how theme and ambiguity are unified to make narrative.
2.4.1 The Simplicity of Theme
People frequently experience that, before receiving information, having an expectation as to the context of that information, its theme, helps considerably in the processing of it. For example, people who speak more than one language sometimes have the experience of hearing words (1) in a language that they know, but (2) that they were not expecting, and therefore not understanding those words until they "listen" to them again in their mind from within the context of the language in which those words were spoken. There are myriad examples of context influencing how something occurs to someone.
Surprise Reduction: The technical name for the expected amount of information one gets from a situation is the entropy [ent] [Wilkerson-entropy]. Some call the entropy of a measurement the amount of surprise one expects to get out of it. Clearly, if one knows more about what to expect in a situation, the amount of surprise can be greatly reduced. Since it is work to process information, we suggest that the brain likes to have reliable expectations in order to minimize the amount of surprise it is dealing with all day.
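To make "expected surprise" concrete, here is a minimal Python sketch of Shannon entropy measured in bits. The two probability distributions are hypothetical. They are chosen only to show that when a theme concentrates one's expectations, the expected surprise is lower than when every outcome is equally likely. They are not measurements of anything a brain does.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits: the expected surprise of one observation."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical distributions over eight possible next events.
no_theme = [1 / 8] * 8            # no expectation: every outcome equally likely
with_theme = [0.79] + [0.03] * 7  # a theme makes one outcome very likely (sums to 1)

print(entropy_bits(no_theme))     # 3.0 bits
print(entropy_bits(with_theme))   # about 1.33 bits
```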
Model Inference: Life is full of situations where we may observe the consequences of a situation but are not told explicitly what is the state of the situation. There is nothing left to do but to infer a model of the state of affairs from observation of many details, and therefore inference is likely a constant activity of the brain. For example, people often infer the rules of a game from observation and without reading the rules.
Have you ever seen someone color-coordinate their clothes or even their room? Have you ever been to a "theme party" where everyone was to dress and act from a given era or situation? How about a "theme restaurant" or "theme park"? Having a theme for all of the elements of a given situation
- (surprise reduction) reduces the amount of new information or "surprise" that each one introduces, and
- (ease of inference) allows the brain to construct a whole from the parts.
Conjecture Seven: The brain wants input to have a theme. That is, the brain both infers themes from input and uses themes as context when processing input.
2.4.2 The Complexity of Ambiguity
Research on parsing of sentences suggests that one of the major functions of the brain is to disambiguate ambiguous and incomplete input. In "From Molecule to Metaphor", Jerome Feldman, both a Computer and Cognitive Scientist, points out how much of the brain's processing of sentences is devoted to disambiguation and how easy it is to tease the brain by using ambiguous inputs that resolve in an unusual way. From [Feldman2006, p. 307, 308] (unconventional spacing in the original):
Please read the following sentence aloud slowly, word by word:
The horse raced past the barn fell.
Sentences like these are called garden-path sentences because, in slow reading, we often notice that we have followed an analysis path that turned out to be wrong....
But why are people surprised in garden-path situations? The brain is a massively parallel information processor and is able to retain multiple active possibilities for interpreting sentence, scene, and so on. Well, there must be a cutoff after which some possible interpretations are deemed so unlikely as to be not worth keeping active. The final piece of their [referring to a model given by other researchers] model was an assumption that a hypothesis was abandoned if its belief net score was less than 20% of that of its rival. We experience surprise when the analysis needed for a full sentence is one that was deactivated earlier as unlikely. This is a complex computational model, but nothing simpler can capture all the necessary interactions.
The input the brain gets as we live life is inherently and often wildly ambiguous. Alternatives multiply, and so the number of possible ambiguities in a situation can easily grow exponentially. No machine can keep up with the demands of a problem whose size grows that fast. Therefore:
Much of the brain is a massive disambiguation engine that is running all the time and is functioning at its computational limit.
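Here is a minimal sketch of both points. It makes the hypothetical assumption that each ambiguous word has two independent readings. Under that assumption, the number of combined readings doubles with every ambiguous word. A pruning rule like the 20% cutoff in the model Feldman quotes keeps only a few hypotheses alive, but at the price of sometimes discarding the one that turns out to be needed. The hypothesis scores below are invented for illustration.

```python
def count_readings(n_ambiguous, senses=2):
    """Independent ambiguous words multiply: senses ** n combined readings."""
    return senses ** n_ambiguous

def prune(hypotheses, cutoff=0.2):
    """Keep only hypotheses scoring at least `cutoff` times the best score (the 20% rule)."""
    best = max(hypotheses.values())
    return {h: s for h, s in hypotheses.items() if s >= cutoff * best}

def garden_path_surprise():
    # Hypothetical scores after reading "The horse raced past the barn ..."
    active = prune({"raced = main verb": 0.9, "raced = reduced relative": 0.1})
    # Only the reduced-relative reading ("the horse [that was] raced ...") survives "fell".
    needed = "raced = reduced relative"
    return needed not in active  # True: the needed analysis was already deactivated

for n in (5, 10, 20, 40):
    print(n, count_readings(n))  # 32, 1024, 1048576, 1099511627776
print(garden_path_surprise())    # True
```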
Jokes are often of the form of an ambiguity of contexts/themes resolved by a punchline which evaluates one way in one context and another way in the other context (say true in one and false in the other); the story that precedes the punchline serves to amplify the weaker context, the weaker side of the ambiguity, so as to maximize the punch of the line by making it break symmetry between two almost equal contexts/themes. Story plots are often of this form as well, in particular mysteries. The language of Shakespeare is full of double meanings and even perhaps a triple meaning here and there. These are all to the same purpose:
Conjecture Eight: The brain enjoys having its disambiguation engine teased.
2.5 Recognition: Feature Vectors
I need to introduce yet another computational idiom: the feature vector [feat]. It is actually a completely straightforward idea that you already use every day. Think of how you summarize a thing when you post an online ad to sell it. Suppose you are selling a car. You might very well put in the ad the total volume of the cylinders in the engine. You probably won't list the number of bolts in the engine. You probably will list how many miles the car has been driven. You probably will not list the number of hours the radio has been on (even if you knew it). The point is that
Humans naturally abstract; that is, they retain the features that are important for a given purpose and discard the rest.
All language is abstraction. Suppose I point at a chair and I say "what is that?" You say "that is a chair." I say "are you telling the complete truth?" You say "yes!" I lean down and look very closely and I say "yea, but you didn't mention this little scratch down here...." You roll your eyes in annoyance.
An abstraction is a reduced amount of information that still serves the purpose. In the context of recognizing a thing as a member of a class, an abstract adjective is called a "feature". Usually there is more than one, so we collect them together into a "vector", which just means a list where the elements are not interchangeable (that is, you can't swap the mileage and the year of a car without severely changing the meaning of the car ad).
Once we have described a class of inputs as a vector of features, we have a clear algorithm for recognizing a thing as being a member of that class:
- Whenever we encounter a thing, for each feature (in parallel), check if that feature is present.
- If all (or most) of the features in the vector are present ("fire"), then recognize the thing as being in the class abstracted by the feature vector ("fire" the whole recognizer).
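The two-step recognizer above can be sketched in Python as follows. The feature names are hypothetical, and the features are checked one after another in a loop rather than in parallel. The `threshold` parameter is my own illustrative way of expressing "all (or most)": a value of 1.0 is a strict AND, and lower values tolerate missing features.

```python
def recognize(observed, feature_vector, threshold=1.0):
    """Fire if at least `threshold` (a fraction) of the class's features are observed.

    threshold=1.0 is a strict AND of all features; a lower value means "most".
    The features are checked one at a time here, not in parallel.
    """
    fired = sum(1 for feature in feature_vector if feature in observed)
    return fired >= threshold * len(feature_vector)

# Hypothetical feature vector for the class "car".
CAR = ("wheels", "engine", "doors", "windshield", "steering wheel")

seen = {"wheels", "engine", "doors", "windshield"}  # steering wheel hidden from view
print(recognize(seen, CAR))                 # False: a strict AND is brittle
print(recognize(seen, CAR, threshold=0.8))  # True: 4 of 5 features fire
```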
Note that the second part above which looks for the conjunction of features may be realized by a more sophisticated mechanism than a simple AND gate that just fires its output when all of its inputs have fired: a simple conjunction mechanism would be too "brittle" in the face of the noisy input of the real world. For example, even plants such as the Venus Fly Trap can compute a rather sophisticated conjunction of features before recognizing a fly [venus-fly]:
The trapping mechanism is so specialized that it can distinguish between living prey and non-prey stimuli such as falling raindrops; two trigger hairs must be touched in succession within 20 seconds of each other or one hair touched twice in rapid succession, whereupon the lobes of the trap will snap shut in about 0.1 seconds.
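The Venus Fly Trap rule quoted above is a conjunction over time rather than a plain AND. Here is a minimal sketch that assumes touches arrive as time-ordered (seconds, hair) pairs. The 20-second window comes from the quotation. The quotation gives no number for "rapid succession" on a single hair, so the value used for it here is an assumption.

```python
RAPID_S = 2.0  # assumption: the quotation says "rapid succession" but gives no number

def should_snap(touches, window_s=20.0, same_hair_window_s=RAPID_S):
    """touches: time-ordered list of (time_in_seconds, hair_id) pairs.

    Snap when two successive touches hit different hairs within window_s seconds,
    or hit the same hair within same_hair_window_s seconds.
    """
    for (t1, hair1), (t2, hair2) in zip(touches, touches[1:]):
        gap = t2 - t1
        if hair1 != hair2 and gap <= window_s:
            return True
        if hair1 == hair2 and gap <= same_hair_window_s:
            return True
    return False

print(should_snap([(0.0, "a")]))               # False: a single raindrop
print(should_snap([(0.0, "a"), (5.0, "b")]))   # True: two hairs within 20 s
print(should_snap([(0.0, "a"), (30.0, "b")]))  # False: too far apart
print(should_snap([(0.0, "a"), (0.5, "a")]))   # True: same hair, rapid
```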
Recall that in the case of virtual pitch, the feature recognition mechanism seems to find the greatest common divisor of the tones presented. That is, this recognizer uses a special holistic property of this particular set of features in order to work well in the face of missing features. Recall also that a timbre amounts to the systematic absence of parts of the ideal Harmonic Series, and that real sounds (in particular, voices) exhibit a range of timbres. Thus the Harmonic Series recognizer must be able to robustly find the fundamental even when some of the tones are missing. See Section 2.1.1 "Virtual Pitch: Hearing the Harmonic Series Even When it is Not There", Section 2.1.2 "Using Greatest Common Divisor as the Missing Fundamental", and Section 1.5.1 "Timbre: Systematic Distortions from the Ideal Harmonic Series".
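For idealized partials that are exact integer multiples of some fundamental (in whole Hz), the greatest common divisor can be computed directly. This sketch ignores the tolerance a real recognizer would need for slightly mistuned partials. The frequencies are illustrative.

```python
from functools import reduce
from math import gcd

def virtual_fundamental(partials_hz):
    """Greatest common divisor of integer-valued partials (in Hz)."""
    return reduce(gcd, partials_hz)

# Harmonics 3, 4, 5 of 200 Hz, with the 200 Hz fundamental itself missing.
print(virtual_fundamental([600, 800, 1000]))   # 200
# Harmonics 2, 3, 5 of 440 Hz (harmonics 1 and 4 missing).
print(virtual_fundamental([880, 1320, 2200]))  # 440
```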
2.5.1 Soft Computing
Machines are good at crisp, mechanical behavior, such as adding huge lists of numbers. This is fun for a while, but it can get old.
I don't often need huge lists of numbers added, but I really would like to go to an online auction site and find a car that is "sort of" like my ideal car which I might be willing to describe.
You will notice the use of the non-crisp or "soft" phrase "sort of" in the previous problem specification. Some people try to get machines to do this sort of soft reasoning that humans do so well. It can sometimes be done, at least within a very constrained context of, say, shopping for cars or plane tickets. Such a discipline is called Artificial Intelligence or Soft Computing or Machine Learning or Statistical Inference, depending on exactly how one goes about it and who is providing the research funding. The important thing for us is that describing problems using feature vectors is a very general and widely used technique. Recalling our conjecture that computational laws are universal, we would not find it surprising if
Conjecture Nine: The brain uses feature vectors for recognition.
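One simple way to make "sort of like my ideal car" computable is nearest-neighbour search over feature vectors. Each listing is scored by a weighted distance from the ideal, and the closest one wins. This is only one of many soft-computing techniques. The listings, the choice of features, and the weights below are all hypothetical.

```python
import math

def distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors with the same feature order."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Hypothetical feature order: (model year, mileage in thousands of miles, engine size in litres)
ideal = (2008, 40, 2.0)
listings = {
    "listing A": (2007, 45, 2.0),
    "listing B": (2010, 120, 3.5),
    "listing C": (1999, 38, 2.0),
}
weights = (1.0, 0.1, 4.0)  # hypothetical: how much this buyer cares about each feature

ranked = sorted(listings, key=lambda name: distance(listings[name], ideal, weights))
print(ranked)  # ['listing A', 'listing C', 'listing B']
```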
2.5.2 False Recognition
To get the brain to (1) have the experience of the presence of a thing, it is not necessary to (2) present the actual thing to the brain. It is enough to just present anything that fires the feature vector in the brain assigned to recognize that thing. That is, if I want you to think "hamburger" I don't have to show you a hamburger, only a picture of one. Recall the example from Section 2 "Living in a Computational Cartoon" of pantyhose making a leg look rounder than round, impossibly round.
It is pretty easy to tell the difference between a photograph of something and the thing itself: you wouldn't accidentally eat a photograph of a hamburger. Yet at the same time the picture definitely says "hamburger" to your brain, often strongly enough that you are willing to part with some money to have a real one right now! But it gets even weirder.
Have you ever seen a cartoon of a person that looks more like the person than the person does?
Some political cartoonists are very good. They
- pick some very unusual features of the person, and then
- exaggerate those features.
Amazingly, what can result is something that looks more like the person than the person. From [Harmon-art-brain]:
As someone who has worked in pen and ink for decades, cartoonist Jules Feiffer realizes that "what we see is often quite divorced from what is actually there," he noted. He calls the two-dimensional representations metaphors, noting that "the metaphor is often more understandable than the real thing."
And research on the perception of faces reveals that the human brain and individual neurons are tuned to extreme representations, explained Margaret Livingstone, a professor of neurobiology at Harvard Medical School. Her research has shown that people are much quicker to recognize caricatures of people than documentary photographs, showing how the brain at work prizes the representative over the more factual.
From "What Caricatures Can Teach Us About Facial Recognition" [Austen-caricature] :
At the University of Central Lancashire in England, Charlie Frowd, a senior lecturer in psychology, has used insights from caricature to develop a better police-composite generator. His system, called EvoFIT, produces animated caricatures, with each successive frame showing facial features that are more exaggerated than the last. Frowd's research supports the idea that we all store memories as caricatures, but with our own personal degree of amplification. So as an animated composite depicts faces at varying stages of caricature, viewers respond to the stage that is most recognizable to them. In tests, Frowd's technique has increased identification rates from as low as 3 percent to upwards of 30 percent.
. . . "A lot of people think that caricature is about picking out someone's worst feature and exaggerating it as far as you can," Seiler says. "That's wrong. Caricature is basically finding the truth. And then you push the truth."
The features can be anything that is important to the task of recognizing that person: a nose or lip shape, etc. -- technically, such a feature carries a lot of "information". A good example of this that I remember was a yellow smiley face with a red blotch on its forehead: everyone knew it was Mikhail Gorbachev [gorb].
While a thing may induce a feature vector in the brain for later use in recognizing that thing, some other things will also fire that vector, causing artificial recognition.
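The caricaturist's recipe of picking distinctive features and exaggerating them has a simple numerical analogue: push each feature of a face away from the average face by a factor k > 1. This sketch assumes that faces have already been reduced to feature vectors. The feature names and measurements are hypothetical, and the code is a generic illustration of exaggeration, not a description of how EvoFIT works. Features that are already typical stay put, while distinctive ones grow.

```python
def caricature(face, average_face, k):
    """Move every feature away from the average by factor k (k > 1 exaggerates, k = 1 is the face itself)."""
    return tuple(m + k * (f - m) for f, m in zip(face, average_face))

# Hypothetical features: (nose length, lip width, eye spacing), in millimetres.
average_face = (50.0, 50.0, 62.0)
face = (58.0, 47.0, 62.0)  # long nose, slightly narrow lips, typical eye spacing

print(caricature(face, average_face, 1.5))  # (62.0, 45.5, 62.0)
```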
2.5.3 Cubism: Partial Recognition Due to Redundant, Over-Determined Feature Vectors
There is no rule that says that the features in a feature vector must be independent, that is, that for every subset of features there is some input that will fire exactly those features and not the others. If the brain is doing all it can to recognize things as fast and cheaply as possible, it is going to use the most effective features it has, and some redundant / over-determined sets of features can easily arise.
Hmm, what would you experience if some but not all of the features in a vector were to fire? Note that, while there may be no natural input that can cause this, that does not mean that there is no such art-ificial input. This leads to interesting phenomena that can be exploited by artists.
Cubism is a form of art from the early 20th century that has a certain particular quality:
- the parts of an object may be rendered reasonably faithfully so that one recognizes them,
- however they do not arrange into a whole in a coherent way.
This produces an interesting effect:
- we recognize the object, as the features we require for recognition do fire,
- although we still have an overall feeling that we are not seeing the thing in its natural form, but instead in a disturbed or unhappy or dreamy state.
You may say "Of course it looks disturbed! It's all messed up!" But think for a moment: if it is all messed up, how is it that it looks like anything at all? Again, per Section 2.5.2 "False Recognition", because the features are present.
Consider Picasso's "Head of a Woman" [Picasso1938] on the right. One eye is in profile and the other is straight ahead, a physical impossibility. Yet we have no trouble at all instantly recognizing a woman.
3 Harmonic Music Explained
What can we make of all of this? Do the above insights into physics and computation yet provide enough information for us to derive something that we recognize as music? For example, can we compute a set of notes that will sound good when played together?
Recall the observation of Section 2.3.2 "Harmony Induces Two Kinds of Intervals: Horizontal Within the Note and Vertical Across the Notes": two notes induce parallel vertical series-es of overtones, all in the same ratio. This means that note ratio and tone ratio are intimately connected. That is, from now on, when speaking of two notes that are in a ratio, what we really mean is that the overtone series-es of the two notes make a series of vertical intervals, each having that ratio. From now on we will not repeat this point and will simply speak of "the ratio of two notes making an interval within the Harmonic Series".
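Here is a quick check of this claim as a Python sketch, using two hypothetical integer frequencies a Perfect Fifth apart. Pairing the k-th harmonic of each note always gives the same ratio, because k * f_high / (k * f_low) = f_high / f_low.

```python
from fractions import Fraction

def vertical_ratios(f_low, f_high, n_overtones=8):
    """Ratio of the k-th harmonic of the upper note to the k-th harmonic of the lower note."""
    return [Fraction(k * f_high, k * f_low) for k in range(1, n_overtones + 1)]

# Two hypothetical notes a Perfect Fifth apart (integer Hz for exact arithmetic).
print(set(vertical_ratios(200, 300)))  # {Fraction(3, 2)}
```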
3.1 The Major Triad
Let's try the simplest thing we can that will generate notes that the brain wants to hear together (recall from Section 2.1 "Searching for Harmonics" how much the brain wants to hear the Harmonic Series):
- find the ideal Harmonic Series induced by, say, Middle C,
- map it into one Octave by dividing by two whenever necessary,
- replace tones with notes as, again, these notes will induce the same (vertical) intervals as the tones.
Note that in Scientific Pitch Notation [sci-pitch] the particular Octave that contains Middle C is "Octave 4", the next Octave up is "Octave 5", etc., where we increment the octave number each time we cross the note C. Starting at C4 the sequence of notes we get is as follows.
- Factor of 1: The fundamental: C4.
- Factor of 2: This is just C5 (up one Octave); dividing by 2 gives us 1 times C4 = C4 again, so no new note in the collection.
- Factor of 3: This is G5; dividing once by 2 gives us 3/2 times C4 = G4. This is the first really "interesting" different note.
- Factor of 4: This is C6; dividing twice by 2 gives us 1 times C4 = C4 again, which we already have in our collection.
- Factor of 5: This is close to E6; dividing twice by 2 gives us 5/4 times C4 = E4. Ah, another new and "interesting" note.
- Factor of 6: This is G6; dividing twice by 2 gives us 6/4 times C4 = 3/2 times C4 = G4, which we already have in our collection.
Let's stop here. (We stop at harmonic 6 in particular for a reason that will become clear later.) The starting tone/note of Middle C is arbitrary, but the ratios we get, namely 1, 5/4, and 3/2 times the fundamental, are not. Three notes in these ratios are called "The Major Triad". There are standard names for these notes (relative to the fundamental). Reordering them from the harmonic order above to their numeric order when folded down into one Octave: the first note (1) is called the "Root", the second (harmonic 5, so in this Octave 5/4) is called the "Major Third", and the third (harmonic 3, so in this Octave 3/2) is called the "Perfect Fifth" (!). The weirdness of musical nomenclature is just beginning. Note further that it is unclear which terms should be capitalized; we treat as proper nouns any illusory Platonic ideal objects created by the mind: "Harmonic Series", "Major Triad", etc. Again, these names of the intervals reflect their position in the Major Scale, described below, and as you can see, confusingly do not correspond to their order in the Harmonic Series.
Major Triad
- Root (harmonic 1): 1 = 1.0
- Major Third (harmonic 5): 5/4 = 1.25
- Perfect Fifth (harmonic 3): 3/2 = 1.5
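Section 3.1 uses three steps: take the harmonics, fold them into one Octave, and keep the distinct notes. These fit in a few lines of Python using exact fractions. The names `fold_into_octave` and `triad_from_harmonics` are illustrative ones of my own. Stopping at harmonic 6 follows the text.

```python
from fractions import Fraction

def fold_into_octave(ratio):
    """Divide or multiply by 2 until 1 <= ratio < 2, keeping the same note name in one Octave."""
    ratio = Fraction(ratio)
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

def triad_from_harmonics(max_harmonic=6):
    """Fold harmonics 1..max_harmonic of a fundamental into one Octave; return the distinct ratios."""
    return sorted({fold_into_octave(h) for h in range(1, max_harmonic + 1)})

print(triad_from_harmonics())  # [Fraction(1, 1), Fraction(5, 4), Fraction(3, 2)]
```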
Note that according to our measure of interestingness in Section 2.4 "Interestingness: Just Enough Complexity", the intervals of the Fifth (factor of 3) and the Third (factor of 5) are, in that order, the most interesting intervals: (1) they are in the theme of the Harmonic Series, while also (2) they have some complexity resulting from not being a simple power of two times the Root. (If they were, the octave effect would tend to make the two notes sound like one; that is, a factor of 2 is too boring.)
If we pick C as the Root (as we did above), then the resulting Major Triad is called the "chord" of C Major. The starting note of "C" was arbitrary; however, the resulting triad was not. Is it so surprising that this Major Triad is everywhere in music? It sounds rather nice to play notes in the C-Major Triad; try it. However, after a while it is a little boring, so we would like to add some variety. How little complexity can we add and yet still change something?
3.2 The Major Scale
The Major Scale [maj] is so fundamental to Western music that it is even "built into" the notation. (The Major Scale is sometimes called the Diatonic Scale, although the term "Diatonic" seems to mean different things depending on who you ask; therefore I use the less ambiguous term "Major" instead.) If you play notes by going up the white keys of a piano keyboard one step at a time, which is the same as going up the alternating lines and spaces of an unadorned musical score, you are playing the C Major Scale. Is this Major Scale arbitrary, or is it somehow fundamental to the way the brain hears? If it is fundamental, we should be able to derive it simply from first principles, as we suggested in the introduction. Let's try it.
3.2.1 Interlocking Triads
Well, we like the Major Triad, so let's make another one, but starting with a different note as the fundamental. To preserve as much theme as possible from the previous triad, let's start with the "closest" note to the C that we have in our first triad. The first note other than C that we hit was 3/2 times the Root, also called the Perfect Fifth; therefore let's build a triad using 3/2 times C4 = G4 as the fundamental. Let's remember to divide by 2 when necessary to keep everything within the same Octave.
Major Triad Up by a Perfect Fifth
- Root: 3/2 * 1 = 3/2 = 1.5
- Major Third: 3/2 * 5/4 = 15/8 = 1.875
- Perfect Fifth: 3/2 * 3/2 = 9/4, which is bigger than 2, so divide by 2, giving 9/8 = 1.125
Ok, that was so much fun, let's go in the other direction as well. That is, let's make yet another Major Triad where the Perfect Fifth of that Triad is the Root of our first Triad. That means multiplying by 1/(3/2) = 2/3; therefore let's build a triad using 2/3 times C4 = F3 as the fundamental. Let's be sure to multiply by 2 when necessary to keep everything within the same Octave. (Note that throughout we use "~" (tilde) to mean "almost equals".)
Major Triad Down by a Perfect Fifth
- Root: 2/3 * 1 = 2/3, which is smaller than 1, so multiply by 2: 4/3 ~ 1.333
- Major Third: 2/3 * 5/4 = 5/6, which is smaller than 1, so multiply by 2: 5/3 ~ 1.667
- Perfect Fifth: 2/3 * 3/2 = 1 = 1.0
Note that the selection of three interlocking triads is suggested by our measure of interestingness from Section 2.4 "Interestingness: Just Enough Complexity". That is, using three overlapping Major Triads (1) maximizes the theme of the Harmonic Series while not requiring any harmonics beyond harmonic 5 (the interval called the Third), while also (2) having some complexity by not all being of one Harmonic Series.
Now we have three "interlocking" Triads: the Perfect Fifth of one is the Root of the next. How many notes is that? Three notes per triad times three triads is nine notes; however two of the notes where the triads interlock are counted twice, so there are 3 * 3 - 2 = 7 unique notes. Let's plot them on a line to see how far they are from one another.
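Here is the interlocking-triad construction as a Python sketch with exact fractions. It builds a Major Triad on each of the roots 2/3, 1, and 3/2 (relative to C), folds everything into one Octave, and takes the union. The nine notes collapse to seven distinct ratios. The function names are illustrative.

```python
from fractions import Fraction

MAJOR_TRIAD = (Fraction(1), Fraction(5, 4), Fraction(3, 2))  # Root, Major Third, Perfect Fifth

def fold_into_octave(ratio):
    """Divide or multiply by 2 until 1 <= ratio < 2."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

def triad_on(root):
    """The Major Triad built on `root` (a ratio relative to C), folded into one Octave."""
    return [fold_into_octave(root * r) for r in MAJOR_TRIAD]

roots = (Fraction(2, 3), Fraction(1), Fraction(3, 2))  # down a Fifth (F), C, up a Fifth (G)
all_notes = [note for root in roots for note in triad_on(root)]
scale = sorted(set(all_notes))

print(len(all_notes), len(scale))  # 9 7
print([str(r) for r in scale])     # ['1', '9/8', '5/4', '4/3', '3/2', '5/3', '15/8']
```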
3.2.2 Using Logarithms to Visualize Distances Between Tones/Notes
Wait... before we do that: when plotting notes, such a plot should "mean something to us". As we saw above, what makes sense would be for the ratios of the notes to have some regularity; that is, the multiplicative ratios of frequencies are what our brain is listening to, not the additive distances. For this plot to mean something, we would want equal ratios to show up as equal distances on the plot. How do we turn (multiplicative) ratios into (additive) distances?
The function that does this is called the "logarithm" (or just "log") [log
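Taking the base-2 logarithm of each ratio turns multiplication into addition: the Root sits at 0, the Octave sits at 1, and equal ratios become equal distances. The sketch below multiplies by 12 so that distances read in equal-tempered semitones, a unit I introduce here only for readability. It then applies this to the seven ratios found above. Labelling each gap as "large" or "small" is my own illustrative step.

```python
import math
from fractions import Fraction

# The seven ratios produced by the three interlocking Major Triads, relative to C.
SCALE = [Fraction(1), Fraction(9, 8), Fraction(5, 4), Fraction(4, 3),
         Fraction(3, 2), Fraction(5, 3), Fraction(15, 8)]

def semitones(ratio):
    """12 * log2(ratio): position within the Octave in equal-tempered semitones (Octave = 12)."""
    return 12 * math.log2(float(ratio))

positions = [semitones(r) for r in SCALE] + [12.0]  # include the Octave above, ratio 2
gaps = [b - a for a, b in zip(positions, positions[1:])]

# log turns products into sums: a Fifth (3/2) plus a Fourth (4/3) makes an Octave (2).
print(round(semitones(Fraction(3, 2)) + semitones(Fraction(4, 3)), 9))  # 12.0
print([round(p, 2) for p in positions])
print(["large" if g > 1.5 else "small" for g in gaps])
# ['large', 'large', 'small', 'large', 'large', 'large', 'small']
```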
Mathematics is involved in some way in every field of study known to mankind. In fact, it could be argued that mathematics is involved in some way in everything that exists everywhere, or even everything that is imagined to exist in any conceivable reality. Any possible or imagined situation that has any relationship whatsoever to space, time, or thought would also involve mathematics.
Music is a field of study that has an obvious relationship to mathematics. Music is, to many people, a nonverbal form of communication that reaches past the human intellect directly into the soul. However, music is not really created by mankind, but only discovered, manipulated and reorganized by mankind. In reality, music is first and foremost a phenomenon of nature, a result of the principles of physics and mathematics.
It is a difficult task to properly define the word "music", since many individuals have quite different opinions. My personal definition is that music is sound that is organized in a meaningful way with rhythm, melody, and harmony. This is what I consider the three dimensions of music. This definition would exclude such things as "rap music", which has rhythm but has virtually no melody or harmony. I perceive "rap" to be poetry that is spoken rhythmically, with a minimal musical element at best. There are other things that pass as music, such as the works of John Cage, that fail to meet my definition of music. However, many people consider things to be music that I do not. The only definition of music that could be universally agreed upon, then, is that music is any sound, or any combination of sounds, of any kind, that someone, somewhere, enjoys listening to.
To understand what music is by this definition, we must understand what sound is. Olson defines sound as an "alteration in pressure, particle displacement, or particle velocity which is propagated in an elastic medium, or the superposition of such propagated alterations creating the auditory sensation that is interpreted by the ear" (3). In plain English, sound is a form of energy that is perceived by our ears. Sound is produced when a medium, usually air, is set into motion by any means whatsoever (Olson 3). We spend our lives surrounded by the earth's atmosphere, which exerts a pressure on everything in it. At sea level this air pressure, or barometric pressure, is about 15 pounds force per square inch. The actual value of the atmospheric pressure at any given place changes a little from time to time, but its value at any given time is called the ambient pressure. Small but rapid changes in the ambient pressure produce sensations in the ear which we call sound (Backus 18-19). Our ears transform these pressure variations into a form our brains can understand, known as the sense of hearing.
Energy can exist in different forms such as electrical, magnetic, or mechanical energy, as in the case of sound (Rossing 31); however, regardless of the form of energy involved, the flow of energy will either be a steady flow, such as direct current (DC) electricity, or it will pulsate or vibrate in waves at different speeds. This pulsation of the energy flow is called oscillation (Moravcsik 23).
The number of times the wave of energy completes a cycle of oscillation in one second is called its frequency. Frequency is measured in cycles per second or Hertz (Hz). Sound, as perceived by human ears, is the range of frequencies of about 16 to 16,000 Hz. The higher the frequency is, in this audible range, the "higher" the pitch is, as perceived by our ears. Far above the range of human hearing, oscillations of a different form of energy, electromagnetic energy, appear as radio waves, then as light waves perceived by our eyes, and then as X-rays, gamma rays, etc. Figure 1 is a table of frequencies and some of their specific applications (Grob 717).
The philosophical distinction between music and noise may be a bit fuzzy, but scientifically, there is a well-defined difference. If there is a mixture of a very large number of audible frequencies, such that the ear cannot perceive any specific frequencies or tones, the result is noise. However, if the sound is created by a constant oscillation at a given frequency, our ears perceive the sound as a specific pitch, or musical note. A good example of this is the human voice. If you were to recite the words to "Twinkle Twinkle Little Star" in your normal speaking voice, the audible sounds produced by your vocal cords would be within a range of audible frequencies, but not at any one specific frequency. If, however, you articulated the same exact words, but with exact frequencies for each syllable, you would now be singing! The human voice has now become a musical instrument.
If you could actually see your vocal cords vibrating, it would be very similar to the vibration of a guitar string. The low E string on a guitar, when plucked, would oscillate 82.407 complete cycles in one second, so the sound produced would have a frequency of 82.407 Hz (Olson 48). If our eyes were able to see the actual movement of the string, we would see that the string would move to one side, then back to where it started, then to the other side, then back to the starting point again. It would complete this cycle 82.407 times each second, which of course is too fast for our eyes to see. What we might see, however, is an optical illusion of three blurry images: a stronger image at the midpoint, and two fainter images to the left and right. If we were to pluck the low E string on a bass guitar, the frequency would be an octave lower (41.203 Hz), and the visual effect would probably be easier to detect because the bass guitar string is vibrating at half the frequency of the guitar string.
In addition to the frequency, or pitch, of the vibrating string, there is another factor to consider. The harder you pluck the string, the farther the string vibrates, creating a greater amount of energy, which is perceived by our ears as being louder; this is known as the amplitude of the sound wave.
Amplitude, or loudness of sound, is measured in decibels (dB). Human ears begin to perceive sound at a decibel level of about 5 dB; this is called the threshold of hearing. At about 130 dB the sound amplitude level is actually high enough to overload our human limitations and, in effect, hurt our ears; this is known as the threshold of pain (Pierce 109). A detailed scale of decibel levels is given in figure 2. I have personally been given a citation by the "sound police" in Austin, Texas for exceeding legal decibel limits in a bar I was performing in on 6th Street. The local police now carry hand-held decibel meters and write tickets to offenders.
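The decibel scale is logarithmic. For sound pressure level in air, the usual convention is L = 20 log10(p / p0), with p0 = 20 micropascals. Here is a minimal sketch; the function names are illustrative. It shows that the roughly 130 dB threshold of pain corresponds to a pressure swing of only about 63 Pa. That is a tiny fraction of the roughly 15 psi (about 100,000 Pa) ambient pressure mentioned earlier.

```python
import math

P_REF_PA = 20e-6  # 20 micropascals: the usual 0 dB reference for sound pressure level in air

def spl_db(pressure_pa):
    """Sound pressure level in dB for an RMS pressure amplitude in pascals."""
    return 20 * math.log10(pressure_pa / P_REF_PA)

def pressure_for_db(level_db):
    """Inverse of spl_db: the RMS pressure amplitude, in pascals, for a level in dB."""
    return P_REF_PA * 10 ** (level_db / 20)

print(spl_db(20e-6))         # 0.0: the reference level
print(pressure_for_db(130))  # about 63 Pa, near the threshold of pain
```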
V. Sine Waves
If we were to graph the wave of a single perfect musical note of a specific frequency on XY axes, with X being time and Y being amplitude, the result would look something like figure 3: a wave which rises and falls sinusoidally with time, and is called, simply, a sine wave. The sine wave is the most perfect type of sound wave, and usually exists only in the laboratory, or in the sound wave produced by a tuning fork (Pierce 40). In fact, when a tuning fork is vibrating, the motion of the prongs is sinusoidal. The simple experiment shown in figure 4 demonstrates this. One prong of the fork is provided with a light pointed stylus as shown. A glass plate is coated with a layer of soot or other material that will yield a fine line when the tip of the stylus is drawn across it. The fork is then set into vibration and the vibrating stylus is drawn across the plate by moving the fork in the direction of the arrow. The stylus then inscribes a line in the coating which is found to have the shape of the sine wave (Backus 30).
Most acoustically produced sound waves are not perfect sine waves because of harmonics and other factors. Also, the actual shape of the wave can be changed electronically, as in the case of synthesizers and other electronic musical instruments. Some examples of these waves are shown in figure 5 (Rossing 115).
VI. Pitch Standard
Before I attempt to discuss harmonics, timbre, and the musical scale, I need to explain the pitch standard. Today, the standard of tuning is such that A4 or the A above middle C is equal to 440 Hz, with all the other notes of the chromatic scale at frequencies relative to their position in the scale. This has not always been the case. There are pipe organs in different parts of Europe with A4 tuned anywhere from 374 to 567 Hz. A tuning fork reportedly used as a pitch standard by Handel vibrated at 422.5 Hz and this standard was generally accepted for a period of about 200 years, which included the lives of Bach, Haydn, Mozart, Beethoven and their contemporaries. In 1859, a commission appointed by the French government, which included Berlioz, Meyerbeer, and Rossini, selected 435 Hz as a standard. In the early 20th century, there was a movement to establish tuning based on C, where all the C's would be powers of 2, such as 128, 256, 512, etc., which would place A4 at about 431 Hz. Although it was "theoretically perfect" on paper, such a radically different standard sounded "out of tune" to the musicians, and this idea did not last very long. In 1939, an International Conference in London unanimously adopted 440 Hz as the standard frequency for A4, and this standard is almost universally used by musicians all around the world today. In fact, the United States Bureau of Standards broadcasts an exceedingly precise 440 Hz tone on its shortwave radio station WWV for checking local standards (Rossing 112).
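Modern tuning is usually twelve-tone equal temperament, in which each semitone multiplies the frequency by 2^(1/12). Here is a minimal sketch that uses MIDI note numbering (A4 = 69, C4 = 60) as a convenient index; that numbering is my addition, not the text's. It reproduces the guitar frequencies quoted earlier. Assuming equal temperament, it also shows that pinning C4 to 256 Hz puts A4 near 431 Hz.

```python
def note_frequency(midi_number, a4_hz=440.0):
    """Twelve-tone equal temperament: each semitone multiplies frequency by 2 ** (1/12).
    MIDI numbering is assumed: A4 = 69, C4 (middle C) = 60, E2 = 40, E1 = 28, A1 = 33."""
    return a4_hz * 2 ** ((midi_number - 69) / 12)

print(round(note_frequency(40), 3))  # 82.407: low E string on a guitar (E2)
print(round(note_frequency(28), 3))  # 41.203: low E string on a bass guitar (E1)
print(note_frequency(33))            # 55.0: A1

# The "C = power of two" proposal: fix C4 = 256 Hz; A4 is 9 semitones higher.
a4_if_c4_is_256 = 256 * 2 ** (9 / 12)
print(round(a4_if_c4_is_256, 1))     # 430.5, i.e. "about 431 Hz"
```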
As stated earlier, the actual sound waves that we hear from natural sources are rarely perfect sine waves (Moravcsik 115). There is an interesting phenomenon that applies to audio frequencies and to other frequencies as well. In addition to the primary frequency, called the fundamental, a natural sound source also produces higher frequencies called overtones, or more precisely, harmonics, usually at a much lower amplitude. These harmonics are integer multiples of the fundamental (Moravcsik 115). If we label the fundamental frequency "f", then the harmonics are 2f, 3f, 4f, 5f, etc. For example, if we choose A1, which has a frequency of 55 Hz, as the fundamental f, then the harmonics are 2f=110 Hz (A2), 3f=165 Hz (approximately E3), 4f=220 Hz (A3), 5f=275 Hz (approximately C#4), etc. The musical intervals between successive harmonics get smaller and smaller as the series ascends. The multiples that are powers of 2, such as 2f=110, 4f=220, 8f=440, etc., each sound an octave above the previous one. The harmonic series contains the frequency ratios 2:1, 3:2, 4:3, 5:4, 6:5, etc. between adjacent harmonics, and further ratios such as 5:3 and 8:5 between non-adjacent ones; any two frequencies an octave apart have a ratio of 2:1 (Olson 38). Figure 6 represents the fundamental A1 and its first 9 harmonics. The powers of 2, or the octaves, continue to appear as the series continues. In principle the harmonics continue indefinitely above the fundamental, but they generally become weaker in amplitude the higher they get.
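The A1 example can be reproduced in a few lines. This Python sketch lists the fundamental and its first 9 harmonics (as in figure 6) and names the nearest equal-tempered note for each; the note naming via MIDI numbers (A4 = 69) is an assumption of the sketch, not something the text uses.

```python
import math

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name, using MIDI numbering (A4 = 69, C4 = 60)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{NAMES[midi % 12]}{midi // 12 - 1}"

def harmonics(fundamental_hz, count):
    """The fundamental f followed by its harmonics 2f, 3f, ..., count*f."""
    return [n * fundamental_hz for n in range(1, count + 1)]

# Fundamental A1 (55 Hz) plus 9 harmonics; note that harmonic n+1 and
# harmonic n always stand in the ratio (n+1):n.
for n, freq in enumerate(harmonics(55.0, 10), start=1):
    print(n, freq, nearest_note(freq))
```

The nearest-note names are only approximations for harmonics that are not powers of 2; for instance 165 Hz is close to, but not exactly, the tempered E3.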
The first obvious question in regard to the harmonic series is, "Can human ears hear these frequencies above the fundamental?" The answer is yes... and no. Our ears perceive these frequencies, but only in the way that they affect the tone color of the fundamental, and that perception is often very subtle. Different factors affect which harmonics have the greatest amplitude. In the case of acoustic musical instruments, the materials that produce the sound, such as metal, wood, plastic, or fiberglass, and the density of those materials affect the harmonics. Other factors include the angles of acoustically reflective surfaces and the method of attack, such as the bowing of a violin string or the striking of a piano string with a hammer. Because different harmonics are emphasized or de-emphasized, a flute, a violin, and a clarinet could play exactly the same note and yet each sound very different. This unique tone color of each instrument, created by emphasizing or de-emphasizing different parts of the harmonic series, is called its timbre (Rossing 114). Pipe organs built centuries ago were designed to let the player change the timbre of the instrument by manipulating the harmonics through the use of stops, which controlled the airflow to different ranks of pipes. When air was sent to pipes tuned to specific harmonics, and those pipes sounded their pitches at a suitable amplitude, the result was a change in the timbre of the instrument. The human voice also varies in timbre from person to person because of differences in the vocal cords, oral cavity, facial bone structure, etc. Today, electronic musical instruments such as synthesizers can artificially manipulate harmonic emphasis, and even the shape of the wave, in ways that could never be done naturally, yielding a variety of very interesting new timbres.
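The idea that timbre comes from the relative strength of the harmonics can be illustrated by adding sine waves. In this Python sketch the two amplitude recipes are hypothetical, chosen only for illustration and not measured from any real instrument: both tones share the fundamental 440 Hz, so they repeat with the same period (same pitch), but their wave shapes differ.

```python
import math

def complex_tone(fundamental_hz, harmonic_amps, t):
    """Sum of sine waves at f, 2f, 3f, ...; harmonic_amps[k] weights harmonic k+1."""
    return sum(a * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
               for k, a in enumerate(harmonic_amps))

# Two hypothetical harmonic recipes for the same note:
bright = [1.0, 0.5, 0.33, 0.25, 0.2]  # all harmonics present, falling off roughly as 1/n
hollow = [1.0, 0.0, 0.33, 0.0, 0.2]   # odd harmonics only

f = 440.0
period = 1 / f
ts = [i * period / 16 for i in range(16)]  # 16 instants across one cycle
wave_bright = [complex_tone(f, bright, t) for t in ts]
wave_hollow = [complex_tone(f, hollow, t) for t in ts]
```

Comparing `wave_bright` and `wave_hollow` shows different shapes over one cycle, while evaluating either tone one period later gives the same values again.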
IX. Musical Scale
In the development of music, the first step was to select, from the infinite variety of possible audio frequencies, a limited series of notes to be used. The series of notes so selected is called a scale (Wood 171). A scale is a sort of "musical ladder" that climbs from a starting note to the note one octave higher, and this can be done in a virtually infinite number of ways. The notes of these scales are then used to create melodies and harmonies. Diverse cultures all over the world use many different musical scales, many of which sound very strange to western ears. For our purposes, I will consider only the basic diatonic scales used in western music. These scales originated in Europe and are still the basis of western music today, including American music.
A variety of different scales are used in western music, all of which draw their notes from the chromatic scale, a series of 12 half-steps, or semitones, in each octave. There are 7 scales called modes that use different combinations of half-steps and whole-steps (two half-steps) to climb to the octave in 8 notes, the 8th note being one octave above the 1st. The most common of these 7 scales are the Ionian, or Major, scale and the Aeolian, or (natural) Minor, scale.
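One way to see how the 7 modes relate is to treat each as a rotation of the major scale's step pattern (whole, whole, half, whole, whole, whole, half). This Python sketch assumes that rotation view and uses the standard names of the other five modes (Dorian, Phrygian, Lydian, Mixolydian, Locrian), which the text does not list; notes are spelled with sharps only, a simplification that misspells flat keys.

```python
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # W W H W W W H, in semitones
MODE_NAMES = ["Ionian", "Dorian", "Phrygian", "Lydian",
              "Mixolydian", "Aeolian", "Locrian"]
CHROMATIC = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def mode_steps(mode):
    """Rotate the major-scale step pattern to start on a different degree."""
    start = MODE_NAMES.index(mode)
    return MAJOR_STEPS[start:] + MAJOR_STEPS[:start]

def scale(tonic, mode):
    """The 8 notes of a mode, from the tonic up to its octave."""
    idx = CHROMATIC.index(tonic)
    notes = [tonic]
    for step in mode_steps(mode):
        idx = (idx + step) % 12
        notes.append(CHROMATIC[idx])
    return notes

print(" ".join(scale("C", "Ionian")))   # C D E F G A B C
print(" ".join(scale("A", "Aeolian")))  # A B C D E F G A
```

Both examples use only the white keys, which is why A Aeolian is called the relative minor of C major.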
The origin of our own major and minor scales can be traced with fair certainty to the music of the ancient Greeks. Music undoubtedly played an important part in the life of the Greek people. Plato assigned it a prominent role in education, maintaining that it was effective in producing a certain inner harmony that other subjects of education failed to give. It is said that all Greek citizens had some training in music and were able to take part in the music that accompanied public functions. Unfortunately, the actual music has been almost completely lost; only a few fragments survive. The Greek contributions to the theory of music, on the other hand, have on the whole been well preserved in a mass of writings, especially those of the followers of Pythagoras. The records of musical culture show that there was considerable development as early as 1200 B.C. (Wood 173).
The notes of the major scale can be found in the harmonic series that I discussed earlier. The musical scale, then, is a phenomenon of nature, and is nothing more than a set of numerical ratios to the frequency of the starting note. If we were to make a musical scale using only the notes of the harmonic series, we could still play some very simple melodies. The bugle, for example, can play the fundamental of the harmonic series and perhaps 5 or 6 of the harmonics, enough for very simple tunes such as "Taps" or "Reveille". If a bugle is tuned to a fundamental pitch of C, the average player could play C, G, C, E, G, and C. However, more complex melodies obviously require more notes. The trumpet, for example, has 3 valves that open different lengths of tubing, which change the fundamental pitch and enable the trumpet player to play the harmonics of 7 different fundamentals.
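Both claims in this paragraph can be checked numerically. The Python sketch below assumes that the bugle's notes C, G, C, E, G, C are harmonics 2, 3, 4, 5, 6 and 8 of C, and that the trumpet's valves lower the pitch by the usual nominal amounts (valve 1: 2 semitones, valve 2: 1, valve 3: 3). Real valve tubing is not exactly additive, so this is an idealization.

```python
import math
from itertools import combinations

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def harmonic_pitch_class(n, tonic_index=0):
    """Pitch class of the n-th harmonic, rounded to the nearest tempered semitone."""
    return NAMES[(tonic_index + round(12 * math.log2(n))) % 12]

# Assumed bugle range in C: harmonics 2-6 and 8.
print([harmonic_pitch_class(n) for n in (2, 3, 4, 5, 6, 8)])  # ['C', 'G', 'C', 'E', 'G', 'C']

# Assumed nominal valve lowerings, in semitones.
VALVE_LOWERING = {1: 2, 2: 1, 3: 3}
lowerings = set()
for r in range(len(VALVE_LOWERING) + 1):
    for combo in combinations(VALVE_LOWERING, r):  # every subset of the 3 valves
        lowerings.add(sum(VALVE_LOWERING[v] for v in combo))
print(sorted(lowerings))  # [0, 1, 2, 3, 4, 5, 6]
```

The 3 valves give 8 combinations, but valves 1+2 lower the pitch by the same amount as valve 3 alone, leaving 7 distinct fundamentals.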
X. The Pythagorean Scale
There is a mathematical method that is used to create all the notes of the musical scale from the harmonic series; this scale is called the Pythagorean Scale. As I explained earlier, the naturally occurring harmonics are whole number multiples of the fundamental. To get the other notes of the scale, we must use fractions.
The harmonic series gives us the interval of the perfect 5th, the ratio 3:2 (that is, 3/2 f, where "f" is the starting frequency). By stacking these perfect 5ths, we can create the Pythagorean scale. We'll use the key of C, since the C major scale uses only the white keys on the piano. If the starting note C has the frequency "f", then C an octave higher is 2f. The notes of the ascending Pythagorean C major scale are C=f, D=9/8f, E=81/64f, F=4/3f, G=3/2f, A=27/16f, B=243/128f, and C=2f (Backus 138). These notes are obtained by jumping perfect 5ths from the starting note, in this case C. In other words, a 5th up from C is G, a 5th up from G is D, a 5th up from D is A, a 5th up from A is E, and a 5th up from E is B. The only note missing is F, which is found by going a perfect 5th down from the starting note C. Note that since these notes were obtained from the perfect 5th ratio of 3/2, each of the fractions is a power of 3 over a power of 2, with the exception of the note F, which was obtained by going a perfect 5th down and therefore has a ratio of 4/3, a power of 2 over a power of 3 (the reciprocal of 3/2, raised by an octave). The notes thus created span several octaves; if we bring these notes back into a single octave by dividing (or multiplying) by powers of 2, we have the Pythagorean Major Scale (Backus 137-138). This system of jumping 5ths, if continued, would also yield the "in-between" notes, i.e. the black keys, and after 12 steps would arrive back at (very nearly) C. This is called the Circle of 5ths (Figure 7).
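The construction above translates directly into exact fractions. This Python sketch uses the standard `fractions` module: it raises 3/2 to the number of 5ths each note lies from C (F is one 5th down, so its exponent is -1) and then folds the result into a single octave. The dictionary names are illustrative.

```python
from fractions import Fraction

FIFTH = Fraction(3, 2)

def reduce_to_octave(ratio):
    """Divide or multiply by 2 until 1 <= ratio < 2 (same letter name, one octave)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

# Number of perfect 5ths from C: F is one 5th down, the others are 5ths up.
FIFTHS_FROM_C = {"F": -1, "C": 0, "G": 1, "D": 2, "A": 3, "E": 4, "B": 5}

pythagorean = {name: reduce_to_octave(FIFTH ** k) for name, k in FIFTHS_FROM_C.items()}
for name in ["C", "D", "E", "F", "G", "A", "B"]:
    print(name, pythagorean[name])  # 1, 9/8, 81/64, 4/3, 3/2, 27/16, 243/128
```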
XI. The Tempered Scale
The Pythagorean Scale is mathematically perfect in the relationship of its notes to the starting pitch, but it was soon discovered that this perfection created serious problems. In the Pythagorean Scale we just built on C, melodies and harmonies would sound beautiful as long as only those 7 notes (the white keys), in different octaves, were used and C was always the tonal center. But suppose that we continued the circle-of-5ths method to obtain the other notes, the black keys, and tried to use some note other than C as the tonal center. We would find that when we used another note, such as A-flat, as the tonal center, it would sound badly out of tune. All the intervals would be perfect in relation to C, but quite imperfect in relation to A-flat. Every time you wanted to play in a different key, with a different tonal center, you would have to re-tune the instruments, which is exactly what musicians had to do back in the 17th century. After an orchestra played a selection, the tuners would run up on the stage and re-tune the harpsichord, and any other instrument that required a specialist to tune, to the key of the next selection. Woodwind and brass players had to play a different instrument for each key. It was a real mess until someone got the idea of creating a scale that would work in every key.
The most important thing to understand is that the Pythagorean Scale had mathematically perfect intervals, but only in relation to the starting note upon which the scale was built. Any given note would have a slightly different frequency in different keys. If you started on C, or any other note, and traveled the circle of 5ths using the perfect 5ths found in the harmonic series, when you arrived back at the starting note, it would not be the same pitch! Also, the half-step intervals of a chromatic scale would not be equal. What was eventually done to correct these inconsistencies was to add a little to some intervals and shave a little off others, until every half-step interval had the same ratio. This alteration is called tempering, and the result is the tempered scale, or the scale of equal temperament. Equal temperament is usually said to have been invented by Andreas Werckmeister in about 1700, but there is evidence of experimentation with the idea much earlier. In the scale of equal temperament, perfect 5ths and other intervals are no longer mathematically perfect according to the harmonic series, but they are so close that very few people can hear the difference. Modern musicians who have grown up hearing the tempered scale do not even notice the slight imperfections of its intervals. This compromise in intervallic relationships makes it possible to tune a piano to the tempered scale and play it in any one of the 12 major and 12 minor keys. Bach was so pleased with this new scale that he wrote his now famous "The Well-Tempered Clavier" in 1722, which contains pieces in each of the 24 keys, all playable without re-tuning the instrument (Apel 835-836).
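How far the circle of pure 5ths misses, and how small the tempered compromise is, can both be measured. This Python sketch uses the cent, a standard unit that the text does not introduce (1200 cents per octave, so 100 cents per tempered semitone): twelve pure 5ths overshoot seven octaves by about 23.5 cents (the Pythagorean comma), while each tempered 5th is only about 2 cents narrower than a pure 3:2.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of an interval in cents: 1200 * log2(ratio)."""
    return 1200 * math.log2(ratio)

# Twelve perfect 5ths versus seven octaves: they do not quite coincide.
pythagorean_comma = Fraction(3, 2) ** 12 / Fraction(2) ** 7
print(pythagorean_comma, round(cents(pythagorean_comma), 2))  # 531441/524288 23.46

# The tempered 5th (7 semitones) compared with the pure 3:2 5th.
print(round(cents(2 ** (7 / 12)) - cents(1.5), 2))  # -1.96
```

Spreading the 23.46-cent comma evenly over the 12 fifths of the circle gives roughly the 2 cents by which each tempered 5th is narrowed.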
The scale of equal temperament divides the octave into 12 equal intervals, called tempered half tones. Since an octave is the interval between f and 2f, and it is divided into 12 equal steps, a half tone or semitone is the interval between any two tones whose frequency ratio is the 12th root of 2, 2^(1/12), or about 1.059463 (Apel 836). So, if C=f, then C#=2^(1/12) f=1.059463f, D=(2^(1/12))^2 f=2^(2/12) f=1.122462f, etc. The number of semitones between any two tones is 12 times the base-2 logarithm of their frequency ratio (Olson 46-47). The ratios of the C major scale are now a bit more complicated, and decimals are needed. Given the starting note C=f, the tempered major scale is: C=f, D=1.122462f, E=1.259921f, F=1.334840f, G=1.498307f, A=1.681793f, B=1.887749f, and C=2f (Olson 47). Figure 8 is a table of the actual frequencies of the tempered scale over the entire 8-octave range of the piano, with the ranges of musical instruments and human voices included.
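The tempered ratios and the semitone formula can be computed directly. In this Python sketch each scale degree's ratio is 2^(n/12), where n is the number of semitones above the tonic (the label "C'" for the upper octave is just a naming convenience), and `semitones_between` implements the 12 · log2 rule quoted from Olson.

```python
import math

# Semitones above the tonic for each degree of the C major scale.
MAJOR_DEGREES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11, "C'": 12}

# Tempered ratio of each degree to the tonic: 2 ** (n / 12).
tempered = {name: 2 ** (n / 12) for name, n in MAJOR_DEGREES.items()}

def semitones_between(f1, f2):
    """Interval from f1 to f2 in tempered semitones: 12 * log2(f2 / f1)."""
    return 12 * math.log2(f2 / f1)

for name, ratio in tempered.items():
    print(name, f"{ratio:.6f}")  # ... G 1.498307 ... C' 2.000000
```

Applied to a pure 3:2 5th, `semitones_between(1.0, 1.5)` gives about 7.02, slightly more than the 7 tempered semitones of G.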
XII. Psychology of the Musical Scale
For reasons that no one really understands, the musical scale has a psychological effect on human listeners. The tonic pitch, or tonal center, is not only the mathematical center of the scale but the psychological center as well. Human perception of the tonic pitch in relation to the other notes of the scale gives each note of the scale, including the tonic pitch itself, a distinct "personality" or identity. If we label each note of the major scale with a number, with the tonic pitch being 1, then the ascending major scale is 1, 2, 3, 4, 5, 6, 7, 1. The last 1 is an octave above the first and could also be called 8; an octave above 2 could also be called 9, etc. When individual notes of the scale are played in certain melodic sequences, such as 1, 3, 2, 4, 3, 5, 4, 6, 5, 7, 6, 8, 7, 9..., the human ear anticipates that the next note will be 8, the tonic pitch. In the major scale, 2, 5 and 7 are notes that the ear usually expects to be followed by 1 when they are preceded by notes in certain sequences. The psychological pull is strongest toward the tonic pitch, but this phenomenon exists in other parts of the scale as well. Notes that make the ear expect a certain note to follow them are called tendency tones. This psychological effect extends into harmony, which is the playing of several notes together as chords, and into chord progression, the sequence in which chords are played. This tension-and-release effect is extremely important in the perception of music. It is my personal belief that these psychological effects are shared by other species as well. I have listened very carefully to birds singing, and I can hear evidence of what we call the musical scale. If birds did not use some form of the musical scale in their singing, it probably would not sound pleasant to human listeners.
So far, we have discussed elements of music that relate to the frequency of the musical note. Frequency is the number of times per second that the acoustical energy oscillates; our ears perceive it as pitch, how high or low the note sounds. Timbre is the tone quality of the note, caused by the emphasis of different harmonics. Amplitude is the strength of the oscillation, related to the amount of acoustical energy, and is perceived as how loud the note is. There is one more important area to consider: the duration of the note, or how long we hear it.
Figure 9 shows the mathematical relationship of time (x axis) and frequency (y axis). Graph paper that has one logarithmic axis, on which equal distances represent equal frequency ratios rather than equal differences, is called semilogarithmic graph paper, and it is frequently used to represent quantities that are functions of frequency (Rossing 133-134). It surprised me to learn that I had been reading semilogarithmic graph paper for the past 27 years, since music notation is nothing more than frequency and time plotted on two axes! The time factor of music falls into two main areas, rhythm and tempo.
In most music, a given note is generally not sounded for more than one second, so we are confronted with the transition from one note to another as time goes by. The relative durations of such notes determine what in musical language is called rhythm. Our body, through its various periodic functions such as heartbeat, breathing, and wake-and-sleep cycles, has its own rhythms, and so music, which also has rhythm, seems natural to us (Moravcsik 110). Rhythm is the whole feeling of movement in music, with a strong implication of both regularity and differentiation. Thus, breathing (inhalation vs. exhalation), pulse (systole vs. diastole), and tides (ebb vs. flow) are all examples of rhythm. Rhythm and motion may be analytically distinguished, the former meaning movement in time and the latter movement in space (Apel 729).
The standard of measurement in musical time is the beat. The beat is not a fixed length of time; it can be long or short according to the character of the particular composition. Most people experience the nature of the beat when listening to music. For example, when walking to the accompaniment of a military march, your footsteps mark off equal measurements of time, which can be considered beats (Ottman 51). Beats are usually grouped into sets of 2, 3 or 4 called bars or measures. These measures follow one another in time as a repeating pattern of beats. The first beat of each measure is usually stronger, or accented, to mark the beginning of each measure, i.e. ONE two three four. Other beats of the measure are often accented as well; for example, rock and roll is distinguished by accenting beats 2 and 4 (one TWO three FOUR). This organization of beats into measures is called meter.
If each measure has 4 beats, a note that fills the entire measure is called a whole note. A note with one-half of that value, 2 of which fill the measure, is called a half note. A note with one-fourth of the value of the whole note, 4 of which fill the measure, is called a quarter note; when the quarter note receives one beat, this is an example of 4/4 time, or 4 beats per measure. This designation of the number of beats per measure, and of which note value equals one beat, is called the time signature.
Just as measures are divided into beats, beats are in turn sub-divided into smaller pieces. For example, half of the value of a quarter note is an eighth note. This sub-division continues by powers of 2, i.e. 16th notes, 32nd notes and sometimes even 64th notes. Meter in which the beat is divided in this way, into halves, quarters, and so on, is called simple meter.
There is another type of meter, called compound meter, in which beats are sub-divided into 3 equal parts. The same note values are used, but with the addition of a dot after the note; a dot adds one-half the value of the note it follows, so if a quarter note equals 2 eighth notes, a dotted quarter note equals 3 eighth notes. The most common example of compound meter is 6/8 time, which actually has two beats per measure, with a dotted quarter note equal to one beat (ONE two three FOUR five six). Even in simple meter, any given beat can be divided into three equal parts; this is known as a triplet. Figure 10 is a table of simple time signatures and figure 11 is a table of compound time signatures.
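The note values, dots, and triplets described above are simple fraction arithmetic. This Python sketch expresses each note value as a fraction of a whole note; `dotted`, `triplet` and `fills_measure` are hypothetical helper names for illustration, and a measure is treated as "beats × beat unit" in the way the time signature describes.

```python
from fractions import Fraction

# Note values as fractions of a whole note: 1, 1/2, 1/4, 1/8, 1/16.
WHOLE, HALF, QUARTER, EIGHTH, SIXTEENTH = (Fraction(1, 2 ** k) for k in range(5))

def dotted(value):
    """A dot adds half the note's own value."""
    return value + value / 2

def triplet(value):
    """Three triplet notes take the time of two ordinary ones."""
    return value * Fraction(2, 3)

def fills_measure(notes, beats, beat_unit):
    """True if the notes exactly fill one measure of `beats` beats of `beat_unit`."""
    return sum(notes) == beats * beat_unit

print(dotted(QUARTER) == 3 * EIGHTH)                                      # True
print(fills_measure([HALF, QUARTER, EIGHTH, EIGHTH], 4, QUARTER))         # True (4/4)
print(fills_measure([dotted(QUARTER)] * 2, 6, EIGHTH))                    # True (6/8)
print(fills_measure([triplet(EIGHTH)] * 3 + [QUARTER] * 3, 4, QUARTER))   # True
```

Exact fractions are used instead of floating point so that sums such as three triplet eighths (3 × 1/12 = 1/4) compare exactly.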
Tempo is simply the speed of the beat. For many years, Italian words were used to indicate tempo, such as largo (broad), lento (slow), adagio (at ease), andante (walking), moderato (moderate), allegro (fast), and presto (very fast). The problem is that these designations were open to personal interpretation and were therefore somewhat ambiguous. The common practice today is to use metronome markings, or beats per minute. For example, since there are 60 seconds in a minute, if each beat lasted one second and each quarter note got one beat, the tempo would be quarter note = MM 60. Most musical compositions fall in the range of MM 60-80, which is about the speed of human heartbeats or moderate walking (Apel 836-837). Of course, the tempo can easily be twice that fast if the music is intended for dancing, especially the music of those younger folks who are still full of energy!
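A metronome marking converts to clock time by dividing 60 seconds by the beats per minute. This Python sketch is a minimal illustration; `note_seconds` is a hypothetical helper, and note values are given as fractions of a whole note, with `beat_unit` being the note value that receives one beat.

```python
from fractions import Fraction

def note_seconds(note_value, mm, beat_unit=Fraction(1, 4)):
    """Duration in seconds of a note value at MM beats per minute,
    where beat_unit is the note value that gets one beat (quarter note by default)."""
    seconds_per_beat = 60 / mm
    return float(note_value / beat_unit) * seconds_per_beat

print(note_seconds(Fraction(1, 4), 60))   # 1.0  (quarter note = MM 60)
print(note_seconds(Fraction(1, 8), 120))  # 0.25 (eighth note at quarter note = MM 120)
```

For 6/8 time, passing `beat_unit=Fraction(3, 8)` makes the dotted quarter the beat, matching the two-beats-per-measure description of compound meter.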
XVI. The Non-Mathematical Factor
There are obviously many aspects of music that are directly related to mathematics and physics and are easily explained; however, an explanation of the phenomena of music would be incomplete without briefly discussing those aspects of music that are impossible to explain rationally and depend upon human perception, such as the psychology of the musical scale that I discussed earlier. The subject of this paper is the relationship of music and mathematics; however, in another paper I have written, "The Two Sides of Music", I examine the non-mathematical aspects of music in much greater depth.
There are many examples of the relationship of music and mathematics; I have managed to identify a few of the most important ones. This subject could be expanded into a doctoral dissertation of hundreds of pages, and I am quite sure that someone, somewhere has already done just that. I chose the subject because I wanted to learn something relevant to my career field; I honestly feel that I have succeeded in that goal. It could be argued that music is, in fact, a branch of mathematics. My final conclusion is that music is a unique blend of mathematics, physics, and the unexplainable emotional right-brain human perception phenomena.
If one person were to say that music is a set of mathematical relationships that can be explained with algebraic equations, and another person were to say that music is a gift from God, that mankind will never really totally comprehend, both of those individuals would be absolutely correct.
Backus, John. The Acoustical Foundations of Music. 2nd ed. New York:
Grob, Bernard. Basic Electronics. 4th ed. New York: McGraw Hill, Inc.,
Moravcsik, Michael J. Musical Sound: An Introduction to the Physics of
Olson, Harry F. Music, Physics and Engineering. 2nd ed. New York:
Ottman, Robert W. Elementary Harmony. 2nd ed. Englewood Cliffs, New
Pierce, John R. The Science of Musical Sound. New York: Scientific
Rossing, Thomas D. The Science of Sound. Reading, Massachusetts:
Wood, Alexander, and J. M. Bowsher. The Physics of Music. 7th ed. London: