Data of the Humpback Whale


Diagram of HARP showing the microphone attached to a cable suspended by floats, and photo of HARP with case removed, overall cylindrical structure with electronics, disk drive array, and battery array.
Scatter plot with deployments on the vertical axis and time on the horizontal axis, showing some regular patterns in deployments at certain locations, irregular patterns elsewhere. Some filenames might be typos, or just idiosyncratic.
Screenshot from Triton software with two diagrams: a wave representation below and spectrogram above.
Early LTSA example from “Marine Biological Sound West of San Clemente Island” (1965) by Thompson, showing two low-frequency sounds around 20 Hz happening regularly over a 4-minute period.
Recent LTSA from “High-frequency Acoustic Recording Package…” (2007) by Wiggins and Hildebrand. The leftmost image shows an LTSA over 2 hours, and three images to the right show excerpts: dolphin whistles, dolphin clicks, and a 50 kHz echosounder.
Tan paper with a sketch of a sperm whale in the corner. Calligraphy with lots of long tails and curly motifs, reminiscent of Mongolian calligraphy or Burmese script.
Figure 1 from “Songs of Humpback Whales” (1971) by Payne and McVay showing graphical representation of humpback whale song hierarchy.
Series of bass and treble staves covering 7 minutes, showing abstract brightly colored blobs that represent individual units of humpback whale song. Around 6 unit types are represented. Some units repeat more than 20 times, others alternate after 2–8 repetitions.
A few dozen Markov states represented as circles, in five colors according to the phrase they belong to, with arrows indicating the transitions between states. Figure 2 from “Song hybridization events during revolutionary song change” (2017) by Garland et al.
“Humpbacks” (2018) by montereydiver
“Dynamic horizontal cultural transmission of humpback whale song at the ocean basin scale” (2011) by Garland et al. Each color in the grid represents a distinct song type.


Example spectrogram from Audition showing eight repetitions of a three-unit phrase over 40 seconds.
Two-frame animation alternating between CQT and FFT, showing three repetitions of a three-unit phrase.
Curves showing the 0%, 10%, 50%, 90%, 99%, and 100% percentile of loudness across a single subchunk for ten thousand subchunks
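Sketched in NumPy, the per-subchunk percentile computation might look like the following; the subchunk length, the approximation of loudness as absolute sample amplitude, and the function name are illustrative assumptions, not the code used here:

```python
import numpy as np

def loudness_percentiles(audio, subchunk_len, levels=(0, 10, 50, 90, 99, 100)):
    """Compute loudness percentiles within each fixed-length subchunk.

    Loudness is approximated as absolute sample amplitude; returns one
    row per subchunk and one column per percentile level.
    """
    n = len(audio) // subchunk_len
    chunks = np.abs(audio[:n * subchunk_len]).reshape(n, subchunk_len)
    return np.percentile(chunks, levels, axis=1).T  # shape: (n, len(levels))

# toy example: two subchunks of a ramp signal
audio = np.linspace(-1, 1, 2000)
p = loudness_percentiles(audio, 1000)
```

Plotting one curve per column over all subchunks would then reproduce the figure above.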
HARP noise spectrogram: a rising tone, followed by a noise burst, a strong tone, noise burst, then finally a quiet tone for the remaining 70% of the recording.
Two-frame animation showing spectrogram before and after HARP noise removal.
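Since the HARP noise pattern is fairly stationary per frequency band, one simple way to suppress it is spectral subtraction against a per-frequency noise profile. This sketch is an assumption about the approach, not the exact method used here, and the function name and toy data are illustrative:

```python
import numpy as np

def remove_stationary_noise(spec, noise_frames):
    """Subtract a per-frequency noise profile from a magnitude spectrogram.

    spec: (freq_bins, time_frames) magnitude spectrogram.
    noise_frames: column indices known to contain only the interfering noise.
    The result is clipped at zero; this simple sketch ignores phase.
    """
    profile = np.median(spec[:, noise_frames], axis=1, keepdims=True)
    return np.clip(spec - profile, 0.0, None)

rng = np.random.default_rng(0)
# bins 0 and 2 carry a constant tonal interference on top of background
spec = rng.random((4, 10)) + np.array([[1.0], [0.0], [2.0], [0.0]])
cleaned = remove_stationary_noise(spec, noise_frames=[0, 1, 2])
```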
Two-frame animation showing 12-hour LTSA spectrogram calculated from the mean and from the 99th percentile for each subchunk frequency band.
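An LTSA reduces each group of consecutive spectrogram frames to a single column; swapping the reducer from the mean to a high percentile is what the animation above compares. A minimal sketch (the `ltsa` name, shapes, and group size are assumptions):

```python
import numpy as np

def ltsa(spectrogram, frames_per_bin, reducer=np.mean):
    """Long-term spectral average: collapse groups of spectrogram frames.

    spectrogram: (freq_bins, time_frames) array.
    frames_per_bin: consecutive frames reduced into one output column.
    reducer: np.mean for the classic LTSA; a high percentile emphasizes
             brief transient sounds that the mean washes out.
    """
    f, t = spectrogram.shape
    n = t // frames_per_bin
    grouped = spectrogram[:, :n * frames_per_bin].reshape(f, n, frames_per_bin)
    return reducer(grouped, axis=2)

spec = np.arange(24, dtype=float).reshape(2, 12)
mean_ltsa = ltsa(spec, 4)
p99_ltsa = ltsa(spec, 4, lambda x, axis: np.percentile(x, 99, axis=axis))
```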
278 days of audio, 128 rows of spectrograms, around 2 days per row, selected from multiple locations.

A Brief Soundscape

  1. “Oink” or “grunt” sound from unknown source. Possibly fish. Yes, fish make sounds. Listen to some recordings here. Saipan, October 2015
  2. The omnipresent engine. Hawaiʻi, May 2011
  3. Possible echosounder. Wake Island, April 2016
  4. Navy sonar. Kauaʻi, July 2010
  5. Dolphin whistles, aliased from higher frequencies. Ladd Seamount, May 2009
  6. More dolphin whistles with engine noise. Hawaiʻi, July 2012
  7. Humpback song with multiple clear echoes. Hawaiʻi, March 2014
  8. HARP microphone getting scratched by something. Wake Island, April 2011
  9. Sperm whale click used for echolocation while feeding/foraging. Cross Seamount, January 2006
  10. Likely fin whale or sei whale call. Tinian, December 2013
  11. Minke whale call (also described as a “boing”). Wake Island, March 2012

Nonlinear embedding

Grid of 100x100 grayscale spectra showing some very broad clustering with no discernible features.
First two dimensions of UMAP embedding for all frames, each point is one frame with time mapped to hue.

Similarity Matrices

Spectrogram on left going from top to bottom, n×n Euclidean similarity matrix to right.
Spectrogram on left going from top to bottom, n×n correlation and covariance similarity matrices to right.
Similarity matrices colored by dominant frequency of repetition at time scales of 45 seconds, 3 minutes, and 12 minutes.
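The Euclidean similarity matrix from the first figure can be sketched directly in NumPy; negating the distance so that identical frames score highest is a convention chosen here for illustration:

```python
import numpy as np

def euclidean_similarity(spec):
    """n x n similarity between all pairs of spectrogram frames.

    spec: (n_frames, freq_bins). Returns negated pairwise Euclidean
    distance, so identical frames get the highest value (0).
    """
    sq = np.sum(spec ** 2, axis=1)
    # pairwise squared distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2ab
    d2 = np.clip(sq[:, None] + sq[None, :] - 2 * spec @ spec.T, 0, None)
    return -np.sqrt(d2)

frames = np.array([[0.0, 1.0], [0.0, 1.0], [3.0, 5.0]])
sim = euclidean_similarity(frames)
```

Repeating phrases show up as off-diagonal stripes whose spacing gives the dominant repetition period.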

Unit Detection

Plot of “repetitiveness” on top of corresponding spectrogram, with colored bars in plot indicating separate peaks/events, also drawn as white bars below.
Threshold image and spectrogram with bounding boxes for units. Threshold image is red and blue against a black background, with red blobs from the higher threshold on top of blue blobs from the lower threshold. Spectrogram shows approximately 45 seconds of whale song with 26 separate units indicated.
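The two-threshold idea can be approximated in one dimension with hysteresis: a unit must cross the high threshold somewhere (the red blobs), but its boundaries extend while the signal stays above the low threshold (the blue blobs). A sketch under those assumptions, with a toy energy envelope and illustrative threshold values:

```python
import numpy as np

def detect_units(energy, low, high):
    """Hysteresis detection of song units from a 1-D energy envelope.

    Returns (start, end) index pairs (end exclusive). A region is kept
    only if it exceeds `high` somewhere, but its extent is everything
    contiguous above `low`.
    """
    above_low = energy > low
    units = []
    i, n = 0, len(energy)
    while i < n:
        if above_low[i]:
            j = i
            while j < n and above_low[j]:
                j += 1
            if energy[i:j].max() > high:  # keep only regions hitting the high threshold
                units.append((i, j))
            i = j
        else:
            i += 1
    return units

energy = np.array([0.1, 0.4, 0.9, 0.5, 0.1, 0.4, 0.4, 0.1])
units = detect_units(energy, low=0.3, high=0.8)
```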
Approximately 90-second spectrogram showing alternation between two units that continually morph over the course of the recording, with unit boundaries colored by a 3D UMAP embedding.
Two-frame animation showing approximately 45 second spectrogram, then again with non-whale background noise blended towards black.

Triplet Loss Embedding

Small spectrogram of target unit and 16 “matching” results from the dataset. Results are mostly quiet.
Small spectrogram of target unit and 16 “matching” results from the dataset. Results mostly appear correct.
Globular UMAP cloud with colors showing some local grouping but no clear clusters.
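For reference, the triplet loss being optimized pushes a matching unit closer to the anchor than a non-matching one by at least a margin. A NumPy sketch of the loss itself (the margin value and vectors are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on embedding vectors: zero once the positive is
    closer to the anchor than the negative by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor: satisfied triplet
n = np.array([1.0, 0.0])   # far from the anchor
loss = triplet_loss(a, p, n)
```

A satisfied triplet contributes zero loss; swapping the positive and negative makes the loss positive.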

Generating New “Songs”

Five binarized outputs from a recurrent neural network.
  • Sonifying the results using concatenative synthesis from the original recordings.
  • Running a large amount of data through the neural network, saving the state of the network at each moment, and looking for other moments in the data with a similar state.
  • Increasing the amount of training data, the number of frequency bands, and the levels of quantization.
  • Using a mixture density network or discretized mixture of logistics for the output.
  • Switching to a seq2seq-like model, which is explicitly designed to encode state.
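The first idea above, concatenative synthesis, can be sketched by matching each generated spectral frame to its nearest frame in the original recordings and stitching together the corresponding audio. The hop size, shapes, and function name here are illustrative assumptions:

```python
import numpy as np

def concatenative_synthesis(generated, corpus_frames, corpus_audio, hop):
    """Sonify generated spectral frames by stitching nearest corpus audio.

    generated: (n, freq_bins) frames produced by the network.
    corpus_frames: (m, freq_bins) frames of the original recordings.
    corpus_audio: 1-D audio the corpus frames were computed from.
    hop: audio samples per frame in corpus_audio.
    """
    out = []
    for frame in generated:
        # nearest neighbor in the corpus by squared Euclidean distance
        idx = np.argmin(np.sum((corpus_frames - frame) ** 2, axis=1))
        out.append(corpus_audio[idx * hop:(idx + 1) * hop])
    return np.concatenate(out)

corpus_audio = np.arange(12, dtype=float)            # 4 frames, hop = 3
corpus_frames = np.array([[0.0], [1.0], [2.0], [3.0]])
generated = np.array([[2.1], [0.2]])
audio = concatenative_synthesis(generated, corpus_frames, corpus_audio, hop=3)
```

A real version would crossfade between grains rather than hard-cutting at frame boundaries.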

Phrase-level Similarity

Approximately 45 second humpback whale song spectrogram, showing white and yellow boxes around units from two different songs.
Approximately one hour spectrogram of humpback whale songs, showing at least seven distinct falling pitch trends, each over the duration of minutes.


Screenshot of “Pattern Radio” website showing notes from Ann Allen and spectrogram of multiple whale songs.
  • Alexander Chen (Google Creative Lab), Jonas Jongejan (Google Creative Lab), Lydia Holness (Google Creative Lab), Mohan Twine (Google Creative Lab), and Yotam Mann for holding the bigger project together, keeping me on track and sharing lots of helpful feedback, ideas, and questions 💪
  • Ann Allen (NOAA Fisheries PIFSC) for entertaining my questions about weird sounds I heard in the course of listening to lots of hydrophone recordings 🐋
  • Aren Jansen (Google Machine Hearing) for answering a bunch of my extremely poorly informed questions at the beginning of this project 🙏
  • Kyle Kastner for a variety of suggestions, but especially for pointing me to this talk about detecting right whale calls, as well as Justin Salamon’s talk on self-supervised learning from weak labels 🙌
  • Matt Harvey (Google AI Perception) for multiple discussions, answering a bunch of questions about his work with the same data, and especially for helping me understand the potential connection between the UMAP loops and Ellen Garland’s Markov chains, and that blurring the spectrogram before creating the distance matrix was equivalent to checking the max across multiple offsets 🤦‍♂️
  • Nikhil Thorat (Google PAIR) who did some early work on unit segmentation and classification from a large chunk of manually annotated data. I learned a bunch from talking to him about what did and didn’t work 👏
  • Parag Mital for reviewing this article, suggesting useful directions for analysis, and recommending other audio visualization tools to check for inspiration 👌


