Simple Design of Auditory Scenes

This is a software post. You can directly jump to a demo!

In the last few months, I’ve been working on different aspects of Auditory Scene Analysis. Scene analysis refers to the function of making sense of a scene from the sensation it produces. By making sense I mean going from the soundwave to a simpler and more structured representation. For example, when two people speak simultaneously, the sounds they produce add up and your ear receives a mixture. Realizing that there are two ‘objects’ in the scene and identifying some of their features is the first step towards understanding what they might be saying. This is probably obvious, but as of today no artificial system has managed to beat human performance on such ‘source separation’ tasks.

In the lab, scene analysis is studied by observing how humans treat simple mixtures of sounds such as pure tones, sweeps, chords and melodies. A simple representation of such sounds is in the form of a spectrogram, where pure tones and sweeps are lines.

Psychologists have found that for scenes composed of such simple sounds, humans tend to perceptually group sounds together according to some simple rules. For example, harmonics are grouped into a single object (simultaneous grouping), and sequences of tones ‘close’ in time and frequency are grouped into ‘streams’ (sequential grouping). Proximity in time and frequency are key features in scene analysis that are visually salient in a spectrogram. For this reason, the spectrogram is the tool of choice in this field.

Here is a typical example from the book of Al Bregman (who coined the term Auditory Scene Analysis). It describes a case of ‘stream segregation’.

In this plot, dark lines are tones, their y-coordinate is log frequency, and their length is their duration. Dashed lines are the reported subjective perceptual groupings (left as a melody, right as two concurrent ‘streams’).

Starting to work in the field, I needed tools to easily design structured auditory scenes with the type of sparse spectrogram I described. Finding no pre-existing solution, I made my own.

I wanted to easily

  • declare structured scenes
  • generate and play the corresponding sound
  • visualize/sketch a spectrogram (not from the sound, but from the scene description)

The idea of my approach is simple: a scene is made of atoms such as pure tones or sweeps. Atoms can be grouped into nodes if they share some characteristic (e.g. they belong to the same chord). Groups can be grouped as well (chords in a melody) and so on. This leads to a representation of scenes as a tree, with atoms as leaves.
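
To make the tree idea concrete, here is a minimal sketch in Python. It is not taken from the library presented in this post: the names Tone, Node, render and sketch are hypothetical, chosen only for illustration, and atoms are restricted to pure tones.

```python
import math
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Tone:
    """Leaf (atom): a pure tone with a frequency, an onset and a duration."""
    freq: float       # Hz
    start: float      # seconds
    duration: float   # seconds
    amp: float = 1.0

    def leaves(self):
        return [self]


@dataclass
class Node:
    """Internal node: a labelled group of atoms or of other groups."""
    label: str
    children: List[Union["Node", Tone]] = field(default_factory=list)

    def leaves(self):
        # Depth-first traversal collecting every atom below this node.
        return [leaf for child in self.children for leaf in child.leaves()]


def render(scene, fs=8000):
    """Generate the sound: sum the waveforms of all atoms of the scene."""
    atoms = scene.leaves()
    total = max(a.start + a.duration for a in atoms)
    samples = [0.0] * int(round(total * fs))
    for a in atoms:
        first = int(round(a.start * fs))
        for n in range(int(round(a.duration * fs))):
            if first + n < len(samples):
                samples[first + n] += a.amp * math.sin(2 * math.pi * a.freq * n / fs)
    return samples


def sketch(scene):
    """Spectrogram sketch: one (t_start, t_end, freq) line segment per atom."""
    return [(a.start, a.start + a.duration, a.freq) for a in scene.leaves()]


# Atoms -> chords -> melody.
chord1 = Node("chord1", [Tone(440.0, 0.0, 0.2), Tone(660.0, 0.0, 0.2)])
chord2 = Node("chord2", [Tone(494.0, 0.25, 0.2), Tone(741.0, 0.25, 0.2)])
melody = Node("melody", [chord1, chord2])
sound = render(melody)
```

The point of the design is that the same tree serves all three goals: the declaration is the tree itself, the sound is a sum over its leaves, and the sketched spectrogram is drawn from the description rather than computed from the sound.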

Here is my code:

It is all in Python, hosted on GitHub. It has a small demo and a lot of example notebooks (in the notebooks folder) using IPython notebooks. If you are new to Python, I suggest you install Anaconda, which is free and contains all you need to get started with Python.


Daydreaming & apps for Science

This week, we launched the Daydreaming app. It is an Android application to study mindwandering ‘in vivo’. It was co-developed by Sebastien Lerique, Mikael Bastian, Jérôme Sackur and myself. Try it out!


Mindwandering or daydreaming refers to the experience of having thoughts wandering off your current task, often without realizing it at first. A common example is realizing, while reading, that you paid no attention to the last paragraph you read: you were elsewhere.

Some of the questions psychologists ask about mindwandering concern the initiation of this wandering (mental control), its connection to consciousness and awareness, the nature or type of thoughts involved, and the factors that facilitate or prevent one’s mind from wandering.

To study mindwandering, psychologists usually ask subjects to come to their lab, do a slightly boring task and either ask them to report their mindwandering epochs as they become aware of them or probe them at random moments about the content of their thoughts – were they focused or mind-wandering, thinking about future or past events, etc. This last method is called experience-, or thought-sampling.
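
As an illustration of thought-sampling, here is a small sketch of how random probe times could be drawn within a session. It is an assumption made for illustration, not the scheduling used in the Daydreaming app; the function name probe_times and the minimum-gap constraint are hypothetical.

```python
import random


def probe_times(session_s, n_probes, min_gap_s, seed=None):
    """Draw n_probes random times in [0, session_s], at least min_gap_s apart."""
    rng = random.Random(seed)
    slack = session_s - (n_probes - 1) * min_gap_s
    if slack < 0:
        raise ValueError("session too short for this many probes")
    # Draw sorted uniform points in the remaining slack,
    # then re-insert the minimum gap between consecutive probes.
    base = sorted(rng.uniform(0, slack) for _ in range(n_probes))
    return [t + i * min_gap_s for i, t in enumerate(base)]


# Six probes in a one-hour session, at least five minutes apart.
times = probe_times(3600, 6, 300, seed=1)
```

The minimum gap keeps probes unpredictable while avoiding two interruptions in quick succession.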

These studies have some limitations that this app tried to overcome:

  • the lab setting prevents the study of ecological situations
  • the collection of data is limited to the sole hour the subject spent in the lab.
  • data collection is slow and scarce, as the number of participants who can come to the lab is highly limited by time, space, and funding.

This app precisely samples the thoughts of a high number of users during an extended period of time and through the great variety of ecological situations that daily life offers. The various effects of spatial (e.g. at home vs. at work), temporal (e.g. weekdays vs. weekend) and contextual (e.g. presence of noise) factors on mind-wandering thus become measurable.

Apps for psychology

Smartphones are little computers that have come into the daily lives of a large part of the population. They have great potential as tools for scientists, both to collect data and to reach out and inform the public about their research.

There is a current trend to develop such apps. Here are a few recent examples: ScienceXL, Trackyourhappiness, Idichotic, The great brain experiment, Brain’us, Dextrickery

What it takes

There are some hidden complexities in the design of such scientific apps. In standard psychological experiments, the experimenter designs the experiment using standard and simple tools, and subjects use the experimenter’s computer. The experiment can be ugly: the subject made it to the lab, expects to be paid, and is not distracted. Behavioral data is stored locally.

For an app:

  • it has to be coded for the particular smartphone, in languages not usually used in academia.
  • it has to be pretty and ergonomic: people have high standards
  • you need to motivate users with something other than money: by making the app fun, by gamifying it, or by providing interesting feedback to the user
  • data has to be sent to a server, in an encrypted and anonymised form and stored.
  • finally the app has to be advertised online

Those steps are not common practice in academia, which makes the entry cost to develop apps for science quite high. One solution is to externalize them to a company.

For most of the examples I gave, development was externalized to companies. The consequence is that these projects are expensive (restricted to rich labs) and often one-shot (unless one puts more money into them). Also, code is not shared unless the company has an open model, which I haven’t seen so far. This contrasts with other experimental methods, which are fully in the hands of scientists and easily shared and used by the community. If apps are to become part of the standard toolbox of scientists, the tools should be shared and mastered by the scientists themselves. This supposes creating a community of users and contributors who share this goal and develop core tools general enough to be of interest to the whole community. Such shared development has already happened in the neuroimaging field and is on its way in psychology, with the emergence of psychoinformatics. This is one of the aims of the project Science en Poche (Pocket Science), started in Paris. In the context of apps, one well identified, separate set of tools is the server-side software that manages experimenters, users and their data. This is the purpose of the software Yelandur, developed by Sebastien Lerique.

It is a hard task in today’s academic world, where the expected fast rate of publication and an evaluation system that focuses on publications only deter people from this kind of long-term thinking and engagement in collaborative communities.

More information

  • Very detailed information about mindwandering can be found on the website of Johnny Smallwood, a leading expert in the field
  • Tal Yarkoni’s psychoinformatics lab page, a good example of efforts to share tools within the psychological community
  • An interesting article on the potential of smartphones for psychology: the psychology smartphone manifesto

Shepard Tones and Tritone Paradox

I realize it is a quite classical blog topic in the auditory world (because it is old, has been studied extensively, and is pretty cool) but I will nonetheless present a class of sounds –the Shepard Tones– named after Roger Shepard that have interesting perceptual properties, with a focus on the geometry of the perceptual space.

As a prerequisite I need to write a bit about pitch perception

The geometry of pitch

A line

Pitch refers to a subjective perceptual attribute of some sounds (along with loudness, duration and timbre). It is very loosely defined by the American National Standards Institute as :

that attribute of auditory sensation in terms of which sounds may be ordered on a scale extending from low to high

Geometrically, this corresponds to a one-dimensional perceptual space, which can be represented as a line.

A helix

This view is however quite limited and fails to capture the notion of pitch class (the same note played on a piano at different octaves shares the same “pitch class”, and these notes are perceptually ‘close’).

This has led to a bidimensional view of pitch with pitch height (the previous 1d line) and pitch class. The two dimensions are dependent: as you increase the pitch height you periodically loop over the pitch classes (imagine a piano again).

Geometrically, imagine a straight line. You want to bend it such that points sharing the same class (i.e. in octave relation) get closer. One good way of doing that is to wind it into a helix embedded in 3D space. When the helix is aligned to an axis (say vertical), pitch height corresponds to the position along this axis, while pitch class corresponds to the ‘angle’ around this axis.
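
The helix can be written down in a few lines. This is a sketch under assumptions that are not in the text above: one turn per octave, unit radius, and a reference frequency of 440 Hz for the origin of the height axis. The function name pitch_helix is hypothetical.

```python
import math


def pitch_helix(freq_hz, ref_hz=440.0):
    """Map a frequency to a point (x, y, z) on a helix with one turn per octave."""
    octaves = math.log2(freq_hz / ref_hz)   # pitch height, in octaves above ref_hz
    angle = 2 * math.pi * (octaves % 1.0)   # pitch class, as an angle around the axis
    return math.cos(angle), math.sin(angle), octaves


a4 = pitch_helix(440.0)
a5 = pitch_helix(880.0)   # same (x, y) as a4, one unit higher in z
```

Two notes an octave apart share their (x, y) position and differ only in z, which is exactly the ‘same class, different height’ relation described above.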

Source: Shepard 1982

Flattening the helix into a circle

Given this helix, one can fix the pitch class and change the height by discrete octave steps (visually fix the angle and jump to the next floor of the helix). But you can’t fix the height and change the class… unless you flatten the helix and turn it into a circle.

That was the motivation for building Shepard tones: getting rid of the height dimension of pitch.

Position on a circle is a purely periodic function of angle, so a class of sounds with such a geometry only needs to be defined over one octave.

A theoretical paradox

As you move along a helix (and therefore up or down), there is no ambiguity as to whether you are going up or down.

Once the helix is flattened into a circle, local changes of angle can still be associated with their former upward or downward move. If this shift is perceived as such, you can loop over the circle in small steps but still perceive the class as going up (or down). This is an auditory equivalent of Penrose’s impossible steps.

Source: Wikipedia


But what about diametrically opposed moves? Is it up or down? It is a typical ambiguous setting.

Source: Englitz et al 2013


Shepard Tones

Shepard tones are sounds fully parameterized by pitch class. They are chords of pure tones whose frequencies are a base frequency (f_b, the class) multiplied by all the powers of two, with a bell-shaped spectral envelope e(f):

x(t) = \sum_i \cos(2 \pi (2^i f_b)t+\phi_i)\,e(2^i f_b)
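
Here is a minimal synthesis sketch of this formula in plain Python. Several choices are assumptions made for illustration and are not specified above: all phases \phi_i are set to zero, the envelope e(f) is a Gaussian on the log2-frequency axis centred on 500 Hz, and components are limited to the range between f_min and f_max. The function names are hypothetical.

```python
import math


def shepard_components(f_b, f_min=20.0, f_max=10000.0, centre=500.0, width_oct=1.5):
    """Return the (frequency, amplitude) pairs of the Shepard tone of class f_b."""
    # Fold f_b into the lowest octave above f_min, so that f_b and 2 * f_b
    # lead to exactly the same list of components.
    f = f_b * 2.0 ** math.ceil(math.log2(f_min / f_b))
    components = []
    while f <= f_max:
        # Bell-shaped envelope e(f): Gaussian in log2-frequency (an assumption).
        d = math.log2(f / centre)
        components.append((f, math.exp(-0.5 * (d / width_oct) ** 2)))
        f *= 2.0
    return components


def shepard_tone(f_b, duration=0.5, fs=22050):
    """Sample x(t) = sum_i cos(2 pi 2^i f_b t) e(2^i f_b), with zero phases."""
    components = shepard_components(f_b)
    n_samples = int(duration * fs)
    return [sum(a * math.cos(2 * math.pi * f * n / fs) for f, a in components)
            for n in range(n_samples)]


x = shepard_tone(100.0, duration=0.05)
```

Because the envelope is fixed while the components are folded into the same range, multiplying f_b by two leaves the sound unchanged: only the class matters, not the height.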

A simple way to represent them is to draw their spectrum (or their spectrogram) with a log-frequency axis. The logarithm transforms the geometric relation between tone frequencies (f_{k+1}=2f_k) into an arithmetic one (\log_2(f_{k+1}) = \log_2(f_k) +1).

Here is a representation of the spectrum of two Shepard tones with close base frequency (full and dashed lines)

On the log-frequency spectrum, the Shepard tone can be seen as a tilted ladder. When you increase or decrease the base frequency, you shift all the rungs laterally, until they overlap with the initial setting (you have then moved across an octave).

Perception of pitch shift and the proximity principle

As described earlier, Shepard tones have the following property: if you present two of them successively with a small upward or downward shift of the base frequency, you will unambiguously perceive a pitch shift in that direction.

Here is an example of a sequence of upward shifts:

Here is a continuous analogue (called the Shepard-Risset glissando):


These percepts can be explained with the help of a local proximity principle on the frequency domain (the global shift percept arising as a sum of local shifts).
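
One simple reading of this proximity principle is that each component of the first tone is matched to its nearest neighbour, on the log-frequency axis, in the second tone. The sketch below implements that reading; it is an illustration and not a model taken from the literature. The function name shift_direction and the tolerance are hypothetical.

```python
import math


def shift_direction(f_b1, f_b2, tol=1e-9):
    """Direction and size (in octaves) of the shortest shift between two Shepard tones."""
    delta = math.log2(f_b2 / f_b1) % 1.0    # change of pitch class, in [0, 1)
    if delta < tol or delta > 1.0 - tol:
        return "same", 0.0                  # same class: the two tones are identical
    if abs(delta - 0.5) < tol:
        return "ambiguous", 0.5             # half octave: both neighbours are equally close
    if delta < 0.5:
        return "up", delta                  # nearest neighbour is above
    return "down", delta - 1.0              # nearest neighbour is below


# Twelve successive semitone steps, ending back on the starting class.
steps = [shift_direction(100.0 * 2 ** (k / 12), 100.0 * 2 ** ((k + 1) / 12))
         for k in range(12)]
```

Every one of the twelve steps is labelled as upward although the sequence ends on the tone it started from, which is the circular staircase described earlier. The half-octave case has no nearest neighbour, hence the ambiguity.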

This proximity principle also accounts for the ambiguously perceived half-octave shift (extensively studied by Diana Deutsch), which I will talk more about in a future post.

Epileptic Seizure Detection Challenge

Last summer, the lab in which I do my PhD gathered as a team and took part in a competition whose aim was to detect epileptic seizures from intracranial EEG (iEEG, also called ECoG) data from dogs and humans.

The competition was launched jointly by UPenn (the University of Pennsylvania) and the Mayo Clinic, and hosted on the website Kaggle.

The competition

The Data

We were given multichannel recordings of iEEG data from epileptic humans and dogs. Data was split into 1s segments, each labelled with the name of the patient, a manual annotation of whether it is a seizure or not (ictal or interictal), and how far from the seizure onset the segment was.

iEEG electrode position

Example of signal (seizure onset)

The task

Given a 1s multichannel recording of iEEG data, can you tell (predict) whether it corresponds to an ictal event or to quietness in between seizures (interictal event)? And, within a seizure, can you tell whether the data segment is early (close to onset, within the first 15s) or late?

This corresponds to a pair of classifications (binary assignments).

Predictions were evaluated by the average of the two areas under the ROC curve, a common measure of performance of binary predictors.
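
For readers unfamiliar with this score, here is a dependency-free sketch of it. The area under the ROC curve is computed through its rank interpretation (the probability that a positive example gets a higher score than a negative one). The function names are hypothetical, the toy numbers are made up, and details of the official evaluation, such as which segments enter the early/late score, are not reproduced here.

```python
def auc(labels, scores):
    """Area under the ROC curve: probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count for one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def competition_score(y_seizure, p_seizure, y_early, p_early):
    """Average of the two AUCs: seizure vs. non-seizure, and early vs. late."""
    return 0.5 * (auc(y_seizure, p_seizure) + auc(y_early, p_early))


# Made-up labels and predicted probabilities, for illustration only.
score = competition_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0, 1], [0.2, 0.9])
```

A useful property of this measure is that it only depends on the ranking of the predictions, not on their calibration.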

Why we took part

Our lab pushes research in both neuroscience and machine learning. Mastering both fields is hard and takes time. By joining this competition, the idea was to work collaboratively at the intersection of both fields and to learn from each other. It worked out pretty well!

Our team

Many PhD students and postdocs from our lab joined in for a hackathon that started over a weekend (yes, we were motivated), continued over another weekend later on, and took a few more days close to the deadline. Some additional neuroscience and machine learning students joined in as well.

Our Strategy

We took the very standard approach of extracting features from our data and training classifiers through cross-validation. The design of interesting features was the scientific part; the choice and training of the classifier, along with all the required pipelining, was the engineering part.

The Science

The question is the following: for a given patient, what distinguishes the iEEG activity during a seizure from the activity outside of one? We focused on two types of features to distinguish ictal and interictal events: within and across channels.

  • Within a channel, there is a change of spectral energy (within the 1s)
  • Across channels, there is a change in neural synchrony (channels get correlated during a seizure)
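
To illustrate these two families of features, here is a small dependency-free sketch. It is not the feature set our team actually used: the band limits, the plain DFT and the function names are assumptions chosen for illustration. A segment is represented as a list of channels, each a list of samples.

```python
import cmath
import math


def band_energies(x, fs, bands):
    """Within-channel feature: energy of one channel in each (low_hz, high_hz) band."""
    n = len(x)
    # Power of each DFT bin from 0 Hz to fs / 2 (naive DFT, fine for short segments).
    power = [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
             for k in range(n // 2 + 1)]
    return [sum(p for k, p in enumerate(power) if lo <= k * fs / n < hi)
            for lo, hi in bands]


def correlation(x, y):
    """Across-channel feature: Pearson correlation (undefined for a constant channel)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def features(segment, fs, bands=((1, 8), (8, 32), (32, 128))):
    """Stack band energies of every channel and correlations of every channel pair."""
    feats = []
    for channel in segment:
        feats.extend(band_energies(channel, fs, bands))
    for i in range(len(segment)):
        for j in range(i + 1, len(segment)):
            feats.append(correlation(segment[i], segment[j]))
    return feats
```

For a segment with C channels and B bands, this yields C × B energy features followed by C(C-1)/2 correlation features, stacked in a single vector per segment.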

We use the following features (stacked together):

We used random forest classifiers, with 100 trees.

We evaluated the performance of our method through cross-validation.
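
As a reminder of what cross-validation does, here is a generic k-fold sketch in plain Python. It is not our pipeline (which relied on Scikit-learn): the function names are hypothetical, and fit and score stand for any training and evaluation functions, for example a random forest and the AUC.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n examples."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k disjoint folds covering all examples
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def cross_validate(fit, score, X, y, k=5):
    """Train on k-1 folds, evaluate on the held-out fold, and average the k scores."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)
```

Each example is used exactly once for testing, so the averaged score estimates how the classifier behaves on data it has not seen.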

The Engineering

We used Python exclusively. The project required quite a heavy load of engineering, including parallelized loading of the data, pre-processing and extraction of features, training of the predictors, and generation of the submission file. We mainly used the following libraries: NumPy, SciPy, Statsmodels, Scikit-learn. Collaboration was made easy by GitHub.

This is the part where I probably learned the most from machine learners and developers in the team: how to properly structure a machine learning project (with abstract classes before even starting implementation), how to build a pipeline from scratch, how to do parallelization properly and how to use scikit-learn.

The Result (and code)

After joyful days of collaboration and an intense final rush, we finished 9th out of more than 200!

This was a very rewarding outcome but although it was a main drive, I see it now as secondary. I learned so much, it was really worth it.

The code is available here: (it does require some clean up and more doc to be usable. I’ll do that… if someone asks). It includes a report.


Participants were: Joana Soldado Magraner, Wittawat Jitkrittum, Gergo Bohner, Heiko Strathmann, Balaji Lakshminarayanan, Alessandro Ialongo, Lea Goetz, Shaun Dowling, Julian Serban, Matthieu Louis, Ben Dongsung Huh, Zoltan Szabo, Laurence Aitchison.

Peter Dayan kindly supported the event by funding the food throughout the weekend. Thanks again!

3D print your own brain


About a year ago, I 3D printed my own brain. Since then, I have carried it every single day on my keychain.

In this blog post I will share with you how I proceeded to do so.

While writing it, I tried again and it took me roughly 10 minutes to run everything and 10 minutes to get from my raw scan to an extracted surface in a standard format.

All you need is a computer running Matlab and a 3D printer. Matlab is commercial software. Free alternatives exist, but most of them were more complicated to run (for me) at the time I checked. You might want to check FSL’s Brain Extraction Tool.

Step 1: Get your brain scanned.

As a master’s student in cognitive science in Paris, I was often asked to take part in fMRI pilot studies by friends in need of free, docile and patient subjects. One day I accepted and asked for my structural data (the 3D picture of the brain structure), which is always acquired before recording the actual dynamic brain activity.

I wanted the ‘T1’ contrast that best separates white matter from grey matter.

In Paris, a good place to find experimentalists seeking subjects is here.

Step 2: Extract your brain surface

The output of an MRI structural scan is a 3D block (or pile of images) of your brain in grayscale (the format was .img).

Raw data visualized with spm ‘Display’ tools


The segmentation step consists in isolating the brain from the rest (skull, skin etc). It can be done separately for white and grey matter because the contrast between the two is high. The output is a grayscale image whose values correspond to the probability of being part of grey (resp. white) matter. From this segmented data, you then want to extract a surface isolating your grey matter (a closed boundary for the class).

I used spm8 running on matlab to segment and extract the surface.

In matlab, after downloading spm:

  • click on ‘fmri’
  • in the ‘Menu’ window, click ‘Segment’
  • in the batch editor,
    • click on ‘Data’ and select your .img file
    • set ‘white matter’ to ‘none’
    • then press the ‘run’ button

By default, this will generate multiple segmentation outputs (.img files). You are interested in the one with prefix ‘cs1_’.

  • back in the ‘Menu’ window, in the ‘SPM for functional MRI’, click on the scrolldown button ‘render …’ and select ‘Xtract Surface’, and finally choose the option ‘save extracted surface and rendering’

This step will output a 3D surface in the .gii format (a standard in the neuroimaging community)

As you will realize, there are a few artifacts (like blobs floating around, or suspicious appendices). You may want to customize a bit more the segmentation and extraction by looking at the large range of parameters available along the process.

Step 3: Convert into a printable format

The .obj format is a standard format for 3d objects and accepted by most 3D editors and 3D printers.

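SPM comes with functions to extract the vertices and faces of .gii mesh objects. Then I used the function vertface2obj.m to generate an .obj file:

gg = gifti('mysurface.gii');          % load the surface extracted by SPM
v = gg.vertices;                      % vertex coordinates
f = gg.faces;                         % triangles, as indices into v
vertface2obj(v, f, 'mysurface.obj');  % argument order (vertices, faces, file name) assumed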
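
If you prefer to avoid Matlab for this step, the .obj format is simple enough to write by hand: one "v x y z" line per vertex, then one "f i j k" line per triangle, with vertex indices starting at 1. Here is a minimal Python sketch; the function name write_obj and its one_based flag are hypothetical, and loading the .gii file itself is left out.

```python
def write_obj(path, vertices, faces, one_based=False):
    """Write a triangle mesh as a Wavefront .obj text file.

    vertices: iterable of (x, y, z) coordinates.
    faces: iterable of (a, b, c) vertex indices, 0-based unless one_based is True
           (indices exported from Matlab are already 1-based).
    """
    offset = 0 if one_based else 1   # .obj indices start at 1
    with open(path, "w") as fh:
        for x, y, z in vertices:
            fh.write(f"v {x} {y} {z}\n")
        for a, b, c in faces:
            fh.write(f"f {a + offset} {b + offset} {c + offset}\n")
```

For example, write_obj("tetra.obj", [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)], [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]) writes a tetrahedron that a 3D editor should be able to open.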


Step 4: Edit in your favorite editor and print with your available printer

You can use the free blender to edit your file.

A friend of mine added a ring and printed my brain using a Makerbot Replicator 2.

If you don’t have a 3D printer, you could find people who do have one and are ready to share it through this community platform. I haven’t tried it but it looks great!



Printed Brains!

14/01/2015: my friend Marianne printed her brain!


Fear Potentiation of Gap Detection



This morning, I attended a Journal Club organised by Jennifer Linden’s Lab at UCL. The focus is on auditory neuroscience.

The article presented, published recently in the Journal of Neuroscience was:

Auditory cortex is required for fear potentiation of gap detection. by Aldis P. Weible, Christine Liu, Cristopher M. Niell and Michael Wehr

First, let’s decipher  the title:

  • Gap detection is the perceptual task of identifying the presence of a gap (silence or background reduction) in a sound. In humans, it is typically studied with a two-alternative forced-choice task (is there a gap or not?). Gap detection is used as a measure of temporal acuity: what is the typical size of events the auditory system can process? In animal studies, the gap is made predictive of some later sound evoking a behavioral response. Variation in the response is taken as a marker of detectability of the gap.
  • Fear conditioning is the process of associating a stimulus with an aversive outcome coming after it. After conditioning, the (now conditioned) stimulus elicits the fear response, which used to come with the aversive outcome.
  • Fear potentiation refers to the added effect of associating an aversive value with a stimulus predictive of an initially non-fearful outcome.

The article describes a few variants of the following experiment, split in 3 periods that occur sequentially

  • Period 1: a mouse is presented with short noise startles (or bursts), either at random times or preceded by a 10ms gap in the background 50ms before the startle.

The startle typically evokes a little jump (or startle response), which is diminished (~75%) in the presence of the gap.

  • Period 2: the mouse is presented with gaps directly followed by an electrical shock (that it really hates). This is the fear association part.
  • Period 3: identical to Period 1, and presented a few hours after Period 2.

experimental paradigm

Here again, the startle response is diminished when a gap precedes the startle. The main result is that it is more diminished than in the first period. This is the mark of fear potentiation.
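
One way to put numbers on this comparison is sketched below. The amplitudes are made up for illustration and are not taken from the article; the function name and the choice of measuring potentiation as a difference of inhibitions are assumptions.

```python
def gap_inhibition(startle_alone, startle_after_gap):
    """Fraction by which the gap reduces the startle response."""
    return 1.0 - startle_after_gap / startle_alone


# Made-up response amplitudes (arbitrary units), for illustration only.
period_1 = gap_inhibition(startle_alone=100.0, startle_after_gap=60.0)
period_3 = gap_inhibition(startle_alone=100.0, startle_after_gap=40.0)

# Fear potentiation shows up as stronger inhibition after conditioning.
potentiation = period_3 - period_1
```

Normalizing by the response to the startle alone, within each period, separates the effect of the gap from overall changes in startle amplitude between periods.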

What causes this further reduction (is it freezing induced by fear? enhanced gap salience?) is not discussed here.

The key question is to ask whether auditory cortex is required for this fear potentiation.

To answer this question, the authors inactivate the auditory cortex during the shock in Period 2. Inactivation is done using optogenetics (a set of techniques to reversibly, locally, selectively inactivate neurons in the brain).

They show that the auditory cortex is needed (Period 2 has no effect if the auditory cortex is deactivated).

A few points that were mentioned during the discussion:

  • This is surprising since fear conditioning is known to occur through other pathways (not necessarily going through the auditory cortex). But this might depend on the complexity of the stimulus. These other, faster pathways might be used for simple stimuli such as pure tones.
  • It would have been nice to have additional information on the behavioral effect of the fear potentiation, which would have given more cues regarding what might be the cause of the observed effect

Auditory Motion After Effect

As a first blog post, I want to summarize two short readings about an auditory perceptual effect that could be called “auditory motion after effect”.

It is an auditory equivalent to the visual motion after effect.

Background: Visual Motion After Effect

Let’s start with an example: if you stare at a waterfall for a while and then look at a static object, you will perceive the static object as moving upwards:

More generally, any coherent enough and sustained enough motion in your visual field will induce a percept of motion in the reversed direction once motion is stopped.

Although I don’t know what the experts’ last word on this effect is, there are two main types of explanations:

  • low level adaptation of local motion detectors
  • high level motion expectation

Auditory Motion After effect

Now what would be an equivalent of such a motion after effect in the auditory domain? A direct equivalent is not an option since sounds are 1D signals.

To find such an equivalent, we have to consider sound representations and use motion in the space sounds are represented in. Two alternatives naturally come to mind:

  • Spectrogram: A both formal and biological (intermediate) representation of sound is the spectrogram, that is, a time-frequency map of sound energy. A spectrogram is a 2D object, just like an image, with time as one axis. Motion can be defined as continuous variation of local spectral energy.
  • Source localization: Humans are able to locate sound in the world, and perceive motion of auditory objects. So motion of objects in the physical world could also be used (and this actually works [1])
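
To make the spectrogram alternative concrete, here is a dependency-free sketch that computes a small spectrogram and shows a frequency sweep appearing as ‘motion’ in it. The window length, the hop size, the Hann window and the function name are arbitrary choices made for illustration.

```python
import cmath
import math


def spectrogram(x, win=64, hop=32):
    """Time-frequency map of energy: one row per time frame, one column per DFT bin."""
    frames = []
    for start in range(0, len(x) - win + 1, hop):
        # A Hann window limits leakage between neighbouring frequency bins.
        seg = [x[start + n] * (0.5 - 0.5 * math.cos(2 * math.pi * n / win))
               for n in range(win)]
        frames.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win)
                               for n in range(win))) ** 2
                       for k in range(win // 2 + 1)])
    return frames


# An upward sweep from 50 Hz to 400 Hz in one second, sampled at 1024 Hz.
fs = 1024
sweep = [math.sin(2 * math.pi * (50 * n / fs + 0.5 * 350 * (n / fs) ** 2))
         for n in range(fs)]
S = spectrogram(sweep)

# The frequency bin holding most energy rises from frame to frame.
peaks = [max(range(len(frame)), key=frame.__getitem__) for frame in S]
```

In this map, the 1D sweep becomes a ridge of energy travelling along the frequency axis over time, which is the kind of motion considered in the rest of the post.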

For the rest of this post, I will focus on motion in the time-frequency domain

I will briefly summarize the following two papers:

  1. Spectral motion produces an auditory after-effect, Shu et al, Nature 93
  2. Frequency-change aftereffect produced by adaptation to real and illusory unidirectional frequency sweep, Masutomi et al, JASA 2013

Spectral motion produces an auditory after-effect

The aim is to induce adaptation to up/downward motion of a spectral pattern.

A moving stimulus is presented for a prolonged duration, then a test stimulus is presented and subjects are asked to judge its motion. The key idea is that the motion percept of the test stimulus is distorted (biased) by the earlier exposure.

Adaptation stimuli: a spectral pattern is shifted in time along the frequency axis, following a sawtooth trajectory (imagine it in the spectrogram) around a centre frequency and with constant speed.

Test stimuli: a short spectral pattern with constant motion, with a trajectory sharing the same centre frequency as the adaptation stimulus.
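
The sawtooth trajectory can be sketched as follows. This is an illustration, not the exact stimulus of the paper: the parameter values are arbitrary, the function name is hypothetical, and ‘constant speed’ is taken here to mean a constant number of octaves per second (that is, on a log-frequency axis), which is an assumption.

```python
def sawtooth_centre(t, centre_hz=1000.0, depth_oct=1.0, period_s=1.0, upward=True):
    """Centre frequency at time t of a pattern that ramps at constant speed, then jumps back."""
    phase = (t / period_s) % 1.0    # position within the current ramp, in [0, 1)
    if not upward:
        phase = 1.0 - phase         # reversed ramps for downward adaptation
    # Ramp from depth_oct / 2 below to depth_oct / 2 above the centre frequency.
    return centre_hz * 2.0 ** (depth_oct * (phase - 0.5))


trajectory = [sawtooth_centre(n / 100.0) for n in range(300)]   # three ramps
```

Because each ramp is followed by an instantaneous jump back, the stimulus contains sustained motion in one direction only, which is what adaptation requires.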


  • Two different spectral patterns:
    • S1: narrow band noise (in a slightly broader band, weaker bed of noise)
    • S2: notched noise (narrow band noise, with a gap in the noise)

The choice of those stimuli sounds quite arbitrary, but results are quite interesting.

Both stimuli share a fixed background noise, only the dip or peak in this background is moved in time.


The desired effect is indeed found. Adaptation biases the perception of motion. This can be seen on this plot reporting the mean velocity of the test stimuli that were perceived as stationary, as a function of the actual speed of the adapting stimulus.

What is surprising is the fact that this only works if adaptation and test stimuli are built with the same pattern. This is surprising because, roughly, the two patterns produce a similar excitation of the auditory system. The absence of effect when the test stimulus is not the one that was used for adaptation suggests that adaptation is sensitive to the detailed structure of the spectral pattern.

Another interesting result of this study is that the effect is preserved if you play the adaptation stimulus to one ear and the test stimulus to the other, which suggests that the effect must arise at a stage of the feedforward auditory processing after the merging of the information from both ears.

Frequency-change aftereffect produced by adaptation to real and illusory unidirectional frequency sweep

Illusory continuity is the perception of continuity in time of a perceptual object despite its physical disappearance during a short period of time.

This is easily demonstrated by playing a frequency sweep and replacing a short portion of it by a noise burst. Subjects perceive the sweep as continuing rather than as two different sweeps. This also works with stimuli in motion, which means that there is an illusory percept of motion during the gap.

The question asked here is the following: can we adapt to illusory motion, or does the motion need to be physical (actually in the stimulus)?

In brief the methodology of the study is similar to the previous one with the difference here that spectral patterns are simpler: just pure tones (i.e, sounds are simple sweeps).

Additional conditions compared to the previous study are

  • real continuity (as in previous study)
  • illusory continuity (with noise burst during the motion of the adaptation stimulus)
  • no continuity (actual gaps in the motion instead of the noise, this breaks the illusion of continuity)
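
The three conditions differ only in what happens during a short portion of the sweep. The sketch below generates them; parameter values and the function name are arbitrary choices for illustration, not those of the study, and the relative level of the noise burst (which matters for the illusion) is not modelled.

```python
import math
import random


def sweep(fill="none", duration=1.0, f0=500.0, f1=1000.0, fs=8000,
          gap=(0.4, 0.6), seed=0):
    """Linear frequency sweep whose `gap` interval is kept, masked or silenced.

    fill="none":    real continuity (the sweep is left intact)
    fill="noise":   the gap is replaced by a noise burst (illusory continuity)
    fill="silence": the gap is replaced by silence (no continuity)
    """
    rng = random.Random(seed)
    rate = (f1 - f0) / duration             # Hz per second
    samples = []
    for n in range(int(duration * fs)):
        t = n / fs
        in_gap = gap[0] <= t < gap[1]
        if in_gap and fill == "noise":
            samples.append(rng.uniform(-1.0, 1.0))
        elif in_gap and fill == "silence":
            samples.append(0.0)
        else:
            # The phase is the integral of the instantaneous frequency f0 + rate * t.
            samples.append(math.sin(2 * math.pi * (f0 * t + 0.5 * rate * t * t)))
    return samples


real, illusory, broken = sweep("none"), sweep("noise"), sweep("silence")
```

Outside the gap the three signals are identical, so any difference in the after-effect can be attributed to what happens, physically or perceptually, during the gap.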

The result of the study is that only real continuity leads to a motion after-effect.

This suggests that adaptation occurs based on the physical property of the stimuli (possibly frequency shift detector) rather than on a perceptually reconstructed continuity prior to the adaptation, which were the two hypotheses opposed by the authors of the study.


Taken together, these studies show the possibility of an auditory motion after effect.

The question of what causes this after effect is still open. Indeed, the first study suggests a dependency on the detailed structure of the stimuli (stimulus specific adaptation?), which suggests a more complex mechanism than a mere local spectral energy adaptation.

On the other hand, higher cognitive percepts such as the continuity illusion seem not to be involved in the auditory after effect.

The interaural transfer of the effect reported in the first study suggests an intermediate level of processing.

[1] Adaptation to auditory motion in the horizontal plane: Effect of prior exposure to motion on motion detectability, Grantham DW., Percept Psychophys. 1992