This program provides the necessary knowledge for processing, retrieving, and generating audio signals, with specific applications to music, speech, and environmental sounds. It covers
- audio signal processing (Fourier Transform, Short-Time Fourier Transform, Constant-Q Transform, Cepstrum, MFCC, sinusoidal model); see the sketch after this list
- speech production (source-filter model, phonemes), sound perception (phon/sone scales, critical bands), music theory (pitch, chords, rhythm, structure)
- standard pattern-matching and machine-learning models for time series (DTW, HMM)
- deep learning architectures and techniques specific to audio processing (WaveNet, SincNet, DDSP, TCN, VAE/VQ-VAE, RVQ, GAN, Diffusion, ...)
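To give a flavor of the signal-processing topics above, here is a minimal sketch computing an STFT and MFCCs with librosa (one of the tools used in the labs); the file name "audio.wav" is a placeholder, not part of the course material.

```python
import numpy as np
import librosa

# Load a mono signal at its native sampling rate
# ("audio.wav" is a hypothetical example file).
y, sr = librosa.load("audio.wav", sr=None, mono=True)

# Short-Time Fourier Transform: complex spectrogram, converted to dB magnitude.
stft = librosa.stft(y, n_fft=2048, hop_length=512)
spec_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

# Mel-Frequency Cepstral Coefficients: a standard compact timbre descriptor.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(spec_db.shape, mfcc.shape)  # (1 + n_fft // 2, n_frames), (13, n_frames)
```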
From theory ... to practice ... to industry
Each session is organized as 40% lecture, 40% lab(*), and 20% industry talk:
- Each session starts with a lecture which provides the knowledge needed to develop a typical audio application during the lab.
- During the labs, students learn to implement the content of the lecture using currently popular tools (librosa, PyTorch, Keras, ...); a small example in this spirit is sketched after this list.
- Example applications include: audio denoising, time-stretching, audio source separation, audio segmentation (speech/music), audio recognition (environmental sounds, acoustic scene classification, multi-label musical genre), cover detection, auto-tagging (genre, mood), estimation of specific music attributes (multi-pitch, tempo/beat, chords, structure), music identification by fingerprint (as in Shazam), ...
- Each session ends with an industry talk, which allows students to understand how these technologies are used in industrial products and services.
- In previous years we had talks from Meta AI, Adobe Research, Deezer, Pandora, Sony CSL, Universal Music Group, Utopia, AudioShake, Chordify, and others.
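To make the lab format concrete, here is a minimal, hypothetical sketch in the spirit of those exercises: aligning two feature sequences with Dynamic Time Warping (one of the pattern-matching models listed above) using librosa. The file names are placeholders, and this is an illustration rather than actual course material.

```python
import librosa

# Two recordings of the same content performed at different speeds
# ("take1.wav" and "take2.wav" are hypothetical example files).
y1, sr = librosa.load("take1.wav", sr=22050, mono=True)
y2, _ = librosa.load("take2.wav", sr=22050, mono=True)

# Describe each signal by a sequence of MFCC frames (features x time).
X = librosa.feature.mfcc(y=y1, sr=sr, n_mfcc=13)
Y = librosa.feature.mfcc(y=y2, sr=sr, n_mfcc=13)

# Dynamic Time Warping: D is the accumulated cost matrix and wp the
# optimal warping path, i.e. the list of aligned frame-index pairs.
D, wp = librosa.sequence.dtw(X=X, Y=Y, metric="euclidean")
print("alignment cost:", D[-1, -1], "path length:", len(wp))
```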