4.6.2 peakClustering

peakClustering performs sinusoidal analysis and then groups the resulting spectral peaks by spectral clustering, which operates across time and frequency within “texture” windows. It can be used to separate the predominant melody from polyphonic audio signals. More technical information can be found in the paper “Normalized Cuts for Predominant Melodic Source Separation”, published in the IEEE Transactions on Audio, Speech, and Language Processing. Example of usage:

     peakClustering foo.wav

This produces a file named fooSep.wav that contains the separated predominant melodic source, such as a singing voice in a rock song or a saxophone line in a jazz tune.
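The naming rule above can be scripted when processing many files. The following is a minimal Python sketch, not part of the tool itself: the helper names `separated_name` and `run_peak_clustering` are hypothetical, the executable is assumed to be on the PATH, and the directory in which the output is written is not stated here, so only the base name is returned.

```python
import shutil
import subprocess
from pathlib import Path


def separated_name(input_path: str) -> str:
    """Return the output base name described above: foo.wav -> fooSep.wav."""
    p = Path(input_path)
    return p.stem + "Sep" + p.suffix


def run_peak_clustering(input_path: str) -> str:
    """Run `peakClustering <input>` with default options.

    Assumption: the peakClustering executable is installed and on PATH.
    Returns the expected base name of the separated file.
    """
    exe = shutil.which("peakClustering")
    if exe is None:
        raise FileNotFoundError("peakClustering executable not found on PATH")
    subprocess.run([exe, input_path], check=True)
    return separated_name(input_path)
```

Calling `run_peak_clustering("foo.wav")` is then equivalent to the command shown above and returns the name fooSep.wav.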

-n --fftsize
size of fft
-w --winsize
size of window
-s --sinusoids
number of sinusoids per frame
-b --buffersize
audio buffer size
-o --outputdirectoryname
output directory path
-N --noisename
name of degrading audio file
-p --panning
panning information: <foreground level (0..1)>-<foreground pan (-1..1)>-<background level>-<background pan>
-t --typeSimilarity
similarity information: a (amplitude), f (frequency), h (harmonicity)
-q --quitAnalyse
quit processing after the specified number of seconds
-T --textureSize
number of frames in a texture window
-c --clustering
number of clusters in a texture window
-v --voices
number of voices
-F --clusterFiltering
cluster filtering
-A --attributes
set attributes
-g --ground
set ground
-SC --clusterSynthetize
cluster synthetize
-P --peakStore
set peak store
-k --keep
keep the specified number of clusters in the texture window
-S --synthetise
synthesize using an oscillator bank (0), a mono IFFT (1), or a stereo IFFT (2)
-r --residual
output the residual sound (if the synthesis stage is selected)
-i --intervalFrequency
<minFrequency>_<maxFrequency> select peaks in this interval (default 250-2500 Hz)
-f --fileInfo
include the clustering parameters in the output file name (for example, s20t10i250_2500c2k1uTabfbho means 20 sines per frame in the 250_2500 Hz frequency interval, 1 cluster selected among 2 in a texture window of 10 frames, no precise parameter estimation, and the similarity combination abfbho)
-npp --noPeakPicking
do not perform peak picking in the spectrum
-u --unprecise
do not perform precise estimation of sinusoidal parameters
-if --ignoreFrequency
ignore frequency similarity between peaks
-ia --ignoreAmplitude
ignore amplitude similarity between peaks
-ih --ignoreHWPS
ignore harmonicity (HWPS) similarity between peaks
-ip --ignorePan
ignore panning similarity between peaks
-uo --useOnsets
use onset detector for dynamically adjusting the length of texture windows
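When the -f option is used, the parameter tag embedded in the output name can be decoded again for bookkeeping. The following Python sketch is only an illustration: the field layout (s, t, i, c, k, optional u, T) is inferred from the single example given for -f, the assumption that the letter u is simply absent when -u is not set is a guess, and the names `ClusteringTag` and `parse_tag` are hypothetical. Other options may add fields that this pattern does not cover.

```python
import re
from typing import NamedTuple


class ClusteringTag(NamedTuple):
    sinusoids: int      # -s, sinusoids per frame
    texture_size: int   # -T, frames per texture window
    min_freq: int       # -i, lower bound in Hz
    max_freq: int       # -i, upper bound in Hz
    clusters: int       # -c, clusters per texture window
    kept: int           # -k, clusters kept
    unprecise: bool     # -u, no precise parameter estimation
    similarity: str     # -t, similarity combination


# Layout inferred from the documented example only (an assumption).
_TAG = re.compile(
    r"s(?P<s>\d+)t(?P<t>\d+)i(?P<lo>\d+)_(?P<hi>\d+)"
    r"c(?P<c>\d+)k(?P<k>\d+)(?P<u>u?)T(?P<sim>\w+)"
)


def parse_tag(tag: str) -> ClusteringTag:
    """Decode a tag such as 's20t10i250_2500c2k1uTabfbho'."""
    m = _TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"unrecognized parameter tag: {tag!r}")
    return ClusteringTag(
        sinusoids=int(m["s"]),
        texture_size=int(m["t"]),
        min_freq=int(m["lo"]),
        max_freq=int(m["hi"]),
        clusters=int(m["c"]),
        kept=int(m["k"]),
        unprecise=(m["u"] == "u"),
        similarity=m["sim"],
    )
```

For the documented example, `parse_tag("s20t10i250_2500c2k1uTabfbho")` yields 20 sinusoids, a texture window of 10 frames, the 250 to 2500 Hz interval, 1 cluster kept among 2, imprecise estimation, and the similarity string abfbho.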