4.6.2 peakClustering
peakClustering performs sinusoidal analysis followed by spectral
clustering, which groups spectral peaks across time and frequency
over “texture” windows. It can be used for predominant melody
separation in polyphonic audio signals. More technical information
can be found in the paper “Normalized Cuts for Predominant Melodic
Source Separation”, published in the IEEE Transactions on Audio,
Speech, and Language Processing. Example of usage:
peakClustering foo.wav
This will result in a file named fooSep.wav that contains the separated
predominant melody source, such as a singing voice in a rock song or a
saxophone line in a jazz tune.
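For batch use, the invocation above can be wrapped in a short script. The following is a minimal Python sketch, not part of Marsyas. It assumes that peakClustering is on the PATH, that options precede the file name, and that the output name is formed by replacing the extension with Sep.wav, as in the foo.wav example. The helper names separated_name and run_peak_clustering are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def separated_name(input_path):
    """Return the output name described above: foo.wav -> fooSep.wav."""
    return Path(input_path).stem + "Sep.wav"


def run_peak_clustering(input_path, extra_args=()):
    """Run peakClustering on one file and return the expected output name.

    Assumptions (not confirmed by this manual): the executable is on the
    PATH and options are given before the input file name.
    """
    exe = shutil.which("peakClustering")
    if exe is None:
        raise FileNotFoundError("peakClustering not found on PATH")
    subprocess.run([exe, *extra_args, str(input_path)], check=True)
    return separated_name(input_path)


print(separated_name("foo.wav"))  # fooSep.wav
```

The sketch only predicts the file name; where the file is written (for example when an output directory is given) is not covered here.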
- ‘-n --fftsize’
- size of fft
- ‘-w --winsize’
- size of window
- ‘-s --sinusoids’
- number of sinusoids per frame
- ‘-b --buffersize’
- audio buffer size
- ‘-o --outputdirectoryname’
- output directory path
- ‘-N --noisename’
- name of degrading audio file
- ‘-p --panning’
- panning information: <foreground level (0..1)>-<foreground pan (-1..1)>-<background level>-<background pan>
- ‘-t --typeSimilarity’
- similarity information: a (amplitude), f (frequency), h (harmonicity)
- ‘-q --quitAnalyse’
- quit processing after the specified number of seconds
- ‘-T --textureSize’
- number of frames in a texture window
- ‘-c --clustering’
- number of clusters in a texture window
- ‘-v --voices’
- number of voices
- ‘-F --clusterFiltering’
- cluster filtering
- ‘-A --attributes’
- set attributes
- ‘-g --ground’
- set ground
- ‘-SC --clusterSynthetize’
- cluster synthetize
- ‘-P --peakStore’
- set peak store
- ‘-k --keep’
- keep the specified number of clusters in the texture window
- ‘-S --synthetise’
- synthesize using an oscillator bank (0), an IFFT mono (1), or an IFFT stereo (2)
- ‘-r --residual’
- output the residual sound (if the synthesis stage is selected)
- ‘-i --intervalFrequency’
- <minFrequency>_<maxFrequency> select peaks in this interval (default 250-2500 Hz)
- ‘-f --fileInfo’
- provide clustering parameters in the output name (s20t10i250_2500c2k1uTabfbho means 20 sines per frame in the 250_2500 Hz frequency interval, 1 cluster selected among 2 in one texture window of 10 frames, no precise parameter estimation, and a combination of similarities abfbho)
- ‘-npp --noPeakPicking’
- do not perform peak picking in the spectrum
- ‘-u --unprecise’
- do not perform precise estimation of sinusoidal parameters
- ‘-if --ignoreFrequency’
- ignore frequency similarity between peaks
- ‘-ia --ignoreAmplitude’
- ignore amplitude similarity between peaks
- ‘-ih --ignoreHWPS’
- ignore harmonicity (HWPS) similarity between peaks
- ‘-ip --ignorePan’
- ignore panning similarity between peaks
- ‘-uo --useOnsets’
- use onset detector for dynamically adjusting the length of texture windows
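The parameter string produced by the ‘-f --fileInfo’ option can be decoded mechanically when sorting through many output files. The following Python sketch is based only on the single example given above (s20t10i250_2500c2k1uTabfbho). The field order, the reading of T as the marker that introduces the similarity combination, and the assumption that u is simply absent when precise estimation is enabled are guesses, not documented behaviour. The function name parse_file_info is hypothetical.

```python
import re

# Pattern inferred from the one documented example; treat it as an assumption.
FILE_INFO_RE = re.compile(
    r"s(?P<sinusoids>\d+)"       # sines per frame
    r"t(?P<texture>\d+)"         # frames per texture window
    r"i(?P<fmin>\d+)_(?P<fmax>\d+)"  # frequency interval in Hz
    r"c(?P<clusters>\d+)"        # clusters per texture window
    r"k(?P<kept>\d+)"            # clusters kept
    r"(?P<unprecise>u?)"         # present when precise estimation is off
    r"T(?P<similarity>[a-z]+)"   # similarity combination (assumed meaning)
)


def parse_file_info(tag):
    """Decode a fileInfo-style tag into a dictionary of parameters."""
    match = FILE_INFO_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"unrecognised fileInfo tag: {tag!r}")
    fields = match.groupdict()
    return {
        "sinusoids": int(fields["sinusoids"]),
        "texture": int(fields["texture"]),
        "interval_hz": (int(fields["fmin"]), int(fields["fmax"])),
        "clusters": int(fields["clusters"]),
        "kept": int(fields["kept"]),
        "unprecise": fields["unprecise"] == "u",
        "similarity": fields["similarity"],
    }


info = parse_file_info("s20t10i250_2500c2k1uTabfbho")
print(info["clusters"], info["kept"])  # 2 1
```

Tags that do not follow the layout of the documented example raise a ValueError rather than being partially decoded.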