Next: , Previous: pitchextract, Up: Feature Extraction


4.3.2 bextract

bextract is one of the most powerful executables provided by Marsyas. It can be used for complete feature extraction and classification experiments with multiple files, and it serves as a canonical example of how audio analysis algorithms can be expressed in the framework. This documentation refers to the latest refactored version of bextract. The old-style bextract, which uses the -e command-line option to specify the feature extractor, is still supported but its use is discouraged.

Suppose that you want to build a real-time music/speech discriminator based on a collection of music files named music.mf and a collection of speech files named speech.mf. These collections can be created either manually or using the mkcollection utility. The following command lines extract the means and variances of timbral features (time-domain Zero Crossings, Spectral Centroid, Rolloff, Flux, and Mel-Frequency Cepstral Coefficients (MFCC)) over a texture window of 1 second.

     bextract music.mf speech.mf -w ms.arff -p ms.mpl -cl GS
     bextract ms.mf -w ms.arff -p ms.mpl
     bextract -mfcc classical.mf jazz.mf rock.mf -w genre.arff

The first two commands are equivalent, assuming that ms.mf is a labeled collection containing the same files as music.mf and speech.mf. The third command specifies that only the MFCC features should be extracted and is an example of classification with three classes.

The results are stored in ms.arff, a text file holding the feature values, which can be used in the Weka machine learning environment for experimentation with different classifiers. After a header describing the features (attributes in Weka terminology), it consists of lines of comma-separated feature values, each line corresponding to one feature vector. The attributes in the generated .arff file have long descriptive names that show the process used to calculate each attribute. In order to associate filenames with the subsequences of feature vectors corresponding to them, each subsequence corresponding to a file is prefixed by the filename as a comment in the .arff file. The file is straightforward to parse; viewing it in a text editor will make this clearer.
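
For illustration, a hypothetical excerpt of such a file might look like the following (the attribute names, filenames, and values here are invented for brevity; the names actually generated by bextract are much longer and encode the full processing chain):

     @relation ms
     @attribute Mean_Centroid real
     @attribute Mean_Flux real
     @attribute output {music,speech}
     @data
     % filename music/track1.wav
     0.091,0.012,music
     0.094,0.011,music
     % filename speech/talk1.wav
     0.152,0.031,speech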

In addition to Weka, the native Marsyas tool kea can be used to perform evaluations (cross-validation, accuracies, confusion matrices) similar to Weka, although with more limited functionality.

At the same time that the features are extracted, a classifier (in the example above, GS, a simple Gaussian/Naive Bayes classifier) is trained. When feature extraction is completed, the whole network of feature extraction and classification is stored as a Marsyas plugin in ms.mpl, which can be used directly for real-time audio classification.

The resulting plugin makes a classification decision every 20 ms but aggregates the results by majority voting (using the Confidence MarSystem) to display time-stamped output approximately every second. The whole network is stored in ms.mpl, which is loaded into sfplugin; the file to be classified is then played and classified at the same time. The screen output shows the classification results and confidence. The last two commands show that live run-time classification can also be performed directly with bextract. In both cases collections can be used instead of single files.

     sfplugin -p ms.mpl music_file_to_be_classified.wav
     sfplugin -p ms.mpl speech_file_to_be_classified.wav
     bextract -e ms.mf -tc file_to_be_classified.wav
     bextract -e ms.mf -tc collection_to_be_classified.mf

Using the command-line option -sv turns on single-vector feature extraction, where one feature vector is extracted per file. The single-vector representation is useful for many Music Information Retrieval (MIR) tasks such as genre classification, similarity retrieval, and visualization of music collections. The following command can be used to generate a Weka file for genre classification with one vector per file.

     ./bextract -sv cl.mf ja.mf ro.mf -w genres.arff -p genres.mpl

The resulting genres.arff file has only one feature vector line for each soundfile in the collections. In this case, where no -cl command-line argument is specified, a linear Support Vector Machine (SVM) classifier is used instead of the default.

Feature sets are collections of features that can be included in the feature extraction. bextract supports several individual feature sets proposed in the MIR and audio analysis literature, as well as some common combinations of them (for details and the most up-to-date list of supported sets, experienced users can consult the selectFeatureSet() function in bextract.cpp). The feature sets can be separated into three large groups depending on what front-end is used: time-domain, spectral-domain, and LPC-based.

The following feature sets are supported (for definitions consult the MIR literature, check the corresponding code implementations, and send us email with questions about details you don't understand):

-timbral --TimbralFeatures
Time-domain Zero Crossings, Spectral Centroid, Flux and Rolloff, and Mel-Frequency Cepstral Coefficients (MFCC). Equivalent to -mfcc -zcrs -ctd -rlf -flx. This is also the default extracted feature set.
-spfe --SpectralFeatures
Spectral Centroid, Flux and Rolloff. Equivalent to -ctd -rlf -flx.
-mfcc --MelFrequencyCepstralCoefficients
Mel-Frequency Cepstral Coefficients.
-chroma --Chroma
-ctd --SpectralCentroid
-rlf --SpectralRolloff
-flx --SpectralFlux
-zcrs --ZeroCrossings
-sfm --SpectralFlatnessMeasure
-scf --SpectralCrestFactor
-lsp --LineSpectralPair
-lpcc --LinearPredictionCepstralCoefficients

By default, stereo files are downmixed to mono by summing the two channels before extracting features. However, bextract also supports the extraction of features based on stereo information. There are feature sets that can only be extracted from stereo files. In addition, it is possible to use any of the feature sets described above and extract features for both the left and right channels, which are concatenated to form a single feature vector.

-spsf --StereoPanningSpectrumFeatures
-st --stereo
Calculate whatever feature sets are activated for both left and right channels.

For example, the first command below calculates MFCC for both the left and right channels. The second command calculates the Stereo Panning Spectrum Features, which require both channels, and also the Spectral Centroid for both left and right.

     bextract -st -mfcc mymusic.mf -w mymusic.arff
     bextract -spsf -st --SpectralCentroid mymusic.mf -w mymusic.arff

The feature extraction can be configured in many ways, only some of which are exposed as command-line options. The following options can be used to control various aspects of the feature extraction process (most of the default values assume a 22050 Hz sampling rate):

-c --collection
the collection of files to be used
-s --start
starting offset (in seconds) into each soundfile from which features will be extracted
-l --length
length (in seconds) of each soundfile from which features will be extracted. A length of -1.0 indicates that the entire duration of the file should be used (the default behavior)
-n --normalization
apply normalization to audio signal
-fe --featExtract
only extract features without training the classifier
-st --stereo
use stereo feature extraction
-ds --downsample
downsample factor (default 1)
-ws --winsamples
size in samples of the analysis window (default 512)
-hp --hopsamples
analysis hop size in samples (default 512 - no overlap)
-as --accSize
number of analysis frames whose feature vectors are summarized when a single vector per file is calculated (default 1298 - approximately 30 seconds)
-m --memory
number of analysis frames whose feature vectors are summarized for each texture window (default 40 - approximately 1 second)
-cl --classifier
classifier used for training and prediction (default GS - a simple Naive Bayes Classifier)
-e --extractor
old-style specification of feature extraction maintained for backward compatibility (usage discouraged)
-p --plugin
filename of generated Marsyas plugin (.mpl file)
-w --wekafile
filename of generated .arff file (for Weka or kea)
-tc --test
filename of collection or soundfile used for prediction after a model is trained (can be used to conduct MIREX-style experiments)
-pr --predict
filename of a collection or soundfile used for prediction after a model is trained
-wd --workdir
directory where all generated files will be written (by default the current directory is used)
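
As an illustration of combining these options, the following hypothetical command (the collection names and option values are made up) extracts the default timbral features using a 1024-sample analysis window with a 512-sample hop (50% overlap) and a texture window of 80 frames (approximately 2 seconds), starting 30 seconds into each file:

     bextract music.mf speech.mf -ws 1024 -hp 512 -m 80 -s 30.0 -w ms.arff -p ms.mpl
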
TimeLines

bextract also supports a mode, called Timeline mode, that allows labeling different sections of an audio recording with different labels. For example, you might have a number of audio files of orca recordings with sections of voiceover, background noise, and orca calls, and you could train a classifier to recognize each of these types of signal. Instead of a label associated with each file in the collection, there is an associated Marsyas timeline file (the format is described below). Running bextract in Timeline mode involves two steps. The first is training the classifier:

     bextract -t songs.mf -p out.mpl -pm
     
     Where:
     
     -t songs.mf - A collection file with a song name and its
     corresponding .mtl (Marsyas Timeline) file on each line
     
     -p out.mpl  - The Marsyas Plugin to be generated
     
     -pm         - Mute the output plugin

The second step is predicting labels for a new audio recording:

        sfplugin -p out.mpl songmono.wav
     
     Where:
     
     -p out.mpl   - The plugin output by bextract in step #1

The songs.mf file is a Marsyas collection file with the path to a song file (usually .wav) and its corresponding Marsyas Timeline (.mtl) file on each line. Here is an example songs.mf file:

       /path/to/song1.wav \t /path/to/song1.mtl
       /path/to/song2.wav \t /path/to/song2.mtl
       /path/to/song3.wav \t /path/to/song3.mtl

Please note that the separator character \t must be an actual tab; it cannot be any other kind of whitespace.
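
Since literal tabs are easy to lose when copy-pasting, one way to generate such a collection file is with printf, whose %s\t%s format emits a real tab between the two paths (the paths below are made-up placeholders):

```shell
# Write a two-line songs.mf with a literal tab between each
# .wav path and its .mtl path (paths are hypothetical).
printf '%s\t%s\n' /path/to/song1.wav /path/to/song1.mtl  > songs.mf
printf '%s\t%s\n' /path/to/song2.wav /path/to/song2.mtl >> songs.mf
```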

The .mtl format has three header lines, followed by a block of 4 lines for each annotated region. The format is:

       HEADER:
       -------
     
       number of regions
       line size (=1)
       total size (samples)
     
       FOR EACH REGION:
       ----------------
       start (samples)
       classId (mrs_natural)
       end (samples)
       name (mrs_string)

For example:

  3
  1
  2758127
  0
  0
  800000
  voiceover
  800001
  1
  1277761
  orca
  1277762
  2
  2758127
  background
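
Following the layout described above, the example file could be generated from the shell like this (a sketch; the sample offsets, class ids, and labels are taken directly from the example, and the output filename is arbitrary):

```shell
# Write the three-region example .mtl file shown above.
{
  printf '%s\n' 3        # number of regions
  printf '%s\n' 1        # line size
  printf '%s\n' 2758127  # total size in samples
  printf '%s\n' 0       0 800000  voiceover   # region 1: start, classId, end, name
  printf '%s\n' 800001  1 1277761 orca        # region 2
  printf '%s\n' 1277762 2 2758127 background  # region 3
} > example.mtl
```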

Because the .mtl format is somewhat obtuse, we have written a small Ruby program to convert Audacity label files to .mtl format. This script can be found at marsyas/scripts/generate-mtl.rb. The script is currently hardcoded to recognize the chord changes from songs in the annotated Beatles archive, but you can easily change this by modifying the "chords_array" variable.