4.5.1 kea

Marsyas 0.1, the previous version, contained machine learning functionality, but until 2007 version 0.2 relied mostly on Weka for machine learning experiments. Although this was satisfactory for writing papers, it made it impossible to create real-time networks that integrate machine learning. An effort was therefore made to establish programming conventions for implementing machine learning MarSystems. Last but not least, we have always wanted as much audio-processing functionality as possible to be implemented natively in Marsyas.

kea is one of the outcomes of this effort. kea (named after a rare bird from New Zealand) is the Marsyas counterpart of Weka. It provides capabilities similar to Weka's command-line interface, although much more limited (at least for now).

Any Weka .arff file can be used as input to kea, although usually the input is an .arff file extracted by bextract. The following command-line options are supported.

-m --mode
specifies the mode of operation (train, distance_matrix, pca). The default mode is train.
-cl --classifier
the type of classifier to use if mode is train. Available classifiers are GS, ZEROR, and SVM.
-w --wekefile
the name of the weka file
-id --inputdir
input directory
-od --outputdir
output directory
-dm --distance_matrix
filename for the distance matrix output if mode is distance_matrix

The main mode (train) performs 10-fold non-stratified cross-validation to evaluate the classification performance of the specified classifier on the provided .arff file. In addition to classification accuracy, it outputs several other summary measures of the classifier's performance as well as the confusion matrix. The output format is similar to Weka's.
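
To make the evaluation procedure concrete, here is a minimal Python sketch of non-stratified k-fold cross-validation. It is not kea's implementation: the function names, the shuffling step, and the seed are assumptions made for illustration. The baseline classifier simply predicts the majority class, which is how Weka's ZeroR behaves; kea's ZEROR is assumed to be comparable.

```python
import random
from collections import Counter


def kfold_indices(n_instances, k=10, seed=0):
    """Split instance indices into k folds without looking at class labels."""
    indices = list(range(n_instances))
    # Shuffling is an assumption; the source does not say whether kea shuffles.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def zeror_train(instances, labels):
    """Majority-class baseline: the 'model' is just the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def zeror_predict(model, instance):
    """Ignore the instance and always return the stored majority label."""
    return model


def cross_validate(instances, labels, train_fn, predict_fn, k=10, seed=0):
    """Return overall accuracy: each instance is tested exactly once."""
    correct = 0
    for test_idx in kfold_indices(len(instances), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(instances)) if i not in held_out]
        model = train_fn([instances[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, instances[i]) == labels[i]
                       for i in test_idx)
    return correct / len(instances)


if __name__ == "__main__":
    # Hypothetical toy data: 30 instances of class "a", 20 of class "b".
    toy_labels = ["a"] * 30 + ["b"] * 20
    toy_instances = [[float(i)] for i in range(50)]
    print(cross_validate(toy_instances, toy_labels, zeror_train, zeror_predict))
```

Because the folds are built without reference to the labels, the class proportions can differ from fold to fold; a stratified split would instead preserve them in every fold.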

The distance_matrix mode computes an NxN similarity matrix from an input .arff file containing N feature vector instances. The output format is the one used for the MIREX 2007 music similarity task. This functionality relies on specific naming conventions related to the Marsyas MIREX2007 submission. By default the output goes to dm.txt, but a different file can be specified with the -dm command-line option.

The pca mode reduces the input feature vectors by projecting them onto the first 3 principal components using Principal Component Analysis (PCA). Each component is normalized to lie in the range [0, 512]. The transformed features are written to stdout.

The following examples show different ways kea can be used.

     kea -w iris.arff
     kea -m train -w iris.arff -cl SVM
     kea -m distance_matrix -dm dmatrix.txt -w iris.arff
     kea -m pca -w iris.arff
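
As a rough illustration of what an NxN matrix over N feature vectors looks like, the following Python sketch computes pairwise distances. It is not kea's implementation. The Euclidean metric and the function name are assumptions, since the text above does not state which measure kea uses, and the sketch does not reproduce the MIREX 2007 output format.

```python
import math


def distance_matrix(vectors):
    """Return a symmetric NxN matrix of pairwise Euclidean distances."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Only the upper triangle is computed; the matrix is symmetric
        # and the diagonal stays at zero.
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(vectors[i], vectors[j])))
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    # Hypothetical feature vectors standing in for rows of an .arff file.
    features = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    for row in distance_matrix(features):
        print("\t".join(f"{value:.3f}" for value in row))
```

The matrix has one row and one column per instance, so its size grows quadratically with the number of feature vectors in the input file.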