.. -*- mode: rst; fill-column: 78; indent-tabs-mode: nil -*-
.. vi: set ft=rst sts=4 ts=4 sw=4 et tw=79:
  ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###
  #
  #   See COPYING file distributed along with the PyMVPA package for the
  #   copyright and license terms.
  #
  ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ###


.. index:: Feature Selection
.. _chap_featsel:

*****************
Feature Selection
*****************


*This section has been originally contributed by James M. Hughes.*

It is often the case in machine learning problems that we wish to reduce a
feature space of high dimensionality into something more manageable by
selecting only those features that contribute most to classification
performance.  Feature selection methods attempt to achieve this goal in an
algorithmic fashion.  This section is a complement to Tutorial sections
:ref:`chap_tutorial_classifiers` and :ref:`chap_tutorial_sensitivity` aiming
to also cover additional topics, such as RFE_.

.. index:: FeatureSelectionClassifier

PyMVPA's flexible framework allows various feature selection methods to take
place within a small block of code.
:class:`~mvpa2.clfs.meta.FeatureSelectionClassifier` (which in mvpa2 is just a
wrapper around :class:`~mvpa2.clfs.meta.MappedClassifier`) extends the
basic classifier framework to allow for the use of arbitrary methods of feature
selection according to whatever ranking metric, feature selection criteria, and
stopping criterion the user chooses for a given application.  Examples of the
code/classification algorithms presented here can be found in
:mod:`~mvpa2.clfs.warehouse`.

More formally, a :class:`~mvpa2.clfs.meta.FeatureSelectionClassifier` is a meta-classifier.  That is, it
is not a classifier itself -- it can take any *slave*
:class:`~mvpa2.clfs.base.Classifier` and a
:class:`~mvpa2.featsel.base.FeatureSelection`.  When trained on a dataset, it first performs the
`feature_selection`, and then trains the
provided *slave* `clf` on those selected features.  Externally, however, it looks
like a :class:`~mvpa2.clfs.base.Classifier`, in that it fulfills the specialization of the Classifier
base class.  The following are the relevant arguments to the constructor of
such a :class:`~mvpa2.clfs.base.Classifier`:

`clf`: :class:`~mvpa2.clfs.base.Classifier`
  classifier based on which mask classifiers is created

`feature_selection`: FeatureSelection_
  whatever feature selection is considered best

.. index:: FeatureSelection

Let us turn out attention to the second argument, FeatureSelection_. As noted
above, this feature selection can be arbitrary and should be chosen
appropriately for the task at hand.  For example, we could perform a one-way
ANOVA statistic to select features, then keep only the most important 5% of
them.  It is crucial to note that, in PyMVPA, the way in which features are
selected (in this example by keeping only 5% of them) is wholly independent of
the way features are ranked (in this example, by using a one-way ANOVA).
Feature selection using this method could be accomplished using the following
code (from `mvpa2/clfs/warehouse.py`_):

  >>> from mvpa2.suite import *
  >>> fs = SensitivityBasedFeatureSelection(
  ...     OneWayAnova(),
  ...     FractionTailSelector(0.05, mode='select', tail='upper'))

A more interesting analysis is one in which we use the weights (hyperplane
coefficients) to rank features.  This allows us to use the same classifier to
train the selected features as we used to select them:

.. here we'll put the warehouse.py example of linear svm weights from yarik's
   email

  >>> sample_linear_svm = clfswh['linear', 'svm'][0]
  >>> clf = \
  ...  FeatureSelectionClassifier(
  ...      sample_linear_svm,
  ...      SensitivityBasedFeatureSelection(
  ...         sample_linear_svm.get_sensitivity_analyzer(postproc=maxofabs_sample()),
  ...         FractionTailSelector(0.05, mode='select', tail='upper')),
  ...      descr="LinSVM on 5%(SVM)")
  >>> print clf
  <LinSVM on 5%(SVM)>

It bears mentioning at this point that caution must be exercised when selecting
features.  The process of feature selection must be performed on an independent
training dataset:  it would be incorrect to select features using the entire
dataset, re-train a classifier on a subset of the original data (but using only
the selected features) and then test on a held-out testing dataset.  This
results in an obvious positive bias in classification performance (see :ref:`chap_magic_feature_selection`).  PyMVPA
allows for easy dataset partitioning and splitting, however, so creating independent training
and testing datasets is easily accomplished.

.. fill in end of last paragraph with suggestions for how to take in an entire
   original dataset and split it:  should we just do a cross-validated outer
   loop that uses multiple training/testing splits and does RFE on each of
   these splits?



.. index:: recursive feature selection, RFE
.. _recursive_feature_elimination:

Recursive Feature Elimination
=============================

Recursive feature elimination
(RFE_, applied to fMRI data using SVM in (:ref:`Hanson et al., 2008 <HH08>`))
is a technique that falls under the larger
umbrella of feature selection. Recursive feature elimination specifically
attempts to reduce the number of selected features used for classification in
the following way:

* An arbitrary feature ranking/weighting (usually classifier's
  sensitivity/weights) is performed on a subset of the data.

* Some amount of those features is either selected or discarded according to a
  pre-selected rule.

* Above steps are repeated until some criterion determined \textit{a priori}
  (such as classification error) is reached.

* One or more classifiers trained only on the final set of selected features
  are used on a generalization dataset and performance is calculated.

PyMVPA's flexible framework allows each of these steps to take place within a
small block of code.  Flexible parametrization allows to explore quite a
variety of possible specific RFE_ recipes. To actually perform recursive
feature elimination, we consider two separate analysis scenarios that deal
with a pre-selected training dataset:

* We split the training dataset into an arbitrary number of independent
  datasets and perform RFE on each of these; the sensitivity analysis of
  features is performed independently for each split and features are selected
  based on those independent measures.

* We split the training dataset into an arbitrary number of independent
  datasets (as before), but we average the feature sensitivities and select
  which features to prune/select based on that one average measure.

.. index:: SplitClassifier

We will concentrate on the second approach.  The following code can be used to
perform such an analysis:

  >>> rfesvm_split = SplitClassifier(LinearCSVMC(), OddEvenPartitioner())
  >>> # design an RFE feature selection to be used with a classifier
  >>> rfe = RFE(rfesvm_split.get_sensitivity_analyzer(
  ...           # take sensitivities per each split, L2 norm, mean, abs them
  ...           postproc=ChainMapper([ FxMapper('features', l2_normed),
  ...                                  FxMapper('samples', np.mean),
  ...                                  FxMapper('samples', np.abs)])),
  ...           # use the error stored in the confusion matrix of split classifier
  ...           ConfusionBasedError(rfesvm_split, confusion_state='stats'),
  ...           # we just extract error from confusion, so need to split dataset
  ...           Repeater(2),
  ...           # select 50% of the best on each step
  ...           fselector=FractionTailSelector(
  ...               0.50,
  ...               mode='select', tail='upper'),
  ...           # and stop whenever error didn't improve for up to 10 steps
  ...           stopping_criterion=NBackHistoryStopCrit(BestDetector(), 10),
  ...           # we just extract it from existing confusion
  ...           train_pmeasure=False,
  ...           # but we do want to update sensitivities on each step
  ...           update_sensitivity=True,
  ...           enable_ca=['errors'])
  >>> clf = \
  ...  FeatureSelectionClassifier(
  ...   LinearCSVMC(),
  ...   # on features selected via RFE
  ...   rfe,
  ...   # custom description
  ...   descr='LinSVM+RFE(splits_avg)' )
  >>> # Just an example of application
  >>> from mvpa2.misc.data_generators import normal_feature_dataset
  >>> ds = normal_feature_dataset(perlabel=20, nlabels=2, nfeatures=100, snr=0.5)
  >>> clf.train(ds)
  >>> # See how errors go down and possibly back up while removing the features
  >>> print zip(rfe.ca.nfeatures, rfe.ca.errors)   # doctest: +SKIP

The code above introduces the :class:`~mvpa2.clfs.meta.SplitClassifier`, which in this case is yet
another *meta-classifier* that takes in a :class:`~mvpa2.clfs.base.Classifier` (in this case a
LinearCSVMC_) and an arbitrary :class:`~mvpa2.generators.partition.Partitioner` object, so that the dataset can be
partitioned to be split in whatever way the user desires.  Prior to training, the
:class:`~mvpa2.clfs.meta.SplitClassifier` splits the training dataset, dedicates a separate classifier
to each split, trains each on the training part of the split, and then computes
transfer error on the testing part of the split. If a :class:`~mvpa2.clfs.meta.SplitClassifier` instance
is later on asked to *predict* some new data, it uses (by default) the
MaximalVote_ strategy to derive an answer.  A summary about the performance of
a :class:`~mvpa2.clfs.meta.SplitClassifier` internally on each split of the training dataset is
available by accessing the `stats` conditional attribute.

To summarize somewhat, RFE_ is just one method of feature selection, so we use a
:class:`~mvpa2.clfs.meta.FeatureSelectionClassifier` to facilitate this.  To parameterize the RFE
process, we refer above to the following:

`fmeasure`
  in this case just the default from a linear C-SVM (the SVM weights), taken as
  an average over all splits (in accordance with scenario 2 as above)

`pmeasure`
  confusion-based error that relies on the confusion matrices computed during
  splitting of the dataset by the :class:`~mvpa2.clfs.meta.SplitClassifier`; this is used to provide a
  value that can be compared against a stopping criterion to stop eliminating
  features

`splitter`
  defines how to split the dataset into two parts: one for training
  `fmeasure` and another one for assessing `pmeasure`.  In our case we rely on
  `SplitClassifier` to provide both, so no splitting is necessary so we just
  repeat dataset twice

`fselector`
  in this example we simply discard the 50% of features deemed least important

`update_sensitivity`
  true to retrain the classifiers each time we eliminate features; should be
  false if a non-classifier-based sensitivity measure (such as one-way ANOVA)
  is used

As has been shown, recursive feature elimination is an easy-to-implement,
flexible, and powerful tool within the PyMVPA framework.  Various ranking
methods for selecting features have been discussed.  Additionally, several
analysis scenarios have been presented, along with enough requisite knowledge
that the user can plug in whatever classifiers, error metrics, or sensitivity
measures are most appropriate for the task at hand.

.. _RFE: generated/mvpa2.featsel.rfe.RFE.html

.. _MaximalVote: api/mvpa2.clfs.meta.MaximalVote-class.html
.. _FeatureSelection: api/mvpa2.featsel.base.FeatureSelection-class.html
.. _LinearCSVMC: api/mvpa2.clfs.svm.LinearCSVMC-class.html
.. _mvpa2/clfs/warehouse.py: api/mvpa2.clfs.warehouse-pysrc.html


.. index:: incremental feature search, IFS
.. _incremental_feature_search:

Incremental Feature Search
==========================

IFS_

(to be written)

.. _IFS: api/mvpa2.featsel.ifs.IFS-class.html

.. What are the practical differences (besides speed) between RFE and IFS?

.. automodule:: mvpa2.measures

.. only:: html

   Related API documentation
   =========================

   .. currentmodule:: mvpa
   .. autosummary::
      :toctree: generated

      featsel.base
      featsel.ifs
      featsel.rfe
      featsel.helpers



