Dimensionality Reduction via Regression (DRR) is a manifold learning technique intended to remove the residual statistical dependence between PCA components that is due to curvature in the dataset. DRR is based on the prediction of PCA coefficients from neighbor coefficients using multivariate regression (hence generalizing PPA). DRR is a computationally convenient step toward SPCA within the family of manifold learning techniques that generalize PCA by using curves instead of straight lines (a family that includes NL-PCA, PPA, and SPCA).
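The regression idea can be illustrated with a minimal sketch. It rests on two assumptions that are not stated above: "neighbor coefficients" is read as the preceding (higher-variance) PCA scores, and a least-squares fit on quadratic features stands in for whatever multivariate regressor the toolbox uses. The names `quadratic_features` and `drr_forward` are hypothetical.

```python
import numpy as np

def quadratic_features(scores):
    """Design matrix: bias, linear terms, and all pairwise products of the predictors."""
    n, k = scores.shape
    cols = [np.ones(n)] + [scores[:, i] for i in range(k)]
    cols += [scores[:, i] * scores[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)

def drr_forward(data):
    """Return the PCA scores and the scores with the predictable part removed."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T
    reduced = scores.copy()
    for i in range(1, scores.shape[1]):
        # Assumption: score i is predicted from the preceding scores 0..i-1.
        design = quadratic_features(scores[:, :i])
        weights, *_ = np.linalg.lstsq(design, scores[:, i], rcond=None)
        reduced[:, i] = scores[:, i] - design @ weights  # keep only the residual
    return scores, reduced
```

On data lying near a curved surface, the trailing columns of `reduced` should carry much less variance than the corresponding PCA scores, which is the dependence removal described above.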
Principal Polynomial Analysis (PPA) is a manifold learning technique based on the use of a sequence of principal polynomials that capture the possibly nonlinear nature of the data, hence generalizing PCA by admitting curves instead of straight lines. PPA follows the Sequential Principal Curves Analysis philosophy (invertible projection onto a set of curvilinear features, see below) but uses a computationally convenient (univariate) analytical form for the curves and a dimensionality reduction goal function. PPA improves on the energy compaction of PCA and thus reduces the dimensionality reduction error. PPA defines a manifold-dependent metric that generalizes the Mahalanobis distance to curved manifolds, and allows generalized curvatures to be computed at any point.
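A single PPA-style step on 2-D data might look like the sketch below. It assumes that the univariate polynomial is fitted by ordinary least squares to predict the orthogonal coordinate from the projection onto the leading PCA direction. The function name `ppa_step_2d` and the default degree are illustrative choices, not part of the toolbox.

```python
import numpy as np

def ppa_step_2d(data, degree=2):
    """One PPA-style step on 2-D data: leading score, orthogonal score, residual."""
    centered = data - data.mean(axis=0)
    # np.linalg.eigh returns eigenvalues in ascending order, so the last
    # eigenvector is the leading (largest-variance) direction.
    _, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    lead, orth = eigvecs[:, 1], eigvecs[:, 0]
    alpha = centered @ lead   # projection onto the leading direction
    beta = centered @ orth    # coordinate in the orthogonal complement
    # Univariate polynomial estimate of the conditional mean of beta given alpha.
    coeffs = np.polyfit(alpha, beta, degree)
    residual = beta - np.polyval(coeffs, alpha)
    return alpha, beta, residual
```

For data scattered around a parabola, `residual` has far less variance than `beta`, the plain PCA second component. This illustrates the improved energy compaction mentioned above.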
Toolbox of linear and kernel feature extraction: (1) Linear methods: PCA, MNF, CCA, PLS, OPLS, and (2) Kernel feature extractors: KPCA, KMNF, KCCA, KPLS, KOPLS and KECA.
The latest version of the toolbox is on GitHub: https://github.com/IPL-UV/simFeat.
RBIG is an invertible multivariate Gaussianization transform based on the repeated application of univariate histogram Gaussianization and arbitrary multivariate rotation. By reversing the order of the linear+nonlinear stages in Deep Networks and Projection Pursuit, we show that even inexpensive random rotations lead to the Gaussian goal. RBIG can be used for multivariate PDF estimation and all the associated applications.
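The alternation of marginal Gaussianization and rotation can be sketched as follows. This is a forward-only illustration under stated assumptions: a rank-based empirical CDF stands in for the histogram-based univariate Gaussianization, the random orthogonal matrices come from a QR factorization (so they may include reflections), and nothing is stored for the inverse transform. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def marginal_gaussianize(x):
    """Map a 1-D sample to standard-normal scores through its empirical CDF."""
    n = x.size
    ranks = np.argsort(np.argsort(x))   # ranks 0 .. n-1
    u = (ranks + 0.5) / n               # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in u])

def random_orthogonal(dim, rng):
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))      # fix the sign of each column

def rbig_forward(data, n_layers=50, seed=0):
    """Apply n_layers of (marginal Gaussianization, random rotation)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(data, dtype=float).copy()
    for _ in range(n_layers):
        z = np.column_stack([marginal_gaussianize(col) for col in z.T])
        z = z @ random_orthogonal(z.shape[1], rng)
    return z
```

A usage example would pass an `(n_samples, n_features)` array to `rbig_forward` and inspect how dependence between the output columns shrinks as `n_layers` grows.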
SPCA is an invertible manifold learning technique based on the use of a sequence of nonparametric principal curves that capture the possibly nonlinear nature of the data, hence generalizing PCA by admitting curves instead of straight lines. On top of the unfolding associated with the projection onto the principal curves, the metric along the curves is set according to the local PDF, leading to a multivariate histogram equalization that fulfills either nonlinear ICA (information maximization) or optimal vector quantization (error minimization).
Probabilistic cluster kernels for nonlinear feature extraction. The kernel function is built from the data, and outperforms standard kernel functions as well as information-theoretic kernels such as Fisher kernels and mutual information kernels.
Higher Order Canonical Analysis (HOCCA) is a linear invertible manifold learning technique to be applied to several data sets coming from the same underlying source (the domain adaptation problem): it finds independent components in each data set that are related across the data sets, thus reconciling the goals of ICA and CCA. The canonical domain found by HOCCA maximizes the independence in the internal domain (similarly to separate ICAs) and maximizes the correspondence between the features in each acquisition condition (similarly to CCA).
The SSMA Toolbox is a Matlab toolbox for the semisupervised manifold alignment of generic data. It does not require corresponding pairs, only a (small) set of labeled samples in each domain.
Optimized kernel feature extraction based on entropy estimation in Hilbert feature spaces. Outstanding results are obtained with far fewer components: OKECA is intrinsically sparse and compact, which makes it very useful for data visualization and dimensionality reduction.
The kernel signal-to-noise ratio (KSNR) considers a feature extraction model that maximizes the signal variance while minimizing the estimated noise variance in a reproducing kernel Hilbert space (RKHS). KSNR can be used for dimensionality reduction as an excellent alternative to KPCA when dealing with correlated (possibly non-Gaussian) noise. Compared with standard approaches, KSNR yields better-fitted solutions and extracts features that are less affected by noise.
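The variance-ratio criterion is easiest to see in its linear, input-space form, sketched below. This is not the kernel method itself: it is an MNF-style analogue and assumes that a separate noise sample is available. The function name `snr_directions` is hypothetical. The quotient uses the data covariance (signal plus noise) in the numerator, which has the same maximizers as the signal-to-noise ratio when signal and noise are uncorrelated.

```python
import numpy as np

def snr_directions(data, noise):
    """Directions maximizing (w' C_x w) / (w' C_n w), sorted by decreasing ratio."""
    c_x = np.cov(data, rowvar=False)    # covariance of the observed data
    c_n = np.cov(noise, rowvar=False)   # covariance of the noise estimate
    # Whitening with the Cholesky factor of c_n (c_n = L L^T) turns the
    # generalized eigenproblem into an ordinary symmetric one.
    chol_inv = np.linalg.inv(np.linalg.cholesky(c_n))
    whitened = chol_inv @ c_x @ chol_inv.T
    ratios, vecs = np.linalg.eigh(whitened)   # ascending eigenvalues
    order = np.argsort(ratios)[::-1]
    # Map the eigenvectors back to the original coordinates: w = L^{-T} v.
    return ratios[order], chol_inv.T @ vecs[:, order]
```

Unlike the leading PCA direction, the first column returned here can ignore a high-variance axis when that variance is mostly noise.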
EPLS stands for Enhancing Population and Lifetime Sparsity, a very good alternative for achieving sparse representations when training machines such as convolutional neural nets. EPLS is a meta-parameter-free, off-the-shelf, simple and fast unsupervised feature learning algorithm that exploits a new way of optimizing for sparsity. The algorithm sets an output target with a one-hot code, while ensuring a strong form of lifetime sparsity to avoid dead outputs, and optimizes for that specific target to learn the dictionary bases.
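The target-building step can be illustrated with a small sketch. The assignment rule used here, a greedy pass with an equal-usage cap per output, is an assumption made for illustration and is not taken from the EPLS implementation. The dictionary optimization that fits this target is not shown, and the name `one_hot_targets` is hypothetical.

```python
import numpy as np

def one_hot_targets(activations):
    """Greedy one-hot targets with an equal-usage cap per output (hypothetical rule)."""
    n_samples, n_outputs = activations.shape
    cap = -(-n_samples // n_outputs)    # ceil(n_samples / n_outputs)
    counts = np.zeros(n_outputs, dtype=int)
    targets = np.zeros((n_samples, n_outputs))
    for i in range(n_samples):
        # Outputs that already reached the cap are excluded from the argmax.
        masked = np.where(counts < cap, activations[i], -np.inf)
        winner = int(np.argmax(masked))
        targets[i, winner] = 1.0        # population sparsity: one active output
        counts[winner] += 1             # lifetime sparsity: bounded usage
    return targets
```

When `n_samples` is a multiple of `n_outputs`, every output is selected exactly `n_samples / n_outputs` times, so no output stays dead.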