Kernel dependence measures yield accurate estimates of nonlinear relations between random variables, and they are endowed with solid theoretical properties and convergence rates. Besides, the empirical estimates are easy to compute in closed form, involving only linear algebra operations. However, they are hampered by two important problems: the high computational cost involved, as two kernel matrices of the sample size have to be computed and stored, and the interpretability of the measure, which remains hidden behind the implicit feature map. We here address these two issues. We introduce the sensitivity maps (SMs) for the Hilbert-Schmidt independence criterion (HSIC). Sensitivity maps allow us to explicitly analyze and visualize the relative relevance of both examples and features on the dependence measure. We also present the randomized HSIC (RHSIC) and its corresponding sensitivity maps to cope with large-scale problems. We build upon the framework of random features and Bochner's theorem to approximate the kernels involved in the canonical HSIC. The cost of the RHSIC measure scales favourably with the number of samples, and it approximates HSIC and the sensitivity maps efficiently. Convergence bounds for both the measure and the sensitivity map are also provided. Our proposal is illustrated in synthetic examples, and in challenging real problems of dependence estimation, feature selection, and causal inference from empirical data.
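The two estimators contrasted above can be sketched compactly. The following is a minimal illustration (not the paper's implementation; all function names and the choice of RBF kernels and the biased V-statistic estimator are assumptions): the exact HSIC needs two n-by-n kernel matrices, while the randomized version replaces them with explicit random Fourier feature maps via Bochner's theorem, so only a D-by-D cross-covariance is formed.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """RBF (Gaussian) kernel matrix k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: tr(Kx H Ky H) / (n - 1)^2.
    Requires storing two n x n kernel matrices."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kx, Ky = rbf_kernel(X, sigma_x), rbf_kernel(Y, sigma_y)
    return np.trace(Kx @ H @ Ky @ H) / (n - 1) ** 2

def rff(X, sigma, D, rng):
    """Random Fourier features: by Bochner's theorem, for the RBF kernel the
    spectral measure is Gaussian, so z(x) = sqrt(2/D) cos(Wx + b) satisfies
    E[z(x)' z(x')] ~= k(x, x')."""
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
    b = rng.uniform(0, 2 * np.pi, D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def rhsic(X, Y, sigma_x=1.0, sigma_y=1.0, D=500, seed=0):
    """Randomized HSIC: squared Frobenius norm of the empirical
    cross-covariance of centered random features; never forms n x n matrices."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Zx = rff(X, sigma_x, D, rng)
    Zy = rff(Y, sigma_y, D, rng)
    Zx = Zx - Zx.mean(axis=0)
    Zy = Zy - Zy.mean(axis=0)
    C = Zx.T @ Zy / (n - 1)
    return np.sum(C ** 2)
```

With exact feature maps the two quantities coincide, since tr(Kx H Ky H) equals the squared Frobenius norm of the centered feature cross-covariance; with D random features the approximation error shrinks as D grows.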

New social and economic activities massively exploit big data and machine learning algorithms to do inference on people's lives. Applications include automatic curricula evaluation, wage determination, and risk assessment for credits and loans. Recently, many governments and institutions have raised concerns about the lack of fairness, equity and ethics in machine learning to treat these problems. It has been shown that excluding sensitive features that bias fairness, such as gender or race, is not enough to mitigate discrimination when other related features are included. Instead, including fairness in the objective function has been shown to be more effective. We present novel fair regression and dimensionality reduction methods built on a previously proposed fair classification framework. Both methods rely on using the Hilbert-Schmidt independence criterion as the fairness term. Unlike previous approaches, this allows us to simplify the problem and to use multiple sensitive variables simultaneously. Replacing the linear formulation by kernel functions allows the methods to deal with nonlinear problems. For both linear and nonlinear formulations the solution reduces to solving simple matrix inversions or generalized eigenvalue problems. This simplifies the evaluation of the solutions for different trade-off values between the predictive error and fairness terms. We illustrate the usefulness of the proposed methods in toy examples, and evaluate their performance on real-world experiments of predicting income using gender and/or race discrimination as sensitive variables.
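The closed-form character of the linear solution can be illustrated with a small sketch (an assumption-laden illustration, not the paper's exact formulation: the function name, regularizer, and the linear kernel on predictions are choices made here). With a linear kernel on the predictions and a kernel on the sensitive variable S, the HSIC fairness term becomes a quadratic form in the weights, so the trade-off problem is solved by a single ridge-like matrix inversion:

```python
import numpy as np

def fair_ridge(X, y, S, lam=1e-2, mu=1e4, sigma_s=1.0):
    """Least-squares regression with an HSIC(Xw, S) fairness penalty.
    With a linear kernel on predictions, HSIC(Xw, S) is proportional to
    w' X' H Ks H X w, so the penalized problem keeps a closed-form solution:
        w = (X'X + lam I + mu X' H Ks H X / (n-1)^2)^{-1} X' y
    """
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    d2 = np.sum((S[:, None, :] - S[None, :, :]) ** 2, axis=-1)
    Ks = np.exp(-d2 / (2 * sigma_s ** 2))        # RBF kernel on sensitive vars
    A = (X.T @ X + lam * np.eye(X.shape[1])
         + mu * X.T @ H @ Ks @ H @ X / (n - 1) ** 2)
    return np.linalg.solve(A, X.T @ y)
```

Sweeping the trade-off parameter mu only changes one term of the matrix A, which is what makes evaluating many trade-off values cheap; mu = 0 recovers plain ridge regression.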

We introduce warped Gaussian process (WGP) regression in remote sensing applications. WGP models output observations as a parametric nonlinear transformation of a GP. The parameters of such a prior model are then learned via standard maximum likelihood. We show the good performance of the proposed model for the estimation of oceanic chlorophyll content from multispectral data, vegetation parameters (chlorophyll, leaf area index, and fractional vegetation cover) from hyperspectral data, and in the detection of the causal direction in a collection of 28 bivariate geoscience and remote sensing causal problems. The model consistently performs better than the standard GP and the more advanced heteroscedastic GP model, both in terms of accuracy and in providing more sensible confidence intervals.
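The mechanics of a WGP can be sketched in a few lines (an illustration under assumptions, not the paper's code: the tanh warp is the common Snelson-style parametrization, and kernel hyperparameters are fixed here rather than learned). A monotone warp g maps observations into a latent space where a standard GP is placed, and the marginal likelihood picks up the log-Jacobian of the warp:

```python
import numpy as np

def warp(y, a, b, c):
    """Monotone warp g(y) = y + sum_i a_i tanh(b_i (y + c_i)) and its
    derivative g'(y); monotone for a_i, b_i >= 0."""
    t = np.tanh(b * (y[:, None] + c))
    z = y + np.sum(a * t, axis=1)
    dz = 1.0 + np.sum(a * b * (1.0 - t ** 2), axis=1)
    return z, dz

def wgp_nll(X, y, a, b, c, sigma_f=1.0, ell=1.0, noise=0.1):
    """Negative log marginal likelihood of a warped GP: a standard GP on the
    warped targets z = g(y), minus the log-Jacobian sum_n log g'(y_n).
    Maximum likelihood would minimize this over warp and kernel parameters."""
    z, dz = warp(y, a, b, c)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = sigma_f ** 2 * np.exp(-d2 / (2 * ell ** 2)) + noise ** 2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))
    nll = 0.5 * z @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2 * np.pi)
    return nll - np.sum(np.log(dz))
```

Setting all a_i = 0 makes the warp the identity and recovers the standard GP marginal likelihood, which is a convenient sanity check.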

Detecting causal associations in time series datasets is a key challenge for novel insights into complex dynamical systems such as the Earth system or the human brain. Interactions in such systems present a number of major challenges for causal discovery techniques, and it is largely unknown which methods perform best for which challenge. The CauseMe platform provides ground truth benchmark datasets featuring different real data challenges to assess and compare the performance of causal discovery methods. The available benchmark datasets are either generated from synthetic models mimicking real challenges, or are real-world datasets where the causal structure is known with high confidence. The datasets vary in dimensionality, complexity and sophistication.

Among the many existing approaches, Granger causality (GC) has been perhaps the most widely used. Given its strong linearity and stationarity assumptions, several nonlinear alternatives to GC have been proposed based on kernel methods. We generalize GC to nonlinear relations by using a family of cross-information kernel functions that account for nonlinear cross-relations between the involved variables. By adding new kernel functions to the cross-information term, we can also address the issue of dealing with non-stationary processes.

Independence and conditional independence tests are a feature of many causal discovery methods. Constraint-based methods such as the PC algorithm rely on conditional independence tests, while asymmetry-based methods such as additive noise models rely on (unconditional) independence tests. Kernel-based measures such as the Hilbert-Schmidt independence criterion (HSIC) or the Kernel-based Conditional Independence test (KCI) provide a way of quantifying (possibly nonlinear) dependencies. However, the robustness of these measures to kernel parameters is not always clear. Since independence testing is fundamentally an unsupervised learning problem, heuristics such as the median distance for the RBF kernel lengthscale are generally prescribed in the literature. In this Shiny app we explore the robustness of HSIC-based independence tests to kernel parameters and to data generated according to different mechanisms (linear/nonlinear, additive/non-additive, noisy/deterministic, etc.).
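The test being explored can be sketched as follows (a standard HSIC permutation test with the median heuristic, written here for illustration; function names and the permutation count are choices made in this sketch, not taken from the app):

```python
import numpy as np

def median_heuristic(X):
    """Common default lengthscale: the median pairwise distance of the sample."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.sqrt(np.median(d2[d2 > 0]))

def hsic_stat(X, Y, sx, sy):
    """Biased empirical HSIC with RBF kernels of lengthscales sx, sy."""
    n = len(X)
    H = np.eye(n) - 1.0 / n  # centering matrix
    K = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, -1) / (2 * sx ** 2))
    L = np.exp(-np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, -1) / (2 * sy ** 2))
    return np.trace(K @ H @ L @ H) / n ** 2

def hsic_perm_test(X, Y, n_perm=200, seed=0):
    """Permutation p-value: shuffling Y simulates the null of independence
    while keeping both marginals (and the kernel lengthscales) fixed."""
    rng = np.random.default_rng(seed)
    sx, sy = median_heuristic(X), median_heuristic(Y)
    stat = hsic_stat(X, Y, sx, sy)
    null = [hsic_stat(X, Y[rng.permutation(len(Y))], sx, sy)
            for _ in range(n_perm)]
    return stat, (1 + sum(s >= stat for s in null)) / (1 + n_perm)
```

Replacing `median_heuristic` with a grid of fixed lengthscales is the natural way to probe the robustness question raised above: a lengthscale far from the data scale makes the kernel matrix nearly constant or nearly diagonal, and the test loses power.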

Granger causality approaches have problems in nonstationary and nonlinear scenarios, but also when the relation among variables is thought to be deterministic, especially in dynamic systems with weak to moderate coupling. This is because separability is not satisfied in such systems. Actually, in deterministic dynamic systems (even noisy ones), if $X$ is a cause of $Y$, information about $X$ will be redundantly present in $Y$ itself and cannot formally be removed from the causative variables, as a consequence of Takens' theorem. The convergent cross-mapping (CCM) algorithm is a nonlinear state-space method which reconstructs the variables' state spaces ($M_x$, $M_y$) using time embeddings, and concludes $X \to Y$ if points on $M_x$ can be predicted using nearest neighbors in $M_y$ with increasing accuracy as more points are used. However, CCM is very sensitive to noise and time series length. We propose several techniques based on bootstrap resampling, Markov chain Monte Carlo (MCMC) and adaptive importance sampling (AIS) to alleviate these problems.
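The core CCM machinery described above, time-delay embedding plus nearest-neighbor cross-mapping, can be sketched as follows (a minimal illustration of the basic Sugihara-style algorithm, not of the proposed resampling extensions; function names, the simplex weighting, and the coupled-logistic example are assumptions of this sketch):

```python
import numpy as np

def embed(x, E=2, tau=1):
    """Time-delay embedding: row t is (x_t, x_{t+tau}, ..., x_{t+(E-1)tau})."""
    n = len(x) - (E - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(E)])

def ccm_skill(source, target, E=2, tau=1, lib=None):
    """Cross-map `source` from the shadow manifold of `target`: for each point
    of M_target, predict the contemporaneous source value as a weighted average
    over the E+1 nearest neighbors (simplex weights). High, library-size-
    increasing skill supports source -> target causality."""
    My = embed(target, E, tau)
    offset = (E - 1) * tau
    xs = source[offset: offset + len(My)]   # source aligned with embedding rows
    L = lib or len(My)
    My, xs = My[:L], xs[:L]
    preds = np.empty(L)
    for t in range(L):
        d = np.linalg.norm(My - My[t], axis=1)
        d[t] = np.inf                        # exclude the point itself
        nn = np.argsort(d)[: E + 1]
        w = np.exp(-d[nn] / max(d[nn][0], 1e-12))
        preds[t] = np.sum(w * xs[nn]) / np.sum(w)
    return np.corrcoef(preds, xs)[0, 1]      # cross-map skill (correlation)
```

Run on a unidirectionally coupled pair of logistic maps ($X$ driving $Y$), the skill of cross-mapping $X$ from $M_y$ grows with the library size, which is the convergence signature CCM looks for; the sensitivity to noise and series length mentioned above shows up as unstable skill estimates at small library sizes, which is what the proposed resampling schemes target.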