Image and Video Processing:

Scene Statistics and Visual Neuroscience at work!


Jesus

Controversies: don't use (1) the Mean Squared Error or (2) the image of Lena Söderberg. Here we only discuss the MSE issue.




MPEG-like video compression is based on predictive coding, i.e., motion compensation and residual quantization. Motion compensation consists of optical flow estimation (M module) and motion-based prediction (P module). The residuals (prediction errors) are analyzed in a transform domain (T module) and quantized there (Q module). Every module in this framework will continue to benefit from image representations coming from Vision Science and from prediction and regression based on Statistical Learning.
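The M and P modules can be illustrated with a minimal NumPy sketch. It assumes exhaustive block matching with a sum-of-absolute-differences (SAD) criterion. This is a textbook choice, not the perceptually driven motion estimation referenced on this page. The block size, search range and function names are illustrative. The residual it produces is what the T and Q modules would then transform and quantize.

```python
import numpy as np

def block_matching(ref, cur, block=8, search=4):
    """M module (sketch): one displacement (dy, dx) per block of `cur`,
    chosen by exhaustive search minimizing the SAD against `ref`."""
    H, W = cur.shape
    flow = np.zeros((H // block, W // block, 2), dtype=int)
    for bi in range(H // block):
        for bj in range(W // block):
            y, x = bi * block, bj * block
            target = cur[y:y + block, x:x + block]
            best, best_sad = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > H or xx + block > W:
                        continue  # candidate falls outside the reference frame
                    sad = np.abs(target - ref[yy:yy + block, xx:xx + block]).sum()
                    if sad < best_sad:
                        best, best_sad = (dy, dx), sad
            flow[bi, bj] = best
    return flow

def motion_compensate(ref, flow, block=8):
    """P module (sketch): predict each block by copying the displaced reference block."""
    pred = np.zeros_like(ref)
    for bi in range(flow.shape[0]):
        for bj in range(flow.shape[1]):
            y, x = bi * block, bj * block
            dy, dx = flow[bi, bj]
            pred[y:y + block, x:x + block] = ref[y + dy:y + dy + block,
                                                 x + dx:x + dx + block]
    return pred

# Synthetic example: the current frame is the reference frame shifted by (2, -1) pixels.
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
cur = np.roll(ref, shift=(2, -1), axis=(0, 1))
flow = block_matching(ref, cur)
pred = motion_compensate(ref, flow)
residual = cur - pred  # prediction error handed to the T and Q modules
```

Interior blocks are predicted exactly. Blocks whose true match would lie outside the frame (here the top row and the right column) keep a non-zero residual. Either way, the residual never has more energy than the plain frame difference, because the zero displacement is always among the candidates.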

Efficient coding of visual information and efficient inference of missing information in images depend on two factors: (1) the statistical structure of photographic images, and (2) the nature of the observer that will analyze the result. Interestingly, these two factors (image regularities and human vision) are deeply related since the evolution of biological sensors seems to be guided by statistical learning (see our work on the Efficient Coding Hypothesis in Visual Neuroscience). However, the simultaneous consideration of these two factors is unusual in the image processing community, particularly beyond Gaussian image models and linear models of the observer.
Our work in image and video processing has run parallel to our investigation of the non-Gaussian nature of visual scenes and the nonlinear behavior of the visual cortex. This parallel approach is sensible since these are two sides of the same issue in vision (the Efficient Coding Hypothesis again!). Specifically, the core algorithm in many of our applications has been Divisive Normalization, a canonical computation in sensory neurons with interesting statistical effects (see [Neur.Comp.10]). We have used this perceptual (and also statistical) model to propose novel solutions in bit allocation, to identify perceptually relevant motion, to smooth image representations, and to compute distances between images.
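Divisive Normalization is often written in a generic form. Each linear response e_i is rectified and then divided by a weighted sum of the activity of its neighbors: r_i = sign(e_i) |e_i|^gamma / (b + sum_j H_ij |e_j|^gamma). The NumPy sketch below implements that generic form. The constant b, the exponent gamma and the Gaussian interaction kernel H are illustrative assumptions, not the fitted parameters of [Neur.Comp.10]. The example shows the masking effect: the same coefficient gets a weaker response when its neighbors are also active.

```python
import numpy as np

def divisive_normalization(e, H, b=0.1, gamma=2.0):
    """Generic DN with illustrative parameters:
    r_i = sign(e_i) |e_i|^gamma / (b + sum_j H_ij |e_j|^gamma)."""
    energy = np.abs(e) ** gamma
    return np.sign(e) * energy / (b + H @ energy)

# Hypothetical interaction kernel: Gaussian in coefficient index, rows summing to 1.
n = 8
idx = np.arange(n)
H = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / 2.0)
H /= H.sum(axis=1, keepdims=True)

e_alone = np.zeros(n)
e_alone[3] = 1.0            # one active coefficient in a silent neighborhood
e_context = np.ones(n)      # the same coefficient inside an active neighborhood

r_alone = divisive_normalization(e_alone, H)
r_context = divisive_normalization(e_context, H)
print(r_alone[3], r_context[3])  # the response drops when the neighbors are active
```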

Decoded sequences for various settings of Motion Estimation (M module) and Residual Quantization (T+Q modules). See, for example, Electr.Lett.00a or IEEE TIP01.



Image and Video Processing

Low-level image processing (coding, restoration, synthesis, white balance, color and texture editing, etc.) is all about image statistics in a domain where the metric is non-Euclidean (i.e., induced by the data or the observer).
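One way to make a non-Euclidean metric induced by the observer concrete is the following. If a perceptual transform R maps an image x to responses R(x), the distance between x and a nearby x + dx is approximately sqrt(dx^T M dx), with metric matrix M = J^T J built from the Jacobian J of R at x. The NumPy sketch below assumes a toy pointwise saturating response R(x) = x / (b + |x|), which is not one of our published models. It shows that the same Euclidean perturbation counts as larger around low-contrast inputs than around high-contrast ones.

```python
import numpy as np

def response(x, b=0.5):
    """Toy saturating response (assumption): r_i = x_i / (b + |x_i|)."""
    return x / (b + np.abs(x))

def jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x."""
    J = np.zeros((f(x).size, x.size))
    for k in range(x.size):
        step = np.zeros(x.size)
        step[k] = eps
        J[:, k] = (f(x + step) - f(x - step)) / (2 * eps)
    return J

def induced_distance(f, x, dx):
    """Local distance sqrt(dx^T M dx) with the metric M = J^T J evaluated at x."""
    J = jacobian(f, x)
    M = J.T @ J
    return float(np.sqrt(dx @ M @ dx))

dx = np.array([0.01, -0.01, 0.0])   # same Euclidean perturbation in both cases
x_low = np.full(3, 0.05)            # low-contrast input
x_high = np.full(3, 2.0)            # high-contrast input
d_low = induced_distance(response, x_low, dx)
d_high = induced_distance(response, x_high, dx)
print(d_low, d_high)                # d_high is much smaller than d_low
```

The metric here is diagonal only because the toy response acts on each coefficient separately. With Divisive Normalization the denominator couples coefficients, so the Jacobian, and hence M, also has off-diagonal terms.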

We proposed original image processing techniques using both perception models and image statistics, including: (i) improvements of the JPEG standard for image coding through nonlinear texture vision models [Electr.Lett.95, Electr.Lett.99, IEEE TNN05, IEEE TIP06a, JMLR08, RPSP12, Patent08]; (ii) improvements of the MPEG standard for video coding with a new perceptual quantization scheme and a new motion estimation focused on perceptually relevant optical flow [LNCS97, Electr.Lett.98, Electr.Lett.00a, Electr.Lett.00b, IEEE TIP01, Redund.Reduct.99]; (iii) new image restoration techniques based on nonlinear contrast perception models and on image statistics in local frequency domains [IEEE TIP06b, JMLR10]; (iv) new approaches to color constancy based on relative chromatic descriptors [Vis.Res.97, J.Opt.96], statistically based chromatic adaptation models [Neur.Comp.12, PLoS-ONE14], or Bayesian estimation of surface reflectance [IEEE-TGRS14]; (v) new subjective image and video distortion measures using nonlinear perception models [Im.Vis.Comp.97, Disp.99, IEEE ICIP02, JOSA10, Proc.SPIE15]; and (vi) image classification and knowledge extraction (or regression) based on our feature extraction techniques [IEEE-TNN11, IEEE-TGRS13, Int.J.Neur.Syst.14, IEEE-JSTSP15]. See CODE for our image and video processing applications.

Image Coding: nonlinear perceptual image representations are critical to improving JPEG (note the gain in visual quality at 1 bit/pixel). Measuring subjective distortion (the numbers at the bottom) is another vision-related problem we have addressed (see below).
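The transform-plus-quantization step shared by JPEG-like coders can be sketched in NumPy as an orthonormal 8x8 DCT followed by uniform quantization with frequency-dependent step sizes. The linear step profile in the hypothetical quant_steps function is a stand-in for a contrast-sensitivity weighting. The nonlinear texture-vision models cited in this section go beyond this linear baseline.

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis function."""
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    return C

def quant_steps(N=8, q0=4.0, slope=1.5):
    """Hypothetical step sizes that grow with spatial frequency u + v."""
    u = np.arange(N)
    return q0 * (1.0 + slope * (u[:, None] + u[None, :]))

def code_block(block, Q):
    """T + Q modules for one block, plus the decoder's reconstruction."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T          # forward 2-D DCT
    labels = np.round(coeffs / Q)     # integer labels that would be entropy-coded
    recon = C.T @ (labels * Q) @ C    # dequantize and invert the DCT
    return labels, recon

block = 100.0 + 4.0 * np.add.outer(np.arange(8), np.arange(8))  # smooth ramp
Q = quant_steps()
labels, recon = code_block(block, Q)
print(np.count_nonzero(labels), "non-zero labels out of 64")
```

Coarser steps at high frequencies spend fewer bits where a CSF-like weighting says errors are less visible. A nonlinear (masking) model would also make the effective steps depend on the local signal content.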



Image Restoration: regularization functionals based on nonlinear perception models, and signal smoothing according to image statistics in the wavelet domain, both help in image restoration.
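As a simple illustration of smoothing according to image statistics in the wavelet domain, the sketch below uses classical soft thresholding of Haar detail coefficients (wavelet shrinkage). This is a textbook substitute, not the perception-based regularization referenced above. It relies only on two facts: detail coefficients of natural images are sparse, and white noise spreads evenly over all coefficients. The one-level Haar transform, the 3-sigma threshold and the test image are assumptions.

```python
import numpy as np

S2 = np.sqrt(2.0)

def haar_fwd(x):
    """One-level orthonormal Haar transform along axis 0 (even length assumed)."""
    return (x[0::2] + x[1::2]) / S2, (x[0::2] - x[1::2]) / S2

def haar_inv(lo, hi):
    """Inverse of haar_fwd along axis 0."""
    x = np.empty((2 * lo.shape[0],) + lo.shape[1:])
    x[0::2] = (lo + hi) / S2
    x[1::2] = (lo - hi) / S2
    return x

def haar2_fwd(img):
    """Separable 2-D transform: axis 0 first, then axis 1. Returns (LL, LH, HL, HH)."""
    lo, hi = haar_fwd(img)
    ll, lh = haar_fwd(lo.T)
    hl, hh = haar_fwd(hi.T)
    return ll.T, lh.T, hl.T, hh.T

def haar2_inv(ll, lh, hl, hh):
    lo = haar_inv(ll.T, lh.T).T
    hi = haar_inv(hl.T, hh.T).T
    return haar_inv(lo, hi)

def soft(c, t):
    """Soft thresholding: shrink magnitudes by t, zeroing anything below t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(img, sigma):
    ll, lh, hl, hh = haar2_fwd(img)
    t = 3.0 * sigma  # assumed threshold; only the detail bands are shrunk
    return haar2_inv(ll, soft(lh, t), soft(hl, t), soft(hh, t))

rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0                    # piecewise-constant test image
sigma = 0.1
noisy = clean + sigma * rng.standard_normal(clean.shape)
restored = denoise(noisy, sigma)
```

On this particular test image the clean detail coefficients are exactly zero and the transform is orthonormal, so shrinking the detail bands can only reduce the error. On real photographs the threshold trades noise removal against blurring of edges.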



Subjective image/video distortion metrics: observers' opinions (ground truth, vertical axis) correlate better with our Euclidean distance in nonlinear perceptual domains (right) than with the widely used Structural Similarity Index (left).
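A toy example reproduces the gap between MSE and a distance computed in a nonlinear perceptual domain. The NumPy sketch below uses a 1-D signal standing in for an image row. It also uses a hypothetical response model: contrast divided by local contrast energy, a crude form of Divisive Normalization. The constant b, the window width and the signals are assumptions. The same zero-mean perturbation gives the same MSE whether it lands on a flat region or on a high-contrast texture. The perceptual distance, however, is much smaller on the texture, in line with masking.

```python
import numpy as np

def local_energy(c, width=5):
    """Root-mean-square contrast in a sliding window (zero-padded at the borders)."""
    kernel = np.ones(width) / width
    return np.sqrt(np.convolve(c ** 2, kernel, mode="same"))

def perceptual_response(x, b=0.05, width=5):
    """Toy model (assumption): contrast around the mean divided by local energy."""
    c = x - x.mean()
    return c / (b + local_energy(c, width))

def mse(x, y):
    return float(np.mean((x - y) ** 2))

def perceptual_distance(x, y):
    """Euclidean distance between the responses, not between the pixel values."""
    return float(np.linalg.norm(perceptual_response(x) - perceptual_response(y)))

n = 64
reference = np.full(n, 0.5)
reference[32:] += 0.4 * (-1.0) ** np.arange(32)   # flat left half, textured right half

bump = 0.02 * (-1.0) ** np.arange(16)             # zero-mean perturbation
on_flat = reference.copy()
on_flat[8:24] += bump
on_texture = reference.copy()
on_texture[40:56] += bump

print(mse(reference, on_flat), mse(reference, on_texture))  # same MSE (up to rounding)
print(perceptual_distance(reference, on_flat),
      perceptual_distance(reference, on_texture))           # much smaller on the texture
```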

Video Coding: improved bit allocation according to a nonlinear perception model (right vs. left) is more critical to improving MPEG video coding than improved optical flow computation (bottom vs. top).
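A classical greedy scheme illustrates bit allocation driven by a perceptual model. Under the usual high-rate approximation, coding coefficient k with b_k bits leaves a weighted distortion of about w_k * var_k * 2^(-2 b_k). Each extra bit goes to the coefficient where that term is currently largest. In the sketch below, the variances and the perceptual weights w_k are made-up numbers. Fixed weights are also a simplification: in a nonlinear model such as Divisive Normalization, sensitivity would depend on the signal itself (masking).

```python
import numpy as np

def greedy_bit_allocation(variances, weights, total_bits):
    """Give bits one at a time to the coefficient with the largest
    weighted distortion w_k * var_k * 2**(-2 * b_k) (high-rate model)."""
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bits = np.zeros(variances.size, dtype=int)
    for _ in range(total_bits):
        distortion = weights * variances * 2.0 ** (-2 * bits)
        bits[np.argmax(distortion)] += 1  # ties go to the lowest index
    return bits

variances = [16.0, 4.0, 1.0, 0.25]   # hypothetical variances, low to high frequency
flat = greedy_bit_allocation(variances, [1.0, 1.0, 1.0, 1.0], total_bits=8)
perceptual = greedy_bit_allocation(variances, [1.0, 1.0, 0.05, 0.05], total_bits=8)
print(flat, perceptual)  # [4 3 1 0] [5 3 0 0]
```

With the down-weighted high frequencies, the bit that the flat weighting spends on the third coefficient moves to the first one.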

Color Constancy and White Balance: these adaptation problems reduce to matching the color manifolds observed under different illumination conditions. We proposed linear and nonlinear solutions to this geometric problem.
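Consider the simplest linear case, and assume the same surfaces are observed under both illuminants. Matching the two color manifolds then reduces to fitting a 3x3 matrix by least squares; a diagonal matrix would correspond to the classical von Kries model. The NumPy sketch below recovers a hypothetical adaptation matrix from noise-free synthetic data. It does not cover the nonlinear solutions.

```python
import numpy as np

def fit_linear_adaptation(colors_a, colors_b):
    """Least-squares 3x3 matrix M such that colors_b ≈ colors_a @ M.T
    (rows are tristimulus vectors of corresponding surfaces)."""
    S, *_ = np.linalg.lstsq(colors_a, colors_b, rcond=None)
    return S.T

rng = np.random.default_rng(1)
colors_a = rng.random((50, 3))            # 50 surfaces under illuminant A (synthetic)
M_true = np.array([[1.20, 0.05, 0.00],    # hypothetical change of illuminant
                   [0.02, 1.00, 0.03],
                   [0.00, 0.04, 0.70]])
colors_b = colors_a @ M_true.T            # the same surfaces under illuminant B
M_est = fit_linear_adaptation(colors_a, colors_b)
predicted_b = colors_a @ M_est.T          # predicted appearance under illuminant B
```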

Image classification: classifiers based on flexible features adapted to the data (such as RBIG, SPCA, PPA and DRR) are robust to changes in acquisition conditions (their adaptivity means no retraining is needed).
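RBIG (Rotation-Based Iterative Gaussianization) is the easiest of these feature extractors to sketch. It alternates a marginal Gaussianization of each dimension with a rotation, so the data progressively approach a joint Gaussian. The NumPy sketch below is a simplified version. It assumes rank-based empirical CDFs, PCA rotations and a fixed number of iterations. The published method has further design choices (e.g., the type of rotation and the stopping criterion) that are not reproduced here.

```python
import numpy as np
from statistics import NormalDist

_inv_normal_cdf = np.vectorize(NormalDist().inv_cdf)

def marginal_gaussianize(X):
    """Map every column to N(0, 1) through its empirical CDF (ranks)."""
    n = X.shape[0]
    ranks = X.argsort(axis=0).argsort(axis=0)   # rank of each sample, per column
    u = (ranks + 0.5) / n                       # strictly inside (0, 1)
    return _inv_normal_cdf(u)

def rbig(X, n_iter=10):
    """Simplified RBIG: marginal Gaussianization followed by a PCA rotation."""
    Z = np.asarray(X, dtype=float)
    for _ in range(n_iter):
        Z = marginal_gaussianize(Z)
        _, _, Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
        Z = Z @ Vt.T                            # rotate onto the principal axes
    return Z

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, size=500)
X = np.column_stack([s, s ** 2 + 0.1 * rng.standard_normal(500)])  # non-Gaussian, dependent
Z = rbig(X)
```

The Gaussianized coordinates Z are zero-mean and decorrelated, and could then feed a standard classifier.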


References