VistaLab:
The Matlab toolbox for linear spatio-temporal Vision Models
Jesús Malo & Juan Gutiérrez
jesus.malo@uv.es
(c) Universitat de València 1996 - 2018
Accurate control of texture and speed was crucial both to (a) test our first perceptually weighted optical flow algorithms [Electr. Lett. 00, IEEE Trans. Im. Proc. 01] and (b) generate stimuli for controlled motion aftereffects [Front. Human Neurosci. 15].
VistaLab is a Matlab toolbox that provides the linear building blocks to create spatio-temporal vision models, and the tools to control the spatio-temporal properties of video sequences. These building blocks include the spatio-temporal receptive fields of LGN, V1 and MT cells, and the spatial and spatio-temporal Contrast Sensitivity Functions. Additionally, VistaLab allows accurate spatio-temporal sampling, visualization in the spatio-temporal Fourier domain, and generation of video sequences of controlled texture and speed. Sequence generation includes noise and random-dot sequences (with controlled spectrum and optical flow), and tools to synthesize animations of rigid bodies of Lambertian reflectance evolving in Newtonian force fields. These sequences have controlled 3D motion and known 2D optical flow.
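To make "controlled optical flow" concrete for random-dot stimuli, here is a minimal Python sketch (not VistaLab's Matlab code; the function name and its parameters are hypothetical). Every dot is displaced by the same velocity in each frame and wraps around the borders, so the ground-truth 2D flow is known exactly. Rasterizing the dots into image frames is left out.

```python
import random

def random_dot_sequence(n_dots, n_frames, width, height, vx, vy, seed=0):
    """Hypothetical helper: dot positions for each frame of a random-dot movie
    translating with constant velocity (vx, vy) in pixels/frame.

    Dots wrap around the borders (toroidal field), so the dot density stays
    constant and the ground-truth optical flow is (vx, vy) everywhere.
    """
    rng = random.Random(seed)
    initial = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_dots)]
    frames = []
    for t in range(n_frames):
        frames.append([((x + vx * t) % width, (y + vy * t) % height)
                       for x, y in initial])
    return frames, (vx, vy)


frames, flow = random_dot_sequence(n_dots=200, n_frames=10, width=64, height=64,
                                   vx=1.5, vy=-0.5)
```

Because the displacement is imposed rather than estimated, the returned `flow` can serve as ground truth when testing an optical flow algorithm on the rendered frames.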
Because they control the energy, the frequency content (texture and speed), and the context in which patterns are seen, the perception and video synthesis tools in VistaLab allow accurate illustrations of the visibility of achromatic spatio-temporal patterns. Rough approximations of visibility can be computed with the linear filters included in VistaLab.
The standard tools in VistaLab (and in ColorLab) are the building blocks needed to develop the more sophisticated vision models included in the dedicated site VistaModels.
Two views of the 3D Fourier transform of the sequence at the left (the ft=0 plane is highlighted in blue). Note how changing the texture and speed of the sequence leads to different energy distributions in the Fourier domain. Accurate control of the texture and speed of movies makes it possible to saturate certain sensors and induce stronger aftereffects. This matters because describing motion aftereffects [Front. Human Neurosci. 15] requires models of V1 sensors in the spatio-temporal Fourier domain with frequency-dependent cross-inhibition.
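These energy distributions follow from the optical flow constraint in the Fourier domain [Watson & Ahumada, JOSA A 85]. A pattern translating rigidly as I(x,t) = I0(x - v t) has all its energy on the plane ft = -(vx fx + vy fy). The following Python sketch checks the 1D-space version, ft = -v fx, by computing a naive discrete Fourier transform of a drifting sinusoid. It is an illustration, not VistaLab code, and it works in index units (cycles per 16 samples, pixels per frame).

```python
import cmath
import math

def dft2(frames):
    """Naive 2D DFT of frames[t][x]; returns F[m][k] (m: temporal, k: spatial index)."""
    T, N = len(frames), len(frames[0])
    return [[sum(frames[t][x] * cmath.exp(-2j * math.pi * (m * t / T + k * x / N))
                 for t in range(T) for x in range(N))
             for k in range(N)]
            for m in range(T)]

def signed(i, n):
    """Map a DFT index in [0, n) to a signed frequency in [-n/2, n/2)."""
    return i if i < n // 2 else i - n

N = T = 16
k0, v = 3, 2  # spatial frequency (cycles per 16 pixels) and speed (pixels/frame)
frames = [[math.cos(2 * math.pi * k0 * (x - v * t) / N) for x in range(N)]
          for t in range(T)]
F = dft2(frames)

# The two strongest coefficients: the sinusoid and its complex conjugate.
peaks = sorted(((abs(F[m][k]), signed(k, N), signed(m, T))
                for m in range(T) for k in range(N)), reverse=True)[:2]
for amplitude, fx, ft in peaks:
    print(f"fx={fx:+d}, ft={ft:+d}, |F|={amplitude:.1f}, -v*fx={-v * fx:+d}")
```

Both peaks satisfy ft = -v fx. Changing `k0` (texture) moves the energy along the line, while changing `v` (speed) changes the line's inclination.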
Most retinal ganglion cells and LGN cells can be modelled with center-surround receptive fields with a monophasic or biphasic temporal response. VistaLab comes with a configurable implementation of such receptive fields according to the general expressions in [Cai, Freeman, DeAngelis, J. Neurophysiol. 97]. Using these units, it is easy to generate artificial retinas with arbitrary sampling [Martinez-Garcia et al. 16, Martinez-Garcia et al. 17].
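A minimal Python sketch of such a receptive field, assuming a space-time separable simplification: a difference of Gaussians in space multiplied by a difference of gamma-like temporal lobes. This is not the parametrization of [Cai, Freeman, DeAngelis, J. Neurophysiol. 97] used in VistaLab, where center and surround can have different temporal dynamics. Function names and parameter values are illustrative.

```python
import math

def gaussian2d(x, y, sigma):
    """Isotropic 2D Gaussian with unit volume (x, y, sigma in degrees)."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def dog(x, y, sigma_c=0.05, sigma_s=0.15, w_s=0.9):
    """Center-surround spatial profile: excitatory center minus a weaker, wider surround."""
    return gaussian2d(x, y, sigma_c) - w_s * gaussian2d(x, y, sigma_s)

def temporal_kernel(t, tau=0.02, n=3, w_late=0.8):
    """Causal temporal impulse response (t in seconds).

    A fast gamma-like lobe minus a slower one: monophasic if w_late == 0,
    biphasic (sign reversal after the initial response) if w_late > 0.
    """
    if t < 0:
        return 0.0
    lobe = lambda s: (t / s) ** n * math.exp(-t / s)
    return lobe(tau) - w_late * lobe(2 * tau)

def lgn_rf(x, y, t, **kwargs):
    """Separable on-center receptive field; an off-center cell is its negative."""
    return dog(x, y) * temporal_kernel(t, **kwargs)

# Center at 30 ms, surround at 30 ms, center at 200 ms.
print(lgn_rf(0.0, 0.0, 0.03), lgn_rf(0.2, 0.0, 0.03), lgn_rf(0.0, 0.0, 0.2))
```

With these values, the center is excitatory and the surround inhibitory shortly after onset, and the center response changes sign later. This is the biphasic behavior shown in the figures below. Setting `w_late=0.0` gives a monophasic cell.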
The examples below show (a) the receptive field of a representative neuron in the spatio-temporal domain and in the 3D Fourier domain, and (b) the response of a population of such neurons to a natural movie, assuming uniform retinal sampling and a spatially invariant receptive field. To avoid the uniform-sampling and convolution assumptions, VistaLab also allows each sensor response to be computed explicitly as the scalar product of the stimulus with that sensor's receptive field.
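The scalar-product formulation can be sketched in Python as follows. The sketch uses 1D space for brevity, and the helper names and per-sensor widths are hypothetical. Each sensor has its own center and receptive-field width, so neither uniform sampling nor spatial invariance is assumed. For symmetric receptive fields of equal width placed on a regular grid, the same computation reduces to sampling a convolution.

```python
import math

def dog1d(x, sigma_c, sigma_s, w_s=0.9):
    """1D difference of unit-area Gaussians (on-center profile)."""
    g = lambda s: math.exp(-x * x / (2 * s * s)) / (math.sqrt(2 * math.pi) * s)
    return g(sigma_c) - w_s * g(sigma_s)

def sensor_responses(stimulus, dx, centers, widths):
    """Response of each sensor as the scalar product <stimulus, receptive field>.

    stimulus: samples on a grid with spacing dx (degrees), starting at x = 0.
    centers, widths: per-sensor center position and center sigma (degrees);
    the surround sigma is assumed to be 3x the center sigma.
    """
    xs = [i * dx for i in range(len(stimulus))]
    responses = []
    for c, sigma_c in zip(centers, widths):
        rf = [dog1d(x - c, sigma_c, 3 * sigma_c) for x in xs]
        responses.append(dx * sum(s * r for s, r in zip(stimulus, rf)))
    return responses

# A thin bright bar at x = 1 deg on a 2 deg field sampled every 0.01 deg.
dx = 0.01
stimulus = [1.0 if abs(i * dx - 1.0) <= 0.025 else 0.0 for i in range(201)]
# Non-uniform sampling: two fine sensors near the bar, one coarse sensor far away.
r = sensor_responses(stimulus, dx, centers=[1.0, 1.06, 0.3], widths=[0.02, 0.02, 0.05])
```

The sensor centered on the bar is excited, the sensor whose surround covers the bar is inhibited, and the distant coarse sensor barely responds.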
Pulsating on-center LGN neuron: Receptive field (or impulse response) in the spatiotemporal domain, where white stands for excitation and black stands for inhibition.
Temporal variation of spatial response: center-surround excitation at stimulus onset is followed by a response of reversed sign when the stimulation vanishes.
Frequency selectivity: band-pass behavior of this kind of neuron in the spatio-temporal Fourier domain. This is the Fourier transform of the receptive field at the left.
See the VistaLab CODE to reproduce these examples: demo_cells.m
Response of an LGN population to a natural stimulus: assuming a spatially invariant population of LGN cells like the one depicted above, uniformly covering the visual field, we can compute its response to a natural movie using plain convolution or a product in the Fourier domain.
Spectrum of the original sequence: each branch of the X represents an object moving at constant speed (here, one of the waving hands).
Spectrum of the response: the response was computed in the Fourier domain by applying the band-pass filter function of this LGN population to the original spectrum.
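Filtering in the Fourier domain and convolving in the space-time domain give the same result, by the convolution theorem. The following Python sketch checks this numerically. It is one-dimensional and uses a naive DFT to stay self-contained. For movies, the same identity holds with the 3D DFT, and in practice an FFT routine would be used. Note that the DFT product implements circular convolution, which means it assumes the sequence is periodic in space and time.

```python
import cmath
import math
import random

def dft(v, inverse=False):
    """Naive DFT (or inverse DFT, normalised by 1/n) of a sequence."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[j] * cmath.exp(sign * 2j * math.pi * k * j / n) for j in range(n))
           for k in range(n)]
    return [z / n for z in out] if inverse else out

def circular_convolution(signal, kernel):
    """Direct (space-domain) circular convolution."""
    n = len(signal)
    return [sum(signal[j] * kernel[(k - j) % n] for j in range(n)) for k in range(n)]

def filter_in_fourier_domain(signal, kernel):
    """Multiply the spectrum of the signal by the filter function (DFT of the kernel)."""
    S, H = dft(signal), dft(kernel)
    return [z.real for z in dft([s * h for s, h in zip(S, H)], inverse=True)]

rng = random.Random(1)
signal = [rng.gauss(0, 1) for _ in range(32)]
kernel = [rng.gauss(0, 1) for _ in range(32)]
direct = circular_convolution(signal, kernel)
via_fourier = filter_in_fourier_domain(signal, kernel)
print(max(abs(a - b) for a, b in zip(direct, via_fourier)))  # round-off only
```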
Simple cells in the V1 cortex can be modelled with Gabor-like receptive fields tuned to certain spatial and temporal frequencies. VistaLab comes with a configurable implementation of such receptive fields according to the general expressions in [Daugman JOSA A 89, Simoncelli & Heeger Vis. Res. 98]. Using these units, it is easy to generate an artificial cortex with arbitrary sampling [Martinez-Garcia et al. 17].
The examples below show six representative neurons tuned to the same spatial frequency (7 cpd) but to different temporal frequencies: 2, 7 and 10 Hz, both positive and negative.
Even though there is no conclusive tuning to two-dimensional speed due to the aperture problem [Heeger JOSA 87], in the direction perpendicular to the grating these neurons are tuned to roughly 0.3, 1 and 1.5 degrees/sec respectively (both positive and negative), since the speed is the ratio of temporal to spatial frequency.
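Here is a minimal Python sketch of such a spatio-temporal Gabor receptive field, using 1D space plus time. The envelope widths are illustrative assumptions. For simplicity, the temporal window is a non-causal Gaussian, unlike the Daugman or Simoncelli & Heeger parametrizations used in VistaLab. The sketch also computes the preferred speed perpendicular to the grating, v = -ft0/fx0, under the convention that a pattern I0(x - v t) has energy at ft = -v fx. For fx0 = 7 cpd, the speed magnitudes are 2/7 ≈ 0.29, 7/7 = 1 and 10/7 ≈ 1.43 deg/sec, which the text rounds to 0.3, 1 and 1.5.

```python
import cmath
import math

def gabor_rf(x, t, fx0, ft0, sigma_x=0.1, sigma_t=0.1):
    """Cosine-phase spatio-temporal Gabor (x in degrees, t in seconds)."""
    envelope = math.exp(-x * x / (2 * sigma_x ** 2) - t * t / (2 * sigma_t ** 2))
    return envelope * math.cos(2 * math.pi * (fx0 * x + ft0 * t))

def preferred_speed(fx0, ft0):
    """Speed (deg/s) perpendicular to the grating, assuming energy at ft = -v*fx."""
    return -ft0 / fx0

def spectral_amplitude(fx0, ft0, fx, ft, dx=0.02, dt=0.02, half_width=0.3):
    """|Fourier transform| of the receptive field at (fx, ft), by a Riemann sum."""
    n = round(half_width / dx)
    m = round(half_width / dt)
    total = 0j
    for i in range(-n, n + 1):
        for j in range(-m, m + 1):
            x, t = i * dx, j * dt
            total += gabor_rf(x, t, fx0, ft0) * cmath.exp(-2j * math.pi * (fx * x + ft * t))
    return abs(total) * dx * dt

for ft0 in (2, 7, 10, -2, -7, -10):
    print(ft0, round(preferred_speed(7, ft0), 2))
```

Comparing `spectral_amplitude(7, 2, 7, 2)` with `spectral_amplitude(7, 2, 7, -2)` shows the direction selectivity. The filter passes its own (fx, ft) quadrant and strongly attenuates the mirrored temporal frequency, so it responds to motion in one direction only.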
Figures show: (a) the receptive fields in the spatio-temporal domain and in the 3D Fourier domain, and (b) the response of a population of such neurons to a natural movie, assuming uniform retinal sampling and a spatially invariant receptive field. To avoid the uniform-sampling and convolution assumptions, VistaLab also allows each sensor response to be computed explicitly as the scalar product of the stimulus with that sensor's receptive field.
Pulsating V1 neurons: receptive fields (or impulse responses) in the spatio-temporal domain, where white stands for excitation and black for inhibition. The figure shows 2×3 replications of a 1 degree visual field. Each replication shows the receptive field of a neuron tuned to the same central location but to a different spatio-temporal frequency.
See the VistaLab CODE to reproduce these examples: demo_cells.m
Temporal variation of spatial responses: in a surface of sensors of the corresponding kind, the Gabor-like excitation/inhibition at stimulus onset turns into a travelling wave that vanishes afterwards.
Frequency selectivity: band-pass responses of the six considered neurons. These are the Fourier transforms of the receptive fields shown above.
Response of the six V1 populations to a natural stimulus: assuming spatially invariant populations of V1 cells like the ones depicted above, uniformly covering the visual field, we can compute their responses to a natural movie using plain convolution or a product in the Fourier domain. The figure shows the original stimulus and the six corresponding sets of responses. Note how the different populations respond to qualitatively different features of the stimulus (hand going up or down, static objects of the right frequency), and how the cells tuned to speeds that are too high do not respond.
Cells in the MT cortex receive projections from V1 cells whose preferred frequencies lie on a plane in the spatio-temporal Fourier domain. Therefore, they are narrowly tuned in speed. VistaLab comes with a configurable implementation of such receptive fields according to the general expressions in [Simoncelli & Heeger Vis. Res. 98]. Using these units and a spatio-temporal window, it is easy to generate an artificial MT cortex with arbitrary sampling [Martinez-Garcia et al. 17].
The examples below show six representative sets of neurons tuned to speeds of 0.3, 1 and 1.5 degrees/sec (both positive and negative).
Figures show: (a) the receptive fields in the 3D Fourier domain and the kind of features these cells are optimally tuned to, and (b) the response of a population of such neurons to a natural movie, assuming uniform retinal sampling and a spatially invariant receptive field. To avoid the uniform-sampling and convolution assumptions, VistaLab also allows each sensor response to be computed explicitly as the scalar product of the stimulus with that sensor's receptive field.
Frequency selectivity in MT: band-pass responses of the six considered kinds of neurons. Each is the sum of narrow-band V1-like filters such as those shown above. Only filters whose centers lie on a well-defined speed plane are combined in each MT cell.
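The construction of an MT filter as a sum of V1-like band-pass filters whose centers lie on one speed plane can be sketched in Python for 1D space plus time. In that case, the plane reduces to the line ft = -v fx. The Gaussian band-pass shapes, the bank of V1 centers and the tolerance are illustrative assumptions. The sum is taken directly over the filter gains, without the nonlinear stages of the full Simoncelli & Heeger model.

```python
import math

def v1_gain(fx, ft, fx0, ft0, bx=1.5, bt=2.0):
    """Amplitude spectrum of a real V1-like filter: Gaussian bumps at
    (fx0, ft0) and at the mirror frequency (-fx0, -ft0). fx in cpd, ft in Hz."""
    bump = lambda u, w: math.exp(-0.5 * ((u / bx) ** 2 + (w / bt) ** 2))
    return bump(fx - fx0, ft - ft0) + bump(fx + fx0, ft + ft0)

def mt_gain(fx, ft, speed, centers, tol=0.5):
    """Sum of the V1 filters whose center lies within tol Hz of ft = -speed * fx."""
    on_plane = [(a, b) for a, b in centers if abs(b + speed * a) <= tol]
    return sum(v1_gain(fx, ft, a, b) for a, b in on_plane)

# Hypothetical bank of V1 centers: 4 spatial x 31 temporal frequencies.
centers = [(fx0, ft0) for fx0 in (2, 4, 7, 10) for ft0 in range(-15, 16)]

speed = 1.0  # deg/s
print(mt_gain(8.5, -8.5, speed, centers))  # on the speed line
print(mt_gain(8.5, -3.0, speed, centers))  # same fx, slower stimulus
print(mt_gain(8.5, 8.5, speed, centers))   # opposite direction
```

The resulting gain is high anywhere along the speed line, including between the V1 centers, and low off the line. This is why MT cells are narrow-band in speed but broad-band in spatial frequency.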
Optimal patterns for MT neurons: these patterns were computed by injecting noise only in the frequency band where each kind of cell is sensitive. Each replication shows the corresponding pattern. None of the patterns has a specific spatial frequency content, but each has a markedly different speed.
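The procedure of injecting noise only in the band can be sketched in three steps: draw white noise in the Fourier domain, zero every coefficient outside the pass band of the cell, and invert the transform. The Python sketch below does this on a small (x, t) grid. The band is taken to be exactly the speed line ft = -v fx, in index units with v in pixels/frame. The grid size, the band definition and the naive DFT are assumptions made to keep the example self-contained.

```python
import cmath
import math
import random

N = 16  # pixels = frames

def signed(i):
    """DFT index in [0, N) -> signed frequency in [-N/2, N/2)."""
    return i if i < N // 2 else i - N

def dft2(a, inverse=False):
    """Naive 2D DFT of a[m][k] (m: time, k: space); the inverse is normalised by 1/N^2."""
    sgn = 1 if inverse else -1
    scale = 1 / N ** 2 if inverse else 1
    return [[scale * sum(a[t][x] * cmath.exp(sgn * 2j * math.pi * (m * t + k * x) / N)
                         for t in range(N) for x in range(N))
             for k in range(N)] for m in range(N)]

def in_band(k, m, v):
    """True if spatial index k and temporal index m lie on the line ft = -v * fx
    (Nyquist rows/columns excluded so the band is symmetric under negation)."""
    fx, ft = signed(k), signed(m)
    return abs(fx) < N // 2 and abs(ft) < N // 2 and ft == -v * fx

def optimal_pattern(v, seed=0):
    """Real (x, t) noise pattern whose energy lies only on the speed line of v."""
    rng = random.Random(seed)
    spectrum = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) if in_band(k, m, v) else 0j
                 for k in range(N)] for m in range(N)]
    # Taking the real part keeps the energy inside the band because the band is
    # symmetric under (fx, ft) -> (-fx, -ft).
    return [[z.real for z in row] for row in dft2(spectrum, inverse=True)]

movie = optimal_pattern(v=1)  # movie[t][x]
```

Every spatial frequency on the line is present with random amplitude and phase, so the movie shows no particular texture, yet it moves coherently at speed `v`.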
See the VistaLab CODE to reproduce these examples: demo_cells.m
Temporal variation of spatial responses: In surfaces of sensors of the considered kinds, the patterns at the left lead to noisy travelling waves.
Response of the six MT populations to a natural stimulus: assuming spatially invariant populations of MT cells like the ones depicted above, uniformly covering the visual field, we can compute their responses to a natural movie using plain convolution or a product in the Fourier domain. The figure shows the original stimulus and the six corresponding sets of responses (white means excitation, black means inhibition, and gray means spontaneous/basal response). Note how the different populations respond to qualitatively different features of the stimulus (hand going up or down, static objects of the right frequency), and how the cells tuned to speeds that are too high do not respond.
VistaLab comes with different Contrast Sensitivity Functions (CSFs): (a) the spatial achromatic CSF of the OSA Standard Spatial Observer [Watson & Malo IEEE ICIP 02], (b) the spatial chromatic Red-Green and Yellow-Blue CSFs of K. Mullen [Vis. Res. 85], with appropriate scaling [Gutierrez et al. 12], and (c) the achromatic spatio-temporal CSFs of D. Kelly [JOSA 79] and S. Daly (with compensation for object-tracking speed) [SPIE 98].
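As an example of the expressions behind the spatio-temporal CSFs, the following Python sketch evaluates Kelly's spatio-velocity CSF with the scaling constants introduced by Daly. It also implements Daly's model of smooth-pursuit tracking, which converts image velocity into retinal velocity. The formula and constants (c0 = 1.14, c1 = 0.67, c2 = 1.7, pursuit gain 0.82, drift 0.15 deg/s, maximum eye speed 80 deg/s) are as commonly quoted from [SPIE 98] and should be checked against that paper. This is not VistaLab's implementation.

```python
import math

def kelly_daly_csf(rho, v_retinal, c0=1.14, c1=0.67, c2=1.7):
    """Spatio-velocity contrast sensitivity.

    rho: spatial frequency (cycles/deg); v_retinal: retinal velocity (deg/s).
    """
    if v_retinal <= 0:
        raise ValueError("retinal velocity must be positive (log10 is undefined at 0)")
    k = 6.1 + 7.3 * abs(math.log10(c2 * v_retinal / 3.0)) ** 3
    a = 2 * math.pi * c1 * rho
    return c0 * k * c2 * v_retinal * a ** 2 * math.exp(-2 * a * (c2 * v_retinal + 2) / 45.9)

def retinal_velocity(v_image, pursuit_gain=0.82, v_drift=0.15, v_max=80.0):
    """Image velocity minus the estimated eye velocity when tracking the object."""
    v_eye = min(pursuit_gain * abs(v_image) + v_drift, v_max)
    return abs(abs(v_image) - v_eye)

# Usage: the CSF peak moves to lower spatial frequencies as retinal velocity grows.
grid = [i / 10 for i in range(1, 300)]
for v in (0.15, 2.0, 10.0):
    peak = max(grid, key=lambda r: kelly_daly_csf(r, v))
    print(v, peak)
```

The drift term keeps the retinal velocity of a static, tracked pattern at 0.15 deg/s rather than zero. Exact cancellation is still possible for some image speeds, which the `ValueError` guards against.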
Spatial-only achromatic and chromatic CSFs: the achromatic CSF of the Standard Spatial Observer includes the oblique effect. The chromatic CSFs are assumed to be isotropic.
Spatial-only CSFs on a natural image: VistaLab used together with ColorLab allows accurate application of the spatial CSFs. In this example, the natural image was first represented in CIE XYZ values. It was then transformed to an appropriate linear Y, RG, YB representation [Ingling & Tsou, Vis. Res. 79], and each CSF was applied to the corresponding chromatic channel. The result was expressed in digital values to be shown on regular displays.
Spatio-temporal CSF (with eye-tracking compensation): in the motion-compensated case, sensitivity to stationary patterns decreases substantially (note the reduction of the CSF at ft=0). Three different representations of this function are shown below.
Natural movie filtered by the spatio-temporal CSF.
Speed reversal in the Fourier domain: in the periodic pattern sequences, all the sinusoidal components were generated to travel to the right at the same speed. However, in the last (high speed) case, the high-frequency pattern seems to move backwards. The aliasing suffered by this component is easy to understand in the Fourier domain. According to the optical flow equation in the Fourier domain [Watson & Ahumada, JOSA A 85], as the speed increases (from left to right in our case), so does the inclination of all the sinusoidal components of the sequence in the fx, ft plane. For low speeds, all the components are aligned. However, when the speed is too high, the high-frequency components have temporal frequencies above the Nyquist limit. These frequencies then appear with an inclination opposite to that of the rest of the components, and hence the perceived speed is reversed. NOTE: the spatial extent of the discrete domain is 2 deg, the frame rate is 20 Hz, the spatial frequency of the high-frequency component is 10 cpd, and the actual speed in the last case is 1.5 deg/sec.
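The numbers in the note can be checked with a short Python calculation (function names are illustrative). With ft = -v fx, the 10 cpd component moving at 1.5 deg/sec has |ft| = 15 Hz. This is above the 10 Hz Nyquist limit of a 20 Hz frame rate, so the component folds to 5 Hz with the opposite sign. That reads as an apparent speed of -0.5 deg/sec, i.e., backwards. A lower-frequency component (2 cpd here, a value chosen only for illustration) stays below the limit and keeps its true speed.

```python
def alias_temporal_frequency(ft, frame_rate):
    """Fold a temporal frequency (Hz) into the representable band [-fs/2, fs/2)."""
    return (ft + frame_rate / 2) % frame_rate - frame_rate / 2

def apparent_speed(fx, speed, frame_rate):
    """Speed (deg/s) read off the aliased spectrum, assuming energy at ft = -v * fx."""
    ft_true = -speed * fx
    return -alias_temporal_frequency(ft_true, frame_rate) / fx

print(apparent_speed(fx=10.0, speed=1.5, frame_rate=20.0))  # -0.5: reversed
print(apparent_speed(fx=2.0, speed=1.5, frame_rate=20.0))   # 1.5: unchanged
```

The same folding explains why only the high-frequency component reverses. Speed reversal requires |v fx| > fs/2, so at 20 Hz it affects components above 10/1.5 ≈ 6.7 cpd.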
VistaLab only addresses the linear part of the neural mechanisms that mediate the preattentive perception of spatio-temporal patterns. It does not combine these mechanisms to compute motion (optical flow), it does not include the nonlinear interactions between the linear mechanisms, and it does not include color.
These issues can be addressed with other toolboxes, namely VistaVideoCoding, BioMultiLayer_L_NL_color in VistaModels, and ColorLab.
Optical flow in VistaVideoCoding: this separate toolbox provides different motion estimation algorithms for MPEG-like video coders. We proposed perceptually based improvements to multiscale optical flow that increase the resolution of the motion estimate [Malo98, Malo00a, Malo00b, Malo01a, Malo01b].