Cloud masking of multitemporal remote sensing images

This notebook shows an example of the implementation of the method for multitemporal cloud masking and estimation of a cloud-free image proposed in Gomez-Chova et al. (2017) using the google earth engine (GEE) platform and illustrated for the Landsat-8 image collection. The Google's cloud infrastructure contains a multi-petabyte catalog of satellite imagery from well-known satellites which include Landsat, Sentinels and MODIS among others. In addition, it provides an Application Programming Interface (API) in JavaScript and Python to access, operate and visualize such data into an easy and scalable manner. 

Therefore, the convenience of the platform is twofold: (1) it permits to access to the a vast satellite imagery without downloading it using the same syntax regardless of the satellite, and (2) the computational burden of training the model and prediction is done on the GEE parallel distributed platform, which automatically performs an efficient memory management and distributes the processing power on virtual machines avoiding concurrency problems, which commonly arise in the context of high data volume processing.

Summary of the cloud detection and estimation module implemented in GEE:

We have used the Python API to implement the linear and kernel ridge regression models in GEE. For each Landsat image to process, the module first builds a cloud-free dataset by searching among the Landsat image collection for a particular location and given period of time. The cloud-free images from the time series are automatically selected using the cloud probability calculated from the Landsat BQA band. Secondly, it fits the model and predicts the background cloud-free image. Then, it computes the difference between the acquired and predicted images and performs a clustering using the difference image to create the final cloud mask.

Therefore, for the illustrative example showed bellow, the input to our algorithm will be a Landsat 8 image together with a region of interest where we want to: (1) detect the clouds and (2) predict a cloud-free/background image. The algorithm is mainly divided in three parts:

  1. Automatic selection of previous cloud-free images for the region of interest.
  2. Fitting the model and predicting the background of the image.
  3. Clustering the differences between the real and the estimated images to create the cloud mask.

The GEE potential was tested by processing the years 2015 and 2016 for a region of interest using the same implementation showed bellow only for an image from the Landsat 8 catalogue. Results of the processed time series are showed here.

For any reference to this work, please cite the following paper: Gomez-Chova et al. (2017) bibtex.

Automatic selection of previous cloud-free images for the region of interest

Set up

First, we need to initialize the GEE and import some functions. Then, we need to select an image to process from the Landsat 8 catalogue and also define a polygon which encloses our region of interest.

import ee

from datetime import datetime
from IPython.display import Image, display

from ee_ipl_uv import multitemporal_cloud_masking
from ee_ipl_uv import download
from ee_ipl_uv import time_series_operations
from ee_ipl_uv import time_series_show
from ee_ipl_uv import predefined_cloud_algorithms
from ee_ipl_uv import converters


# Select image to remove clouds
image_index = "LC81990332015238LGN00"
image_predict_clouds = ee.Image('LANDSAT/LC8_L1T_TOA_FMASK/'+image_index)
bands_original = image_predict_clouds.bandNames().getInfo()

# Select region of interest
pol = [[-0.50262451171875,39.39269330108945],

region_of_interest = ee.Geometry.Polygon(pol)
# Show image
image_file_original = download.MaybeDownloadThumb(image_predict_clouds.clip(region_of_interest),
display(Image(image_file_original),Caption('Clipped landsat 8 TOA image '+
                                           datetime.utcfromtimestamp(image_predict_clouds.get("system:time_start").getInfo()/1000).strftime("%Y-%m-%d %H:%M:%S")+
                                           ' over region of interest'))
Figure 1. Clipped landsat 8 TOA image 2015-08-26 10:43:24 over region of interest

Automatic selection of previous cloud-free images

For each image of interest, we need to select a number (e.g. n=3) of past cloud-free images from the time series in order to build the model. The image of interest is $X_t \in \mathbb{R}(p \times B)$, where $p \equiv \text{number of pixels}$ and $B \equiv \text{number of bands}$ and the final input dataset is $X = [X_t,X_{t-1},X_{t-2},X_{t-3}]$, $X \in \mathbb{R}(p \times 4 \cdot B)$, where we define $X_{t-j}$ being the $j$th previous image to $t$ without clouds.

Inspection of cloud cover from the previous images

We inspect the previous images using the PreviousImagesWithCC function, which provides the cloud cover over the region of interest computed from the Quality Assessment band (QABand) that is included within the Landsat image.

imcoll = multitemporal_cloud_masking.PreviousImagesWithCC(image_predict_clouds,region_of_interest,3,

def recortar(img):
    return img.clip(region_of_interest)

image_col = download.DownloadImageCollectionThumb(,

#concat all images
imagenes_previas = image_col.image_name
imagen = download.MosaicImageList(imagenes_previas,[3,4])

display(Image(imagen),Caption('Previous Landsat images (first 11 images) and image of interest (last image)'))
Downloading: 12 images: