
A Machine Learning glossary

Machine Learning glossary: some false friends in Earth Observation and Computer Vision

Quick introduction

If you are an expert in Earth Observation (EO) who would like to apply the latest Machine Learning methodologies, such as Deep Convolutional Neural Networks, to carry out EO analysis in a data-driven way, or if you are an expert in Artificial Intelligence who would like to apply your knowledge of Computer Vision in the Remote Sensing domain, you might want to make sure that you have the correct meaning of some words in mind when you move from one domain to the other.

Beware of false friends
#1 CLASSIFICATION

One of the first differences in wording I have noticed relates to the word classification: in Computer Vision, classification refers to the type of Supervised Learning problem whose output is a set of classes (e.g. good, bad). It covers several sub-categories (a minimal sketch contrasting their outputs follows Figure 1):

  • Image Classification: the model predicts whether a certain class (e.g. cat, dog) is present in the picture
  • Object Detection: the model predicts whether a certain class (e.g. cat, dog) is present in the picture and where it is located (e.g. bounding box and position)
  • Image Segmentation: the model predicts, for each pixel in the image, the class it belongs to
Figure 1: Top left: example of input image; top right: output of image classification; bottom left: output of object detection; bottom right: output of image segmentation.
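
To make the distinction concrete, here is a minimal sketch (PyTorch is assumed here, purely for illustration) contrasting the output shapes of an image classifier and an image segmentation model; object detection would instead return a list of bounding boxes, each with a class score.

import torch
import torch.nn as nn

batch = torch.rand(4, 3, 128, 128)            # 4 RGB images of 128 x 128 pixels

# Image classification: one score per class for the whole image
classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 5),                         # 5 hypothetical classes
)
print(classifier(batch).shape)                # torch.Size([4, 5])

# Image segmentation: one class score per pixel
segmenter = nn.Conv2d(3, 5, kernel_size=1)    # toy per-pixel classifier
print(segmenter(batch).shape)                 # torch.Size([4, 5, 128, 128])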

In Earth Observation, the term classification covers a set of different products such as Land Cover or Land Use. An example of this is the Corine Land Cover shown in Figure 2.

Figure 2: Example of Land cover classification

As you can see, these types of EO products are the equivalent of image segmentation in the Computer Vision domain, but the wording in EO comes from the objective of the map, which is actually to classify, in this case, the different types of Land Cover.

#2 VALIDATION DATASET

Another element of misalignment I have sometimes noticed between EO experts and Computer Vision experts relates to the usage of the term validation dataset.

In Computer Vision, or more generally in Computer Science, the standard approach in Supervised Machine Learning is to split the dataset into Training, Validation and Test sets (a minimal splitting sketch follows this list), where:

  • Training Dataset: The sample of data used to fit the model.
  • Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
  • Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.
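
As a minimal sketch of this split (scikit-learn is assumed here, and the arrays are placeholders), one common pattern is to hold out the test set first and then carve the validation set out of the remaining data:

from sklearn.model_selection import train_test_split

X = list(range(100))                 # placeholder inputs
y = [i % 2 for i in range(100)]      # placeholder expected outputs

# 20% held out for the final test, then 25% of the remainder (20% overall) for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)
# Result: 60% training, 20% validation, 20% test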

On top of this, once the model is transferred to a production/operational chain, it is important to evaluate its performance there as well.

In my experience, the meaning that the Earth Observation domain associates with the term “validation” is inherited more from the software engineering field, where the validation process aims to verify that the system is doing the “right thing”, not just doing “things right”.

The validation process also includes data collection and data evaluation, from the process design stage and throughout production, which establishes scientific evidence that a process is capable of consistently delivering quality products. That is why in the Earth Observation domain you also hear about validation campaigns, which aim to verify that a sensor in space is working as expected and is generating data consistent with the model.

So you can imagine what kind of misunderstandings can arise in a discussion where an EO expert asks a Machine Learning expert: “do you have a validation dataset?”.

#3 TRAINING DATASET

Last but not least, I have noticed in several conversations an improper usage of the term “training dataset” in the EO domain. EO experts sometimes refer to ground truth data, i.e. information collected on the Earth at a certain location (e.g. a national forestry inventory), as a training dataset, while in the Machine Learning domain a training dataset means the joint, one-to-one association of inputs and expected outputs. This has on several occasions created a non-trivial misunderstanding, because starting a Machine Learning project already in possession of a training dataset is one thing, and starting a project where you first have to build it is quite another.

I would also like to raise awareness, among AI experts, of the high probability of misalignment between ground truth data and remote acquisitions, which should not be underestimated when planning to build a dataset. To give the simplest example, imagine that you have a map (a ground truth map) of an area after a fire and that you want to use it to build a training dataset with EO data. Your aim will be to train a model that can then automate (in Computer Vision I would say “predict”, but this might be misleading for an EO expert here) the mapping of burned areas; however, if you use an optical sensor, you might have clouds over that area.
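
As a purely illustrative sketch of what building such a training dataset could look like, the snippet below pairs each EO acquisition with its ground truth label and discards cloudy scenes; the helper functions (load_scene, load_burned_area_map, cloud_fraction) are hypothetical placeholders, not a real API:

training_pairs = []
for scene_id in scene_ids:                     # candidate optical acquisitions
    image = load_scene(scene_id)               # EO input (e.g. reflectance bands)
    label = load_burned_area_map(scene_id)     # ground truth for the same footprint and date
    if cloud_fraction(image) > 0.2:            # skip acquisitions obscured by clouds
        continue
    training_pairs.append((image, label))      # the one-to-one input/expected-output association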

This association of data to value (input/expected output) required an enormous effort in the generation of the ImageNet dataset, which was one of the enablers of the AI revolution we are witnessing in these years. On the other hand, the objective of remote sensing goes beyond simple image classification, segmentation or object detection (think about snow-water equivalent or biomass estimation), and requires a level of expertise that cannot be delegated to the crowd; as we generally say, it is not a cat/dog problem. Because of this, I personally believe that the EO domain offers the AI community great challenges, and I am looking forward to seeing how this cooperation will grow in the future.

Conclusion

I only have one recommendation in this particular case, which is applicable to any communication context but even more so when different domains meet: “Do not take anything for granted!” Ask, make sure there is a common understanding, and share your knowledge!

Post contributed by Alessandro Marin.

Φ-lab workshop on Sentinel-1 SAR data

Last week, ESA’s Φ-lab welcomed partners from the World Wildlife Fund (WWF) and the Food and Agriculture Organization of the United Nations (UN-FAO) for a collaborative workshop exploring the potential use of Sentinel-1 SAR data for some of their projects. Both organisations traditionally work extensively over tropical regions, where cloud coverage hinders the regular mapping of the environment with optical datasets.

It is well known that SAR sensors, thanks to their active antennas, can acquire data independently of cloud cover and time of day. On the other hand, the underlying physical principles are fundamentally different from those of optical sensors such as Sentinel-2 or Landsat. This confronts new users with difficulties regarding proper image processing and interpretation, as well as adequate use of the data for various tasks related to environmental monitoring.

The workshop therefore aimed to demystify SAR data by giving practical examples of Sentinel-1 processing workflows and adapting them to specific problem statements, such as the mapping of deforestation and mangrove forests, the identification of water holes, and the large-scale mapping of crop types.

RGB Sentinel-1 Timescan composite over the northeast of Borneo island in Malaysia. The green area in the center is the Tabin Wildlife Resort
WWF, ESA and FAO participants

A special focus was put on the innovative use of the free and openly available Sentinel-1 data, whose radar eyes cover the entire Earth at least every 12 days and produce around 10 TB of raw data every day. While the availability of this amount of data allows for completely new ways of extracting ever more detailed information on large scales, it also confronts users with issues regarding data handling and information extraction. Both partner organisations mainly use Google Earth Engine to tackle this issue, and the corresponding processing strategies were presented. In addition, the SNAP-based Open SAR Toolkit was introduced, which allows an almost fully automatic production of large-scale, analysis-ready SAR imagery and provides a more customisable way of processing for non-SAR experts. Its incorporation of Jupyter Notebooks eases usage on remote machines in cloud environments such as the Copernicus Data and Information Access Services (DIASes) or FAO’s SEPAL platform.
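
As an example of the kind of processing strategy discussed, here is a minimal sketch using the Earth Engine Python API to build a cloud-independent Sentinel-1 backscatter composite; the area of interest and date range are illustrative only:

import ee
ee.Initialize()

aoi = ee.Geometry.Point(118.5, 5.3).buffer(20000)     # illustrative point in northeast Borneo

s1 = (ee.ImageCollection('COPERNICUS/S1_GRD')
      .filterBounds(aoi)
      .filterDate('2019-01-01', '2019-12-31')
      .filter(ee.Filter.eq('instrumentMode', 'IW'))
      .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
      .select('VV'))

# A temporal median reduces speckle and yields a single backscatter composite,
# regardless of cloud cover during the period
vv_median = s1.median().clip(aoi)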

Finally, techniques for ingesting the analysis-ready SAR data into machine learning and AI frameworks were discussed. While those techniques have been around for a while, they are gaining more and more importance with the constantly growing amount of satellite data. In this area, future collaborations between Φ-lab and both organisations are foreseen in order to support an effective use of Copernicus data and to help WWF and FAO achieve important goals such as wildlife conservation and a world without hunger.

Post contributed by Andreas Vollrath.

Workshop on AI4EO at NeurIPS

FDL (Frontier Development Lab) Europe presented its work at the Neural Information Processing Systems (NeurIPS) annual conference. The purpose of the conference is to foster the exchange of research on neural information processing systems in their biological, technological, mathematical and theoretical aspects. The core focus is peer-reviewed novel research, which is presented and discussed in the general session, along with invited talks by leaders in their field. FDL Researcher Valentina Zantedeschi presented Cumulo, a breakthrough dataset and method for fusing radar and image data for improved cloud classification. This was also awarded best paper at the Climate Change research workshop, which is a fantastic achievement. Josh Veitch-Michaelis, FDL Researcher for the Disaster Response team, presented Flood Detection on Low-Cost Orbital Hardware at the AI & HADR (Artificial Intelligence for Humanitarian Assistance and Disaster Response) workshop. The Mission Support Challenge team were also accepted to present their work at the Machine Learning Competitions for All workshop.

The FDL (Frontier Development Lab) Europe participants presenting the work done at the NeurIPS annual conference. Credits: FDL Europe.

Workshop: Machine Learning in archaeology

An international conference and workshop held on 7-8 November 2019 in Rome, Italy, organised by the European Space Agency (ESA) and the British School at Rome (BSR).

Artificial Intelligence, Machine Learning and Deep Learning are opening new frontiers of inquiry. Join the BSR and ESA in exploring applications of Machine Learning in artifact analysis, text mining and remote sensing. Papers will be presented at the BSR on 7 November, followed by a workshop at ESA’s European Space Research Institute (ESA/ESRIN) on 8 November.

Find the full programme here.

Organised by Peter B. Campbell (BSR), Christopher Stewart (ESA), and Iris Kramer (Southampton)

Φ-news: ML for Earth Observation bootcamp

Φ-lab hosted an internal bootcamp on Machine Learning (ML) for Earth Observation during the first week of January, which provided the team with further hands-on experience in applying machine learning to remote sensing using Matlab. The bootcamp was led by Prof. David Lary, who also led the team in looking ahead to future applications of ML such as swarm intelligence.

Machine Learning and Holistic Earth Observation

Machine learning has found many applications in remote sensing. These applications range from retrieval algorithms to bias correction, from code acceleration to the detection of disease in crops, and from the classification of pelagic habitats to rock type classification. As a broad subfield of artificial intelligence, machine learning is concerned with algorithms and techniques that allow computers to ‘learn by example’. The major focus of machine learning is to extract information from data automatically by computational and statistical methods. Over the last decade there has been considerable progress in developing machine learning methodologies for a variety of Earth Science applications involving trace gases, retrievals, aerosol products, land surface products, vegetation indices and, most recently, ocean applications.

Prof. David Lary during the bootcamp organised at the Φ-lab

Φ-lab visiting fellowships from Airbus

Φ-lab will welcome three visiting fellows from Airbus, joining a team working at the cutting edge of disruptive technologies that monitor Earth’s environment from space. This is in line with one of ESA’s Φ-lab goals: to help space researchers and companies adopt these disruptive new technologies and methods.

Find out more about this new agreement, signed by Josef Aschbacher, Head of ESA’s Earth Observation Programmes, and Evert Dudok, Executive Vice President of Communications, Intelligence and Security at Airbus, here.

Φ-lab fellowships

There are several ways and schemes through which one can work with Φ-lab. Find out more about research and visiting fellowships here.

SentinelSwipe: Advancing Earth Observation with every swipe and tap

The Φ-lab was part of a team that developed a citizen-science-based application for classifying ESA’s Sentinel data for machine learning at the Copernicus Space App Camp 2018 (check out the Space App Camp video here!). The mobile app, SentinelSwipe, was developed during the one-week-long Space App Camp, which brings together developers to create innovative apps that make Copernicus EO data available to a wide range of citizens and applications.

Developed by a diverse team of programmers, citizen science and remote sensing experts, SentinelSwipe distributes to the crowd the difficult task of providing labelled training data, thus contributing to advancing Earth Observation research by providing batches of classified satellite images. By using SentinelSwipe to classify and tag satellite imagery from the Sentinel missions, users provide the seed data that machine learning systems need to make predictions that can automate satellite image processing.

The app repository of SentinelSwipe is hosted on our Φ-lab GitHub page.

Post contributed by Jennifer Adams.

Archaeology, Machine Learning and the power of crowdsourcing

Archaeological Prospection using Crowdsourcing and Machine Learning

Research is underway on the combined use of crowdsourcing and machine learning with EO data to systematically detect buried archaeological structures. Crowdsourcing is initially used to create labelled data, taking advantage of human interpretation to identify archaeological crop patterns, which are often very faint, in remotely sensed imagery. This labelled data will then be used to train a convolutional neural network to systematically identify similar buried structures over a larger area.
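
As a purely illustrative sketch (PyTorch assumed; the architecture and patch size are not the project’s actual network), a convolutional classifier of this kind takes small image patches and predicts whether they contain archaeological residues:

import torch
import torch.nn as nn

class CropMarkNet(nn.Module):
    """Toy CNN classifying 64 x 64 patches as 'archaeological residue' vs 'background'."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16 * 16, 2))

    def forward(self, x):                       # x: (batch, 3, 64, 64) image patches
        return self.head(self.features(x))

model = CropMarkNet()
logits = model(torch.rand(8, 3, 64, 64))        # 8 crowd-labelled patches -> 8 x 2 class scores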

Examples of Archaeological Residues in EO Data (click to enlarge the figure)

As the population of the Earth increases, so does the demand for resources. Development puts the irreplaceable archaeological record at risk. Especially in countries with a rich archaeological heritage, such as those in the Mediterranean region, methodologies are sorely needed to increase the efficiency and reduce the costs of archaeological survey.

Crop Marks and Soil Marks (click to enlarge the figure)

Participate in the crowdsourcing project to help identify buried archaeological features!

Post contributed by Chris Stewart.

(Big) Data fusion

The Copernicus programme, the largest single Earth Observation programme to date, is delivering an unprecedented wealth of imagery. In fact, during the ESA Earth Observation Φ-week, the Director of ESA Earth Observation Programmes, Josef Aschbacher, announced that the Sentinel missions are delivering 150 TB of satellite data per day! Cloud computing platforms and Artificial Intelligence (AI) are overtaking traditional tools to tackle the challenges of Big Data and of extracting meaningful information, taking advantage of the high revisit frequency provided by these missions.

Data capture from Sentinel-1 (left) and Sentinel-2 (right) over summer 2018 during a three-month period (July-September). The number of passes increases at higher latitudes due to overlapping swaths.

Research is underway to identify Machine Learning (ML) techniques for classification purposes by fusing satellite data of different natures, namely synthetic aperture radar (SAR) and multispectral data from Sentinel-1 and Sentinel-2, respectively, from the Copernicus programme. Google Earth Engine provides many ML algorithms together with satellite imagery, which are very useful for extracting land cover from different sources of imagery. One of the main goals is to understand and quantify/qualify the improvement in accuracy of the best-performing model (measured by its overall accuracy) under different band and index scenarios for multi-temporal land cover classification.
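
As a minimal sketch of the fusion idea (scikit-learn assumed; the arrays and class labels are random placeholders, not real Sentinel data), SAR backscatter and multispectral features can simply be stacked into one feature vector per sample before training a classifier and computing its overall accuracy:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

n = 1000
s1_features = np.random.rand(n, 2)     # e.g. Sentinel-1 VV and VH backscatter
s2_features = np.random.rand(n, 10)    # e.g. selected Sentinel-2 bands and indices
labels = np.random.randint(0, 4, n)    # four illustrative land cover classes

X = np.hstack([s1_features, s2_features])       # data fusion: one combined feature vector
clf = RandomForestClassifier(n_estimators=100).fit(X[:800], labels[:800])
print('overall accuracy:', accuracy_score(labels[800:], clf.predict(X[800:])))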

This will make it possible to create, in a cost-effective way, large-scale land cover maps at high resolution, taking advantage of completely different types of sensors and of the higher revisit frequency obtained by merging data from these constellations.

Post contributed by Sara Aparicio.

Vacancies: Join Φ-lab

Do you want a career at the cutting edge of new space technologies? Do you have expertise in artificial intelligence, small-sats, high-altitude pseudo-satellites, computer science, distributed ledgers, quantum computing or the new space economy? Do you want to use your knowledge to build new Earth observation and space systems? Do you want to work in a fast-paced environment where teamwork makes your ideas a reality in 2 or 3 months? Then apply today for current open positions at ESA’s Φ-lab (Frascati, Italy).

Current open positions

These posts are only open to nationals of ESA Member States or Cooperating States. The posts are in the Directorate of Earth Observation Programmes. ESA is an equal opportunity employer, committed to achieving diversity within the workforce and creating an inclusive working environment. Applications from women are encouraged, and priority is given to applicants from under-represented Member States. If you require support with your application due to a disability, please email contact.human.resources@esa.int. 

Φ-lab Background

The Φ-lab’s mission is to accelerate the future of Earth Observation (EO) by helping Europe’s Earth observation and space researchers and companies adopt disruptive technologies and methods. Right now, we’re working with Artificial Intelligence (AI) on data from the Copernicus programme, Earth Explorer missions and CubeSats, as well as drone and hyperspectral payload data and VR. HAPs, quantum technologies and distributed ledgers are on our to-do list. Φ-lab is part of the ESA Earth Observation Programme’s Φ-Department, which develops future systems for Earth observation. Φ-lab also hosts ESA’s InCubed programme, providing rapid funding of innovative public-private partnerships to exploit new EO markets. Φ-lab also convenes experts from across the world to develop research agendas on the relevance to EO of emerging technology topics including AI, distributed ledgers and quantum computing.

Φ-lab tests concepts in 1-3 month Case Studies executed by a multidisciplinary research team that you’d be a part of. Find out more about the ongoing Case Studies.