LPS16 > Session details

Big Data

2016-05-09 16:10 - 2016-05-09 17:50

Chairs: Marchetti, Pier Giorgio - Datcu, Mihai

Paper 948 - Session title: Big Data

16:10 Information Extraction from Continuous Flow of Sentinel Images Applied to Water Bodies Mapping and Monitoring

Yesou, Herve (1); Grizonnet, Manuel (2); Mercier, Grégoire (3); Pottier, Eric (4); Haouet, Sadri (1); Savinaud, Mickael (5); Giros, Alain (2); Maxant, Jerome (1); Studer, Mathias (1); Houper, Laurence (2); Faivre, Robin (1); Michel, Julien (2) 1: ICube-SERTIT, France; 2: CNES, DCP/SI/AP, France; 3: Telecom Bretagne, France; 4: IETR, France; 5: C-S, France

The arrival of Sentinel data marks a turning point in the Earth Observation community. The rapid and systematic dissemination of free images provided by the Sentinel system opens up important perspectives for the operational monitoring of territories, from local and regional to global scales, with a high update frequency. This will require adequate tools to handle and process this data flow, if possible at the rate of the image stream itself.

In general, the design of a system for continuous-flow information extraction from remote sensing images raises many questions from different perspectives: 1. characterization of the needs of the intended application; 2. adequacy of the envisaged image stream to meet the identified needs; 3. methods of extracting information from these flows to meet the needs; 4. design of a system capable of dealing with the data rates; 5. relevance of the generated products with regard to the application.

In order to answer these questions, taking water bodies as the thematic target, a four-step approach is proposed, consisting of:

1 - Definition of the need "Monitoring of water bodies": water bodies are not simple targets and need to be precisely defined. They include large water bodies of tens to thousands of square kilometres, as well as smaller ponds of a few hectares. Some of these are not permanent water surfaces but are inundated each year for a few days to a few months. This is an important step, as it also makes it possible to specify the final products, including in terms of space and time.

2 - Identification and implementation of information extraction: for each sensor type, i.e. SAR for Sentinel-1 and optical for Sentinel-2, identifying and evaluating image processing building blocks able to extract the targeted information. Existing algorithms available in open source software (ORFEO ToolBox, PolsarPRO, OpenCV, among others) are strongly favoured. However, approaches such as cascades of soft classifiers, decision trees or random forests should also be investigated. Particular attention is also paid to identifying exogenous data and databases to integrate into the chain, both to guide information extraction and for validation, by exploiting the synergy between the different target sensors, their spectral characteristics and their temporal revisit.

3 - Demonstrator development, carried out through the gradual integration of processing blocks. The chaining will be gradual so that the blocks can be tested and validated step by step. Validation will be done sensor by sensor, then taking into account the multi-temporal single-sensor aspect, and finally merging the optical and SAR approaches. The thematic and technological performance will be evaluated point by point, which will validate, correct or more deeply modify the approaches initially selected.

4 - Validation: the validation of the approach (from need to products) will be performed over at least two well-known test sites (ORFEO / RTU frame, SWOT Aval, DRAGON ESA, etc.), namely Poyang Lake on the banks of the Yangtze in China and the Alsatian flood plain in France. A third site will be selected to validate the generic nature of the approach and its generalization to less instrumented areas and less well-known phenomena.

By the end of the project, an "ad hoc", "on the flow" processing chain will have been set up in a scalable environment that can handle large volumes of data, applied to water bodies mapping and monitoring.

Paper 1710 - Session title: Big Data

17:30 Enhanced traceability for bulk processing of Sentinel-derived information products

Lankester, Thomas Henry Gervase; Hubbard, Steven Robert; Knowelden, Richard Anthony Airbus Defence and Space, United Kingdom

The advent of widely available, systematically acquired and advanced Earth observations from the Sentinel platforms is spurring development of a wide range of derived information products. Whilst welcome, this rapid rate of development inevitably leads to some processing instability as algorithms and production steps are required to evolve accordingly. To mitigate this instability, the provenance of EO-derived information products needs to be traceable and transparent. A user must be able to determine if products of the same type are identical or subtly different and a producer should be able to (re)create products using earlier processing chain configurations. This is particularly pertinent to time-series of bio-geophysical products since a change to any step in the original processing chain can invalidate the consistency and reliability of the time-series, introducing spurious discontinuities or trends.

Airbus Defence and Space (Airbus DS) has developed the Airbus Processing Cloud (APC) as a virtualised processing farm for bulk production of EO-derived data and information products, including time-series of bio-geophysical products. Utilising the UK-Collaborative Ground Segment for Sentinel data access, the APC realises the Big Data concept of bringing processing to the data. The virtualised processing farm is managed via vSphere; the execution of individual processing steps is handled by Celery-based workers and task queues, and the overarching production is orchestrated by an Azkaban workflow engine.

Each processing step involves execution of an instance of a software package with a set of process parameters. Each production chain involves the flow of data through a defined series of processing steps (with, or without, branching). A change to any processing step results in the creation of a new version of both the processing step and of the production chains that include that processing step. The configuration of the software version and processing parameters for each process step, and for the overall production chain, is recorded and controlled for the APC via a series of database tables. To ensure transparency, this stepwise internal configuration control of product generation needs to be converted to a form that allows users to view the full provenance of supplied products.
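The versioning rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the in-memory `Registry` (standing in for the database tables) and the example step and software names are all hypothetical and are not taken from the APC.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StepVersion:
    """One immutable version of a processing step (hypothetical record layout)."""
    name: str
    version: int
    software: str                             # software package and release
    parameters: Tuple[Tuple[str, str], ...]   # sorted (key, value) pairs


@dataclass(frozen=True)
class ChainVersion:
    """One immutable version of a production chain: an ordered series of step versions."""
    name: str
    version: int
    steps: Tuple[StepVersion, ...]


class Registry:
    """In-memory stand-in for the configuration tables (assumption)."""

    def __init__(self) -> None:
        self.steps: Dict[str, List[StepVersion]] = {}
        self.chains: Dict[str, List[ChainVersion]] = {}

    def add_step(self, name, software, parameters) -> StepVersion:
        history = self.steps.setdefault(name, [])
        step = StepVersion(name, len(history) + 1, software,
                           tuple(sorted(parameters.items())))
        history.append(step)
        return step

    def add_chain(self, name, steps) -> ChainVersion:
        history = self.chains.setdefault(name, [])
        chain = ChainVersion(name, len(history) + 1, tuple(steps))
        history.append(chain)
        return chain

    def update_step(self, name, software, parameters) -> StepVersion:
        """Create a new step version and a new version of every chain using that step."""
        new_step = self.add_step(name, software, parameters)
        for chain_name, history in self.chains.items():
            latest = history[-1]
            if any(s.name == name for s in latest.steps):
                replaced = [new_step if s.name == name else s for s in latest.steps]
                self.add_chain(chain_name, replaced)
        return new_step


# Hypothetical example: changing one parameter yields chain version 2,
# while chain version 1 stays available for re-creating earlier products.
registry = Registry()
cloud = registry.add_step("cloud_mask", "maskpkg 1.0", {"threshold": "0.2"})
ndvi = registry.add_step("ndvi", "indexpkg 3.1", {})
registry.add_chain("ndvi_timeseries", [cloud, ndvi])
registry.update_step("cloud_mask", "maskpkg 1.1", {"threshold": "0.25"})
latest = registry.chains["ndvi_timeseries"][-1]
print(latest.version, [(s.name, s.version) for s in latest.steps])
```

Because every version is immutable and old versions are never overwritten, a producer can look up exactly which software release and parameters produced a given product.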

The INSPIRE Directive of the European Union calls for geospatial data products to be provided with standardised metadata. The INSPIRE implementing rules for metadata build on the ISO 19115 metadata standard which, for ortho-imagery, recommends the documentation of process steps and data sources as part of the lineage metadata element. Under ESA guidance, and in the context of the space component of the Copernicus programme, Airbus DS has taken these INSPIRE recommendations further by applying the ISO 19115-2 (2009) extensions for imagery and gridded data. These extensions provide the structures for documenting product processing levels, referencing the processing software used for each processing step and the run-time parameters applied.

The production control system of the APC applies these enhanced guidelines by transforming the internal database configuration control information into an INSPIRE XML metadata file containing a stepwise set of processing steps and data source elements that provide the complete and transparent provenance of each information product generated.
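As a rough illustration of such a transformation, the sketch below serialises a list of data sources and processing steps into a small XML document using Python's standard library. The element and attribute names (`lineage`, `source`, `processStep`, `software`, `parameter`) are simplified placeholders chosen for readability; they are not the ISO 19115-2 or INSPIRE schema, and the product, scene and software names are hypothetical.

```python
import xml.etree.ElementTree as ET


def lineage_xml(product_id, sources, steps):
    """Serialise data sources and ordered processing steps into a simplified lineage document.

    Element names are illustrative placeholders, not the ISO 19115-2 schema.
    """
    root = ET.Element("lineage", {"product": product_id})
    for source in sources:
        ET.SubElement(root, "source").text = source
    for order, step in enumerate(steps, start=1):
        node = ET.SubElement(root, "processStep", {"order": str(order)})
        software = ET.SubElement(node, "software", {"version": step["version"]})
        software.text = step["software"]
        # Run-time parameters are written in a stable (sorted) order.
        for key, value in sorted(step["parameters"].items()):
            ET.SubElement(node, "parameter", {"name": key}).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical configuration records, as they might be read from a database.
document = lineage_xml(
    "example-product-001",
    ["S2A_example_scene"],
    [
        {"software": "maskpkg", "version": "1.1", "parameters": {"threshold": 0.25}},
        {"software": "indexpkg", "version": "3.1", "parameters": {}},
    ],
)
print(document)
```

A real implementation would map the same information onto the lineage structures of the metadata standard rather than onto ad hoc element names.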

Paper 2228 - Session title: Big Data

16:30 Big Data challenges indexing large-volume, heterogeneous EO datasets for effective data discovery

Waterfall, Alison; Donegan, Steve; Bennett, Victoria; Kershaw, Phil; Juckes, Martin STFC, United Kingdom

The Centre for Environmental Data Archival (CEDA) in the UK delivers long-term curation of scientifically important climate and Earth Observation data, and facilitates the use of data by the UK and European environmental science community. Well over 3 petabytes of data are now available from CEDA’s archives, and this will rise considerably in the next 12 months as data arrive from the Sentinel satellites as well as climate modelling datasets from CMIP6 (the Coupled Model Intercomparison Project, Phase 6). Alongside the archive, CEDA provides extensive computing facilities via the JASMIN “super data cluster” to meet some of the challenges of such large data volumes.

An important aspect of the work of CEDA is to ensure the dissemination of environmental datasets to a wide audience. To this end, CEDA is involved in a number of national and international initiatives to enable optimum exploitation of the data it holds: providing computing infrastructure and software next to the data, providing better ways to discover data, and ensuring that the data held within the CEDA archives conform to community-agreed standards for data format and metadata.

Traditionally, discovery of CEDA datasets has been via our web interface to CEDA’s own ISO19115 compliant metadata catalogue, which is also published via an OGC Catalogue Service (CSW). This enables datasets held by CEDA to be searched and discovered through portals such as the NERC Data Catalogue Service, the UK INSPIRE portal and, ultimately, the GEOSS portal. CEDA has also released a new service, based on an Elastic Search database, that allows users to find and access data in the archive at file level. Populating this system requires extensive scanning of the archive. This has initially focussed on CEDA’s extensive archive of aircraft data and is being extended to incorporate Sentinel data.

A significant existing international effort to address the challenges of distributing high-volume environmental data is the Earth System Grid Federation (ESGF), which has traditionally focussed on the climate community; CEDA plays an active role in distributing CMIP-5 and CMIP-6 data within it. This expertise is now being extended to Earth Observation datasets through involvement in projects such as CLIP-C and the CCI Open Data portal. The use of such a federated data portal opens up the data to a wider climate community and allows efficient dissemination of very large datasets, but the inclusion of Earth Observation data brings new challenges due to the heterogeneous nature of satellite datasets.

The Earth System Grid Federation provides a faceted search of the data within it, i.e. users search for the data using pre-defined terms and controlled vocabularies. The use of these controlled vocabularies allows better exploitation of the data, particularly for the development of standardised tools and non-human interactions. This is a relatively constrained task for the climate modelling community; however, for real satellite observations, which by nature may be of a custom variable unique to a given instrument, measurement technique and retrieval method, the challenge of placing the data within such a controlled vocabulary is multiplied. This presentation will illustrate this with examples from work being undertaken in the framework of the ESA CCI open data portal.
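The idea of a faceted search over a controlled vocabulary can be sketched as follows. The facet names and terms below are invented for illustration and are not the ESGF or CCI vocabularies; the third record shows the problem described above, where an instrument-specific variable name has no matching controlled term.

```python
from collections import Counter

# Hypothetical facets and controlled terms, invented for illustration.
VOCABULARY = {
    "variable": {"sea_surface_temperature", "soil_moisture", "ozone"},
    "processing_level": {"L2", "L3"},
    "sensor": {"AATSR", "ASCAT"},
}


def invalid_terms(record):
    """Return the (facet, value) pairs of a record that are missing or outside the vocabulary."""
    return [(facet, record.get(facet))
            for facet, terms in VOCABULARY.items()
            if record.get(facet) not in terms]


def faceted_search(records, **facets):
    """Return the records whose facet values match every requested controlled term."""
    for facet, term in facets.items():
        if term not in VOCABULARY.get(facet, set()):
            raise ValueError(f"{term!r} is not a controlled term for facet {facet!r}")
    return [r for r in records if all(r.get(f) == t for f, t in facets.items())]


def facet_counts(records, facet):
    """Count records per term, as a search portal would show next to each facet value."""
    return Counter(r[facet] for r in records if facet in r)


records = [
    {"id": "a", "variable": "sea_surface_temperature", "processing_level": "L3", "sensor": "AATSR"},
    {"id": "b", "variable": "soil_moisture", "processing_level": "L3", "sensor": "ASCAT"},
    # Instrument-specific variable name with no controlled term:
    {"id": "c", "variable": "sst_skin_night", "processing_level": "L2", "sensor": "AATSR"},
]

indexable = [r for r in records if not invalid_terms(r)]
hits = faceted_search(indexable, processing_level="L3", sensor="ASCAT")
print([r["id"] for r in hits], dict(facet_counts(indexable, "processing_level")))
```

Record "c" cannot be indexed until its variable is either mapped to an existing term or the vocabulary is extended, which is the curation effort that heterogeneous satellite datasets multiply.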

Paper 2354 - Session title: Big Data

16:50 An operational radiometric multi-sensor pre-processing framework for large-area time-series applications of medium resolution optical satellite imagery

Frantz, David; Röder, Achim; Stellmes, Marion; Hill, Joachim Trier University, Germany

We developed a large-area pre-processing framework for multi-sensor Landsat data, capable of processing large data volumes across space and through time. Designed as a multi-sensor processing strategy, it processes imagery from the Thematic Mapper, Enhanced Thematic Mapper and Operational Land Imager sensors with one single algorithm.

Cloud and cloud shadow detection is performed by a modified Fmask code. Surface reflectance is inferred from Tanré’s formulation of the radiative transfer, including adjacency effect correction. A pre-compiled MODIS water vapor database provides daily or climatological fallback estimates. Aerosol optical depth (AOD) is estimated over dark objects that are identified in a combined database-, image- and object-based approach, where information on their temporal persistency is utilized. AOD is inferred with consideration of the actual target reflectance, altitude and background contamination effect. In case of absent dark objects in bright scenes, a fallback approach with a modelled AOD climatology is used instead. Topographic normalization is performed by a modified C-correction. The data are projected into a single coordinate system and are organized in a gridded data structure for simplified pixel-based access tailored for the specific demand of time-series and pixel-based compositing applications.
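For orientation, the standard C-correction on which such a topographic normalization is based can be sketched as below. This is the textbook form, not the modified variant used in the framework, whose details the abstract does not give. It assumes that, for each pixel, the cosine of the local illumination angle `cos_i` is already known, fits the linear relation `value = m * cos_i + b`, and uses `c = b / m` to rescale each pixel to the flat-terrain case. All numbers are synthetic.

```python
import math


def fit_c(cos_i, values):
    """Least-squares fit of values = m * cos_i + b; returns the C parameter c = b / m."""
    n = len(cos_i)
    mean_x = sum(cos_i) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in cos_i)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cos_i, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope


def c_correct(values, cos_i, solar_zenith_deg, c):
    """Standard C-correction: value * (cos(solar zenith) + c) / (cos_i + c)."""
    cos_sz = math.cos(math.radians(solar_zenith_deg))
    return [v * (cos_sz + c) / (ci + c) for v, ci in zip(values, cos_i)]


# Synthetic pixels whose brightness depends linearly on illumination.
cos_i = [0.3, 0.5, 0.7, 0.9]
values = [0.2 * ci + 0.05 for ci in cos_i]

c = fit_c(cos_i, values)
corrected = c_correct(values, cos_i, solar_zenith_deg=30.0, c=c)
print(c, corrected)
```

For this idealised input, all corrected values collapse to the same number, i.e. no dependence on the illumination angle remains after correction.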

We based the assessment of the produced dataset for our southern African study area on an exhaustive analysis of overlapping pixels: 98.8% of the redundant overlaps are in the range of the expected ±2.5% overall radiometric algorithm accuracy. AOD is in very good agreement with AERONET sunphotometer data (R²: 0.72 to 0.79, low intercepts of 0.02 to 0.03 and slopes near unity). The uncertainty in using the water vapor fallback climatology is approximately ±2.8% for the TM SWIR1 band in the wet season. The topographic correction was judged successful because no relationship remained between the illumination angle and the corrected radiance.

In order to strengthen the multi-sensor capabilities of our framework, we intend to adapt the method for the coherent inclusion of similar upcoming optical sensors like any Landsat-9+ spacecraft or the soon-to-be-available Sentinel-2 A/B imagery. Elaborate follow-up applications that are in need of dense and long time series across large areas will greatly benefit from such an integrated pre-processing as the impact of cross-sensor variations can be significantly reduced if all data are processed with one single algorithm. In addition, the integrated usage of Landsat and Sentinel-2 data offers new opportunities in areas where dense coverage with optical imagery is still of concern and will greatly improve the applicability of certain time series approaches that are in need of dense input data.

Paper 2548 - Session title: Big Data

17:10 Big Data Processing: Lessons learned from the Global Web Enabled Landsat Data (WELD) project

Votava, Petr (1,2); Roy, David (3) 1: University Corporation at Monterey Bay, United States of America; 2: NASA Ames Research Center, United States of America; 3: Geospatial Sciences Center of Excellence, South Dakota State University, United States of America

At over 40 years, the Landsat series of satellites provides the longest temporal record of space-based surface observations. The Sentinel-2 multi-spectral global coverage will continue this record but with increased data volume. The need for “higher-level” Landsat and Sentinel-2 products, i.e., beyond radiometrically and geometrically corrected scenes, has been advocated by the science and applications user communities. The NASA funded Web-enabled Landsat Data (WELD) project has demonstrated this capability by systematically generating 30m weekly, seasonal, monthly and annual temporally composited Landsat mosaics of the conterminous United States (CONUS) and Alaska for 10+ years (http://weld.cr.usgs.gov/). Recently, the WELD code was ported to the NASA Earth Exchange (NEX) to generate global 30m monthly and annual products and native resolution browse images using every available contemporaneous Landsat 5 and 7 scene (http://globalweld.cr.usgs.gov/). The NEX is a collaborative supercomputing (access to over 200,000 cores) and data (access to 2PB online storage) platform that facilitates end-to-end execution of large-scale Earth science projects with distributed process execution, analysis and result sharing (https://nex.nasa.gov). This presentation describes the global WELD processing approach and challenges for data management, (re)processing, provenance tracking, and product quality assessment within the NEX production pipeline. The product distribution from the NASA USGS Land Processes Distributed Active Archive Center (LP DAAC) and the native resolution browse image integration into NASA’s internet Global Imagery Browse Service (GIBS) are shown. The latest version of the global WELD products and plans to expand the production to provide Landsat 30m higher level products for any terrestrial non-Antarctic location for six 3-year epochs from 1985 to 2010 are presented.
The NEX global WELD product and browse image production pipeline will process over 10PB of data using more than 1 million CPU hours. This presentation provides relevant insights into the Big Data processing challenges to advancing the virtual constellation paradigm for mid-resolution land imaging.
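To make the notion of a temporally composited product concrete, the sketch below builds a per-pixel composite from several acquisitions. The selection rule used here, keeping the observation with the highest NDVI, is one common compositing criterion and is an assumption made for illustration; the abstract does not state which criteria the WELD pipeline applies. Pixel values are synthetic, and pixels missing from a scene stand for observations masked out, for example by cloud.

```python
def ndvi(red, nir):
    """Normalised difference vegetation index; -inf when undefined so it never wins."""
    total = nir + red
    return (nir - red) / total if total else float("-inf")


def composite(scenes):
    """Per-pixel composite keeping, for each pixel, the observation with the highest NDVI.

    scenes: list of dicts mapping (row, col) -> (red, nir), one dict per acquisition.
    """
    best = {}
    for scene in scenes:
        for pixel, (red, nir) in scene.items():
            score = ndvi(red, nir)
            if pixel not in best or score > best[pixel][0]:
                best[pixel] = (score, red, nir)
    return {pixel: (red, nir) for pixel, (_, red, nir) in best.items()}


# Two synthetic acquisitions over the same small area.
scene_1 = {(0, 0): (0.1, 0.4), (0, 1): (0.2, 0.3)}
scene_2 = {(0, 0): (0.1, 0.2), (0, 1): (0.1, 0.5), (1, 1): (0.3, 0.3)}

result = composite([scene_1, scene_2])
print(result)
```

Because each pixel is handled independently, this kind of compositing parallelises naturally across tiles, which is what makes it suitable for large distributed platforms.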
