-
Paper 569 - Session title: Data Intensive Science
13:10 Evolution of the Australian Geoscience Data Cube
Oliver, Simon (1); Wu, Wenjun (1); Ip, Alex (1); Woodcock, Robert (2); Wang, Peter (2); Paget, Matt (2); Evans, Ben (3); Lewis, Adam (1); Dekker, Arnold (2); Thankappan, Medhavy (1); Held, Alex (2); Sixsmith, Joshua (1)
1: Geoscience Australia; 2: Commonwealth Scientific Industrial Research Organisation; 3: National Computational Infrastructure
The Australian Geoscience Data Cube (AGDC) Programme envisions a Digital Earth: observations of the Earth’s oceans, surface and subsurface, taken through space and time, stored in a high-performance computing environment. The AGDC will allow governments, scientists and the public to monitor, analyse, and project the state of the Earth. It will also realise the full value of large Earth observation datasets by allowing rapid and repeatable continental-scale analyses of Earth properties through time and space.
At its core, the AGDC is an indexing system which supports parallel processing on HPC. One of the key features of the AGDC approach is that all of the observations (pixels) in the input data are retained for analysis; the data are not mosaicked, binned, or filtered in any way, and the source data for any pixel can be traced through the metadata. The AGDC provides a common analytical platform on which researchers can complete complex, full-depth analyses of the processed archive (~500 TB) in a matter of hours. As with the European Space Agency’s (ESA) GRID Processing on Demand (GPOD) system (https://gpod.eo.esa.int), the AGDC will allow analyses to be performed on a data store. By arranging EO data spatially and temporally, the AGDC enables efficient large-scale analysis using a “dice and stack” method, which sub-divides the data into spatially regular, time-stamped, band-aggregated tiles that can be traversed as a dense temporal stack.
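To make the “dice and stack” idea concrete, the following is a minimal sketch in Python; the tile size, the dicing method and the object fields are illustrative assumptions, not the AGDC’s actual API.

```python
from collections import defaultdict

TILE_SIZE_DEG = 1.0  # assumed tile extent; the real cell size is a configuration choice

def tile_index(lon, lat):
    """Map a coordinate to the (x, y) index of the tile containing it."""
    return (int(lon // TILE_SIZE_DEG), int(lat // TILE_SIZE_DEG))

stacks = defaultdict(list)  # (x, y) -> list of (time, tile): the temporal stack

def ingest(observation):
    """Dice one observation into regular tiles and append each to its stack."""
    for tile in observation.dice(TILE_SIZE_DEG):  # hypothetical dicing method
        stacks[tile_index(tile.lon, tile.lat)].append((observation.time, tile))

def temporal_stack(lon, lat):
    """Return the dense, time-ordered stack of tiles covering a point."""
    return sorted(stacks[tile_index(lon, lat)], key=lambda entry: entry[0])
```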
The AGDC application programming interface (API) allows users to develop custom processing tasks. The API provides access to the tiles by abstracting the low-level data access: users do not need to be aware of the underlying system- and data-specific interactions to formulate and execute processing tasks. The development of precision correction methodologies to produce comparable observations (spatially and spectrally), together with the attribution of quality information about the contents of those observations, is key to the success of the AGDC. Quality information for each observation is represented as a series of bitwise tests which, in the case of Landsat, include: contiguity of observations between layers in the dataset; cloud and cloud shadow obscuration; and a land/sea mask.
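As an illustration of such bitwise quality tests, here is a small sketch; the bit assignments are assumed for the example and do not reflect the actual AGDC pixel-quality layout.

```python
import numpy as np

CONTIGUITY   = 1 << 0  # all bands observed at this pixel
CLOUD        = 1 << 1  # pixel obscured by cloud
CLOUD_SHADOW = 1 << 2  # pixel obscured by cloud shadow
LAND         = 1 << 3  # land (set) versus sea (clear)

def clear_land_mask(pq):
    """True where a pixel is contiguous, cloud-free, shadow-free land."""
    wanted = CONTIGUITY | LAND
    unwanted = CLOUD | CLOUD_SHADOW
    return (pq & (wanted | unwanted)) == wanted

pq = np.array([0b1001, 0b1011, 0b0001], dtype=np.uint16)
print(clear_land_mask(pq))  # -> [ True False False]
```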
Work is currently underway to further develop the open source solution from the initial prototype deployment. Components of the evolution include advancing the system design and function to provide: improved support for additional sensors; improved ingestion support; configurable storage units; high-performance data structures; a graphical user interface; and expanded collaboration and engagement.
This paper reviews the history of the data cube and the application areas that will be addressed by the current plan of works. This research was undertaken with the assistance of resources from the National Computational Infrastructure (NCI), which is supported by the Australian Government.
-
Paper 1154 - Session title: Data Intensive Science
13:30 Near real-time focusing of Sentinel-1 IW data with GPGPU
Peternier, Achille (1); Pasquali, Paolo (1); Vitulli, Raffaele (2)
1: sarmap SA, Switzerland; 2: ESA/ESTEC, The Netherlands
New sensors such as Sentinel-1 provide higher resolution, shorter revisit time and high data availability (e.g., through the ESA Open Data Scientific Hub). This scenario brings modern SAR processing under the Big Data category, requiring sophisticated IT infrastructures (both at the hardware and software level) to cope with the increasing demand in terms of storage and computational power.
From a user and SME perspective, this creates the need to cope with such growing complexity by means of relatively small facilities, since supercomputers and data centres are not always affordable or viable. Unfortunately, conventional IT approaches such as sequential, single-threaded programming on CPUs are no longer enough to mitigate the increasing complexity of the problem.
Thanks to the adoption of general-purpose computing on graphics processing units (GPGPU), a significant boost in computational performance can be achieved by converting the most time-consuming tasks into code that exploits the massive parallelism provided by GPUs. GPGPU enables what can be considered, de facto, a “personal supercomputer”, but unleashing its real potential requires a significant effort in terms of software architecture and code refactoring.
In our presentation, we will introduce the SARscape Image Processor Accelerator (SARIPA): a software focuser capable of focusing Sentinel-1 frames in near real-time. SARIPA allows users to efficiently exploit information available through the ESA Scientific Hub by providing them with a tool to generate level-1 data in-house for interferometric processing, e.g., to cover areas where only raw and ground-range data are available, or to quickly produce SLC files without waiting for their generation through the hub.
Our solution is based on a single-node, high-end server with dual CPUs and four GPUs, using OpenCL to parallelize the most computationally intensive algorithms (such as FFTs and several filters). Thanks to SARIPA, we can focus one Sentinel-1 raw data frame into a level-1 product in about one minute. In the presentation, we will share the experience we acquired in developing SARIPA, ranging from the decoding of raw data to workload balancing among multiple GPUs. Our results are discussed from both the architectural and the algorithmic point of view, reporting the issues we encountered and the strategies we applied to fix them. Results are also commented on from a performance-analysis perspective, pointing out the major bottlenecks and limitations of conventional architectures with respect to the one we adopted in SARIPA. For example, input/output operations are executed in parallel with computations to reduce the stress on the I/O backend, while dynamic work-balancing is used to keep all the processing resources busy.
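The two scheduling strategies mentioned above can be illustrated with a small sketch (Python stands in for the actual implementation; load_raw_frame, focus_on_gpu, write_slc and sentinel1_raw_paths are hypothetical names, not SARIPA’s API):

```python
import threading
import queue

NUM_GPUS = 4
tasks = queue.Queue(maxsize=2 * NUM_GPUS)  # bounded: I/O stays just ahead of compute

def reader(paths):
    """I/O thread body: decode raw frames in parallel with GPU computation."""
    for p in paths:
        tasks.put(load_raw_frame(p))       # hypothetical raw-data decoder
    for _ in range(NUM_GPUS):
        tasks.put(None)                    # one stop marker per worker

def worker(gpu_id):
    """Each GPU pulls the next frame as soon as it is free (dynamic balancing)."""
    while (frame := tasks.get()) is not None:
        write_slc(focus_on_gpu(frame, gpu_id))  # hypothetical focusing call

workers = [threading.Thread(target=worker, args=(g,)) for g in range(NUM_GPUS)]
for w in workers:
    w.start()
reader(sentinel1_raw_paths)                # assumed list of input frame paths
for w in workers:
    w.join()
```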
The presentation will focus on the dissemination of the results obtained within the context of the ESA-funded project 3-13467/12/NL/FF/fk.
-
Paper 2019 - Session title: Data Intensive Science
14:10 Integrating Remote and Social Sensing Data for a Scenario on Secure Societies in a Big Data Platform
Albani, Sergio (1); Lazzarini, Michele (1); Koubarakis, Manolis (2); Papadakis, George (2); Karkaletsis, Vangelis (3)
1: European Union Satellite Centre, Apdo de Correos 511, 28850 Torrejón de Ardoz, Spain; 2: University of Athens, University Campus Ilisia, 15784 Athens, Greece; 3: National Centre for Scientific Research ‘Demokritos’, 15310 Aghia Paraskevi Attikis, Greece
The Societal Challenges described in the EU R&I Programme Horizon 2020 have been established to provide solutions to societal issues identified in different sectors: Health, Food and Agriculture, Energy, Transport, Climate, Social Sciences and Secure Societies. A major activity in supporting the primary aims of the Secure Societies challenge is the provision of geospatial products and services that mainly result from satellite data and pose unprecedented challenges in terms of volume, variety, velocity, veracity and value (the Big Data 5 Vs).
The Horizon 2020 project BigDataEurope (Integrating Big Data, Software & Communities for Addressing Europe’s Societal Challenges) aims at providing support mechanisms for all the major aspects of the data value chain in terms of employed data and technology assets, the participating roles and the established or evolving processes. BigDataEurope is engaging with a diverse range of stakeholder groups representing the Horizon 2020 Societal Challenges in order to implement a Big Data Aggregator Platform; this infrastructure comprises key open-source Big Data technologies for on-line (real-time) and off-line processing to meet the requirements of all Societal Challenges.
BigDataEurope is therefore performing three main activities: gathering user requirements from the different communities, developing a technical Big Data platform and implementing relevant demonstration applications/pilots. The first round of user requirements has been collected by the respective Societal Challenge domain leaders, and the selected pilots are currently under development, contributing to an integrated platform that includes the most suitable cross-domain technical solutions.
For the Secure Societies Challenge, particular importance has been given to the integration and fusion of data from remote and social sensing in order to add value to current data exploitation practices; this is very important in the Security domain, where datasets can be composed not only of satellite data but also of data coming from social and other sources.
More specifically, the pilot for the Secure Societies domain considers two different data workflows. The first workflow automatically selects, downloads, co-registers and processes Sentinel-1 Level-1 GRD (Ground Range Detected) images in order to detect areas with changes in land cover or land use by using change detection techniques; the identified areas of interest are then associated with social media data from Twitter and news items from Reuters and are presented to the end user for cross-validation. For the second workflow, the pilot applies the reverse procedure: event detection is triggered by news and social media information, and the corresponding satellite images are acquired and processed in order to check for changes in land cover/use.
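Schematically, the first workflow could be expressed as follows; every helper function here is a placeholder chosen for illustration, not the pilot’s actual API.

```python
def remote_to_social_workflow(area, t_before, t_after):
    # 1. Select and download a Sentinel-1 Level-1 GRD pair covering the area
    before = download_grd(area, t_before)  # hypothetical Scientific Hub query
    after = download_grd(area, t_after)
    # 2. Co-register the pair and detect land cover/use changes
    changed_regions = detect_changes(coregister(before, after))
    # 3. Associate each changed region with social media and news items
    results = []
    for region in changed_regions:
        tweets = search_twitter(region.bbox, t_before, t_after)
        news = search_reuters(region.bbox, t_before, t_after)
        results.append((region, tweets, news))
    return results  # presented to the end user for cross-validation
```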
Preliminary results of the pilot implementation are presented. For remote sensing, as a starting point, tools for managing satellite images already available in the Sentinel Scientific Data Hub and the Sentinel-1 Toolbox/SNAP have been used. These tools are currently targeted at expert users and are only suitable for small-scale (i.e. serial) processing; in the context of the pilot, their functionality is made fully automated and expressed in a parallelizable and map-reducible form. For social sensing, a set of text mining tools has been employed that effectively clusters news items and social media data into events; these data are gathered by specialized crawling techniques.
For the storage of the end results, the Semagrow engine has been used, which provides a unified view of heterogeneous geo-located items (i.e. satellite images, news items and social media data). The graphical user interface comes from Sextant, a visualization system crafted for the exploration of time-evolving geospatial data as well as for the creation, sharing and collaborative editing of temporally-enriched thematic maps.
-
Paper 2453 - Session title: Data Intensive Science
13:50 Earth Observation and the Web of Data - how can we join the dots?
Blower, Jon
University of Reading, United Kingdom
We use the Web every day to access information from all kinds of different sources. But the complexity and diversity of scientific data mean that discovering, accessing and interpreting data remain a major challenge for researchers, decision-makers and other users. Different sources of useful information on data, algorithms, instruments, platforms and publications are scattered around the Web. How can we link all these things together to help users better understand and exploit Earth Observation data? How can we combine Earth Observation data with other relevant data sources, when standards for describing and sharing data vary so widely between communities?
This presentation will describe how techniques of Linked Data (otherwise known as the “Web of Data”) can be used to share all kinds of information, make it readable both by machines and by humans, and ultimately drive the development of the next generation of data-driven applications. Through the use of overarching, community-independent standards from the World Wide Web Consortium, we will demonstrate how interoperability can be achieved between communities.
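As a small illustration of this approach, the sketch below describes an Earth Observation dataset using the W3C DCAT and Dublin Core vocabularies via the Python rdflib library; the URIs and the linked publication are invented for the example.

```python
from rdflib import Graph, Namespace, URIRef, Literal

DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
dataset = URIRef("http://example.org/eo/dataset/landcover-2015")  # hypothetical
g.add((dataset, DCT.title, Literal("Land cover map 2015")))
g.add((dataset, DCT.references, URIRef("https://doi.org/10.xxxx/example")))  # illustrative DOI
g.add((dataset, DCAT.theme, URIRef("http://example.org/themes/land-cover")))

# Serialize as Turtle: readable by machines and (reasonably) by humans
print(g.serialize(format="turtle"))
```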
We will present concrete experiences from two related projects in this area:
In the MELODIES project (Maximising the Exploitation of Linked Open Data In Enterprise and Science) we are developing eight new real-world services, driven by the combination of Earth Observation and other open data sources in a Linked Data framework. These services are aimed at a variety of users, including precision farmers, shipping logistics providers, urban planning authorities, public-sector decision-makers and policymakers. They are built upon a common technology platform providing facilities for data processing, data publication and management of Linked Data.
In the CHARMe project (CHARacterisation of Metadata for high quality applications and services) we developed a system, based around Linked Data, for enabling data users to provide feedback (known as “commentary”) on their experiences with using Earth Observation data. This information is extremely valuable in helping other data users decide which datasets they need for their particular purpose, and what the potential strengths and weaknesses of the data are. The use of Linked Data techniques enables this user feedback to be gathered from, and published to, data systems all around the world, providing links between data, publications, algorithms and other relevant information and enabling the user to navigate through a complex web of information.
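One way such commentary can be expressed in Linked Data is sketched below, using the W3C Web Annotation (oa) vocabulary; this is an illustration, not necessarily CHARMe’s exact data model, and the URIs and comment text are invented.

```python
from rdflib import Graph, Namespace, URIRef, Literal, BNode
from rdflib.namespace import RDF

OA = Namespace("http://www.w3.org/ns/oa#")

g = Graph()
note = URIRef("http://example.org/commentary/42")  # hypothetical annotation URI
body = BNode()
g.add((note, RDF.type, OA.Annotation))
g.add((note, OA.hasTarget, URIRef("http://example.org/eo/dataset/landcover-2015")))
g.add((note, OA.hasBody, body))
g.add((body, RDF.value, Literal("Cloud mask unreliable over bright desert surfaces.")))

print(g.serialize(format="turtle"))
```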
-
Paper 2705 - Session title: Data Intensive Science
14:30 A new data cube model for the analysis of multi-temporal, multi-resolution and multi-source remotely sensed imagery
Lück, Wolfgang
Forest Sense, South Africa
With the availability of large archives of multi-temporal, multi-resolution and multi-source remotely sensed imagery from ESA, the USGS and commercial satellite operators, there is a requirement to bring the data into a format that allows the application of advanced quantitative remote sensing techniques. These techniques may entail the combined use of imagery from sensors such as Landsat MSS, TM, ETM+, OLI and TIRS, MODIS, MERIS and Sentinel-2 for time series analysis, or vegetation characterisation combining optical and SAR imagery.
A new data model is hereby proposed which entails keeping all imagery in its sensor geometry and mapping individual sensor measurements to an irregular lattice representing ground geometry. Passive remote sensing observations are described in terms of the point spread function, signal-to-noise ratio, spectral response function per band and radiometric resolution. This allows the inclusion of cross-sensor observations for hyper-temporal phenological analysis and the characterisation of vegetation and other surface types with temporal variability.
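A sketch of what one lattice cell might hold under such a model is given below; the field names are assumptions chosen to match the acquisition properties listed above, not a published schema.

```python
from dataclasses import dataclass, field

@dataclass
class Measurement:
    sensor: str        # e.g. "Landsat-8 OLI", "MODIS"
    time: str          # acquisition time (ISO 8601)
    value: float       # measurement kept in sensor geometry, not resampled
    psf_fwhm_m: float  # point spread function width on the ground
    snr: float         # signal-to-noise ratio
    srf_band: str      # spectral response function identifier for the band
    quant_bits: int    # radiometric resolution

@dataclass
class LatticeCell:
    lon: float
    lat: float
    measurements: list[Measurement] = field(default_factory=list)

    def add(self, m: Measurement) -> None:
        """Attach a cross-sensor observation to this ground location."""
        self.measurements.append(m)
```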
The data cube model further lends itself to automatic cross-sensor calibration, BRDF and atmospheric correction, using well-calibrated observations from moderate- to low-spatial-resolution sensors for the radiometric correction of high-spatial-resolution observations from different sensors. Vice versa, the spatial detail captured in high-resolution imagery can be used, via segmentation, to identify clean measurements for homogeneous surfaces from moderate-resolution imagery or SAR observations.
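Both directions of this idea can be sketched briefly; the linear gain/offset fit and the variance threshold below are illustrative simplifications, not the proposed model’s actual correction scheme.

```python
import numpy as np

def cross_calibrate(high_res, coarse_on_same_grid):
    """Fit a linear gain/offset tying high-res radiance to a well-calibrated coarse sensor."""
    gain, offset = np.polyfit(high_res.ravel(), coarse_on_same_grid.ravel(), 1)
    return gain * high_res + offset

def homogeneous_mask(segments, high_res, max_std=0.02):
    """Keep segments whose high-res variability is low: 'clean' homogeneous surfaces."""
    clean = [s for s in np.unique(segments) if high_res[segments == s].std() < max_std]
    return np.isin(segments, clean)
```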
This paper describes the data cube model, followed by examples of how it enables the application of state-of-the-art hyper-temporal cross-sensor calibration, image pre-processing and the complementary fusion of optical and radar data techniques. A study area in the Kruger National Park, South Africa, with corresponding Landsat, SPOT, Pléiades, MODIS, MISR, ALOS and Sentinel-1 data is used to demonstrate how this new data cube model can be used for practical and operational quantitative remote sensing.
Data Intensive Science
2016-05-10 13:10 - 2016-05-10 14:50
Chairs: Blower, Jon - Thankappan, Medhavy