Pan-European infrastructure for ocean & marine data management

Newsletter

10 June 2020

See other editions

Do you want to receive our newsletter? Or do you want to unsubscribe?

Go to the form

Content:

Introduction

SeaDataNet is a major operational infrastructure for managing, indexing and providing access to ocean and marine data sets and data products, acquired by European organisations from research cruises and other observational activities in European coastal marine waters, regional seas and the global ocean. It develops, governs and promotes common standards for metadata and data formats, controlled vocabularies, quality flags, and software tools and services for marine data management, which are widely adopted and used to improve FAIRness (Findable, Accessible, Interoperable and Reusable). SeaDataNet core partners are the National Oceanographic Data Centres (NODCs) and major marine research institutes in Europe. It has established a large European and international network, working closely with the operational oceanography, marine research, and marine environmental monitoring communities as well as with other major marine data management infrastructures.

SeaDataNet is a major partner in the development and operation of the European Marine Observation and Data network (EMODnet), which supports the EU initiatives on Marine Knowledge 2020 and Blue Growth and the Marine Strategy Framework Directive (MSFD). SeaDataNet also cooperates closely, under an MoU, with the Copernicus Marine Environmental Monitoring Service (in particular CMEMS-INSTAC). Since the mid-1990s, SeaDataNet has expanded and matured and currently provides federated discovery and access to more than 110 data centres for physics, chemistry, geology, bathymetry, and biology.

As part of the EU HORIZON 2020 SeaDataCloud project, SeaDataNet is further developing its discovery, access, ingestion, publishing and visualisation services as well as its widely adopted SeaDataNet standards. The project aims to upgrade and expand the SeaDataNet architecture and services, making use of cloud services and taking into account requirements from major stakeholders such as EMODnet, CMEMS and EuroGOOS, EU Directives such as the MSFD, MSP and INSPIRE, international cooperation frameworks such as ICES, IOC-IODE, GOOS and GEOSS, and the European Open Science Cloud (EOSC) challenge. The SeaDataCloud project is implemented in partnership with EUDAT, a leading European network of academic computing centres, which is closely involved in the EOSC developments. The major objectives of the SeaDataCloud project are:

  • Improve discovery and access services for users and data providers
  • Optimise the connection of data providers, their data centres and their data streams to the infrastructure
  • Improve interoperability with other European and international networks to give users an overview of and access to additional data sources
  • Develop a Virtual Research Environment with tools for analysing data and for generating and publishing data products
  • Develop, update and publish data products for European sea regions.
This is the fourth edition of the newsletter in the framework of the SeaDataCloud project. It informs you about the progress of a number of SeaDataCloud developments, such as the launch of the upgraded CDI data discovery and access service, the near-finalisation of the prototype SeaDataNet Virtual Research Environment, the delivery of the SeaDataNet SWE Toolkit for managing operational oceanographic data streams from input to storage to distribution and visualisation, and how SeaDataNet is contributing to the Ocean Standards & Best Practices initiative of IOC-IODE, among other topics. We hope you will enjoy this newsletter and will be encouraged to visit the SeaDataNet portal to try out its services and to follow its evolution. We aim to reach as many people as possible, so please forward it to anyone you think may be interested.

Upgraded CDI Data Discovery and Access service taken into operation

A major focus of the SeaDataCloud project has been on upgrading the SeaDataNet Common Data Index (CDI) service for discovery of and access to a wealth of marine and ocean data. In October 2019, the joint developments of the SeaDataNet network of oceanographic data centres and EUDAT, a leading European network of academic computing centres, resulted in the launch of a new and innovative version of the CDI service. It replaces the previous version, which had been in operation since 2015, while the original version was released back in 2007.

 cdi user interface

Image: New dynamic user interface of the upgraded CDI data discovery and access service, giving discovery and access to more than 2.3 million data sets from 110 data centres

Architecture:
The new architecture of the CDI service distinguishes between the front-end, where users discover, shop for and download data sets, and the back-end, where data centres import new and updated CDI metadata and related data entries (including versioning). The separation is achieved by introducing a central data cloud, which holds copies of all unrestricted data sets replicated from the connected data centres and serves as a central data cache for efficiently executing user shopping requests.

New CDI service components 

  • Local software tools at data centres to prepare metadata and data files from local data resources in SeaDataNet formats, using SeaDataNet controlled vocabularies
  • Replication Manager at data centres for importing CDI metadata to the central CDI catalogue and associated data files to the Data Cache cloud, orchestrated by the CDI Import Manager
  • Data Cache cloud with adapted EUDAT services for import, storage and downloading
  • Upgraded CDI user interface and facilities for shopping and tracking of shopping requests by users and data centres

 

architecture cdi service

Image: Architecture of the upgraded CDI service


GUI and MySeaDataNet:
The interface is intuitive, but in case of questions online help is available. The Graphical User Interface gives users powerful search options by combining free-text, facet and geographic search, powered by Elasticsearch, SQL search, and GeoServer. The data access function comprises a simple and effective mechanism for data shopping, tracking and download. All functions for both users and data providers can be reached from a new MySeaDataNet dashboard, depending on the Marine-ID and the associated registered functions and roles. As part of this, the shopping process now has an integrated dialogue instead of requiring separate applications and URLs, for example for searching, registering, checking shopping progress, and retrieving data sets. This makes the dialogues for users and providers much more efficient, and easier to understand and perform. Furthermore, several processes and functionalities have been reviewed and optimised, including their performance, which again benefits users and data providers.
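For illustration, the sketch below shows how such a combined free-text, facet and geographic search might be expressed as an Elasticsearch query. The index URL and field names are hypothetical placeholders; they do not describe the actual CDI search index.

```python
# Rough illustration of a combined free-text + facet + geographic search against
# an Elasticsearch index. Index URL and field names are hypothetical and do not
# reflect the real CDI index layout.
import requests

ES_URL = "https://example.org/es/cdi-index/_search"  # hypothetical index URL

query = {
    "size": 20,
    "query": {
        "bool": {
            "must": [
                {"match": {"dataset_name": "temperature profile"}}  # free-text search
            ],
            "filter": [
                {"term": {"discipline": "physical oceanography"}},  # facet filter
                {"geo_bounding_box": {                               # geographic filter
                    "location": {
                        "top_left": {"lat": 45.0, "lon": 10.0},
                        "bottom_right": {"lat": 35.0, "lon": 20.0},
                    }
                }},
            ],
        }
    },
    "aggs": {  # facet counts returned alongside the hits
        "by_discipline": {"terms": {"field": "discipline"}}
    },
}

response = requests.post(ES_URL, json=query, timeout=30)
response.raise_for_status()
hits = response.json()["hits"]["hits"]
print(f"{len(hits)} matching entries on this page")
```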

 shopping dialogue

Image: Example screen of the shopping dialogue of the upgraded CDI service

For data centres, the MySeaDataNet environment gives access to an online CDI Import Manager service with a dashboard to manage imports of new and updated CDI and data entries. It also gives data centres the option to evaluate and process shopping requests for restricted data sets, which are managed locally and not in the central data cache. The import of metadata and unrestricted data, as well as the delivery of restricted data sets once agreed, are structured as workflows and are more self-service oriented than in the previous situation. These services work together with the Replication Manager for efficient automatic processing. The Replication Manager replaces the Download Manager and is easier to install and configure, provided this is done by technicians who know their local infrastructure well. Alternatively, the semi-automatic ‘interim solution' is maintained for data centres that are not allowed to install a local component. However, extra functionality has been added to the CDI Import Manager so that the CDI support desk can mediate for ‘interim' data centres in maintaining the CDI catalogue, now with both unrestricted and restricted data sets. This gives ‘interim' data centres the option to relax their data policies and provide unrestricted data sets, while moving from the interim solution to a Replication Manager installation remains an option that further streamlines the processes.

SeaDataNet role and position in the European marine data landscape:
The CDI service is instrumental for the position and importance of SeaDataNet in the marine and ocean domain. SeaDataNet plays a leading role in the European marine data management landscape, next to and in cooperation with other leading European infrastructures, such as the European Marine Observation and Data network (EMODnet) and the Copernicus Marine Environmental Monitoring Service (CMEMS). The SeaDataNet network of National Oceanographic Data Centres (NODCs) and data focal points provides essential connections and functionality at national levels in Europe to the many originators of marine observation and analysis data. The more than 100 SeaDataNet national data centre nodes manage, store, quality control, curate, and distribute ocean and marine data sets for physics, chemistry, geology, biology, and geophysics. The original data are collected during research cruises, national monitoring programmes, scientific projects, and other observational activities in European coastal marine waters, regional seas and the global ocean. The originators are more than 750 organisations, such as marine research institutes, monitoring agencies, geological surveys, hydrographic services, universities, and private companies, which use the SeaDataNet data centre nodes for long-term archival and wide distribution.

The SeaDataNet CDI Data Discovery and Access service gives harmonised discovery of and access to this large and ever-increasing volume of marine and ocean data sets as handled and managed at the SeaDataNet data centre nodes. The archived and curated data are increasingly a major input for developing added-value services and products that serve users from government, research and industry. Major products are developed by the EMODnet lots, CMEMS, and SeaDataNet itself. For instance, EMODnet Chemistry delivers harmonised and validated aggregated data collections for eutrophication and contaminants for all European sea basins, which serve the Marine Strategy Framework Directive (MSFD) and are taken up by the Regional Sea Conventions, EU DG-Environment and the European Environment Agency (EEA) for supporting assessments and deriving indicators. The data harmonised and bundled by SeaDataNet provide the initial basis and facilitate regular updating of those EMODnet products as new data become available. In turn, EMODnet stimulates more data originators and data centres to join the SeaDataNet network of data centres and the CDI service to include their original observation and analysis data sets, thereby expanding and strengthening the basis of the products and improving their quality. CMEMS deploys pan-European capacity for Ocean Monitoring and Forecasting. SeaDataNet and CMEMS have an MoU in place for mutual exchange of data, adoption by CMEMS of SeaDataNet standards, and development of joint products such as climatologies. Next to serving organised communities, the SeaDataNet CDI service also serves individual users, mostly from the research sector. This is further stimulated, inter alia, through cooperation with the evolving European Open Science Cloud (EOSC).

Optimising the robustness and performance of the new CDI service:
During the SeaDataCloud developments, major attention was given to fine-tuning and further integrating the individual system components, deploying various use cases, including upscaling of the numbers and volumes of imported and retrieved data files, and monitoring the behaviour and functioning of the processes, aiming for operational stability and ruling out interruptions and failures. During the tests, performance was measured and tuning was undertaken to bring performance to an acceptable level. The tuning included arranging more computing resources, such as more CPUs, workers, and memory, as well as making processes faster and more efficient. Much activity was also devoted to fine-tuning and completing the interfaces and communications for data providers and end-users. Interaction with the network was organised through two training sessions for data managers and technicians on common standards, data management procedures, and installing and using the upgraded system components and tools. The first training workshop gave technicians and data managers of data centres a preview of the new CDI user interface, allowing data centres to test a beta version and to provide feedback with their findings and suggestions. This feedback was valuable input for further developing and fine-tuning the new CDI user interface. The second training workshop gave data centres more details and hands-on training with the further developed import procedure and with deploying the Replication Manager at their data centres, in order to migrate from the old CDI system with Download Manager to the new CDI system configuration. This gave insights for improving the installation and configuration instructions and manuals of the software components. The second training workshop was also an important step in the wider deployment of the new CDI service at each of the more than 100 CDI data centres. During this migration process of several months, further feedback was provided about bugs and topics for improvement, leading to new releases of the Replication Manager software and upgrades of the services operated by SeaDataNet partners and EUDAT. Moreover, the system for operational monitoring of the upgraded and new components of the CDI service, in particular on the EUDAT platform, needed to be upgraded.

Separate environments have been set up for development, testing and production. This applies to the components listed above for the SeaDataNet and EUDAT platforms, while for data centres no development environment is included, only test and production. This separation of environments makes it possible to undertake additional developments, which are moved to the operational test environment for integrated testing, followed by moving to operational production. The test environment at data centres serves to install, configure, and test new versions of the Replication Manager, while the old version stays in production until it can be replaced.

Future developments:
Further upgrading of the CDI service is undertaken in synergy with the ENVRI-FAIR project. This aims in particular at improving the FAIRness of the service, both by enriching metadata and by optimising machine-to-machine services. For this, SeaDataNet is introducing SPARQL endpoints for all its European directories and common vocabularies, adopting common patterns and following the Linked Data principles. SPARQL is a W3C standard and the query language used to query Resource Description Framework (RDF) data. Linked Data representations of the various SeaDataCloud catalogues will allow ease of interoperability at a global scale. In addition, using these principles and services, the CDI metadata can be enriched automatically by making use of additional metadata from the linked SeaDataNet directories and possibly other external sources, such as DOI landing pages for scientific papers. This will be deployed for enriching the discovery and detail pages, while a CDI API is also under development, making full use of the Linked Data model.
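As a hedged illustration of how such a SPARQL endpoint could be queried, the sketch below sends a simple SKOS label query using the standard SPARQL protocol over HTTP. The endpoint URL and the query are illustrative assumptions, not a documented SeaDataNet interface.

```python
# Minimal sketch: querying a SPARQL endpoint for SKOS concept labels.
# The endpoint URL below is an assumed example, not a confirmed service address;
# adjust it to the actual published SeaDataNet/NVS endpoints.
import requests

SPARQL_ENDPOINT = "https://vocab.nerc.ac.uk/sparql/sparql"  # assumed endpoint

QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "salinity"))
}
LIMIT 10
"""

def run_query(endpoint: str, query: str) -> list:
    """Send a SPARQL SELECT query and return the result bindings."""
    response = requests.post(
        endpoint,
        data={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["results"]["bindings"]

if __name__ == "__main__":
    for row in run_query(SPARQL_ENDPOINT, QUERY):
        print(row["concept"]["value"], "-", row["label"]["value"])
```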

Try it out yourself and customer survey:
You are invited to try out the upgraded service yourself, starting at the CDI service landing page. Discovery and browsing of metadata are public, while a one-time user registration is required for submitting data requests. If you do, we also ask you to complete a short survey, which you can find via a button in the top bar. The survey results will help us to understand how well the service is working and what might need to be improved in the future.

Virtual Research Environment (VRE) prototype nearing finalisation

For researchers, the conventional way of working, using a desktop computer to analyse data and generate products, is becoming more and more tedious due to the increasing amount of data to process and the complexity of the procedures. Three main hurdles often prevent them from performing the required operations:

  1. Insufficient CPU power, leading to excessively long computation times
  2. Lack of available memory to process large datasets
  3. Insufficient disk storage, making the use of the full datasets impossible.

To overcome these, cloud services are becoming common practice, such as the EUDAT European Research Data Infrastructure or the European Open Science Cloud (EOSC). With such services, no more data downloading is needed: all the processing is performed close to the data. In addition, the cloud allows research on a common platform, with a guarantee that the latest versions of the software tools are installed, and enhances the reproducibility of the scientific work. In order to gain experience with web-based science and to provide the oceanographic community with seamless access to SeaDataNet data and software tools, the SeaDataNet Virtual Research Environment (VRE) prototype has been developed and deployed.

SeaDataNet VRE tools overview:
Users with an account can log in to the VRE with their Marine-ID, the SeaDataNet identity provider, and then enter a dashboard with access to a private workspace as well as to the different services described hereinafter.

webODV consists of a suite of online services based on the Ocean Data View (ODV) software, designed to interactively perform analysis, exploration and visualisation of ocean data. webODV allows users to aggregate large numbers of SeaDataNet data files and perform quality control.

DIVAnd (Data Interpolating Variational Analysis in n dimensions) is a cutting-edge software tool designed to efficiently interpolate in-situ observations onto a regular grid, in an arbitrary number of dimensions (for instance longitude, depth and time). A set of Jupyter notebooks guides the user through preparing the data, optimising the analysis parameters and performing the interpolation.

BioQC is a tool to process and run quality control on biological datasets. BioQC helps researchers to evaluate whether a particular biological occurrence record within the input file is useful for their analysis. It also helps data providers to identify possible gaps and errors in their datasets. The tool returns the input file with quality information attached to each occurrence record, plus a detailed report. This result file enables users to filter for suitable records.

VIZ is a modern and dynamic visualisation service to explore datasets on a map. By clicking on data points, users see a plot of the full profile prepared with webODV, together with the metadata of the input ODV files. A time selector makes it possible to limit the data to the period of interest. Additionally, the visualisation service provides the possibility to explore 4D gridded products prepared with DIVAnd.
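To give a feel for the kind of operation DIVAnd performs, the minimal sketch below grids a set of scattered, made-up in-situ observations onto a regular longitude/latitude grid. It uses a simple SciPy interpolation as a conceptual stand-in, not the DIVAnd variational method itself.

```python
# Conceptual stand-in for DIVAnd-style gridding: interpolate scattered in-situ
# observations onto a regular lon/lat grid. Uses plain SciPy interpolation,
# NOT the DIVAnd variational analysis; all numbers are illustrative.
import numpy as np
from scipy.interpolate import griddata

# Scattered observations: (lon, lat) positions and temperature values (made up)
rng = np.random.default_rng(42)
lon_obs = rng.uniform(10.0, 20.0, size=200)
lat_obs = rng.uniform(35.0, 40.0, size=200)
temp_obs = 15.0 + 0.5 * (lat_obs - 35.0) + rng.normal(0.0, 0.2, size=200)

# Regular target grid (0.25 degree resolution)
lon_grid, lat_grid = np.meshgrid(
    np.arange(10.0, 20.25, 0.25),
    np.arange(35.0, 40.25, 0.25),
)

# Interpolate the observations onto the grid (linear, inside the convex hull)
temp_grid = griddata(
    points=np.column_stack([lon_obs, lat_obs]),
    values=temp_obs,
    xi=(lon_grid, lat_grid),
    method="linear",
)

print("Gridded field shape:", temp_grid.shape)
```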

The subsetting service, based on ERDDAP, strives to make data access easier by providing services to subset, download and plot data. It returns datasets in various data formats such as CSV, MATLAB, netCDF, ODV and more. An interactive visualisation tool allows users to discover and browse through the subset results with modern web technologies.
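As a hedged illustration of how a client might request a subset from an ERDDAP-based service, the sketch below uses ERDDAP's standard tabledap query syntax; the server URL and dataset identifier are hypothetical placeholders.

```python
# Minimal sketch of subsetting a dataset through an ERDDAP tabledap endpoint.
# The server URL and dataset ID are hypothetical placeholders; replace them with
# the actual subsetting service address and a real dataset identifier.
import requests

ERDDAP_BASE = "https://example.org/erddap"   # hypothetical server
DATASET_ID = "sdc_med_profiles"              # hypothetical dataset ID

# Request temperature profiles for a time window as CSV, using ERDDAP's
# standard tabledap syntax: variables first, then '&'-separated constraints.
url = (
    f"{ERDDAP_BASE}/tabledap/{DATASET_ID}.csv"
    "?time,latitude,longitude,depth,temperature"
    "&time>=2015-01-01T00:00:00Z&time<=2015-12-31T23:59:59Z"
)

response = requests.get(url, timeout=60)
response.raise_for_status()

with open("subset.csv", "wb") as f:
    f.write(response.content)
print("Saved", len(response.content), "bytes to subset.csv")
```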

 components vre

 Image: Different components of the VRE.

How is the Virtual Research Environment deployed?

The VRE's components are deployed on different servers across various EUDAT data centres to distribute the load. The shared central components as well as the various processing services are deployed as Docker containers. These provide a standardized computing environment across all data centres and significantly facilitate the development of the VRE. The services are loosely coupled: the VRE provides central services such as Single Sign-On and central web storage for users, but the services can also run as standalone services. This allows one to easily extend the architecture and integrate heterogeneous services.

The dashboard is the heart of the VRE. Besides providing the main user interface, it is responsible for managing diverse aspects of the VRE. It provides the user interface with access to the different services and to the user's private workspace, an online storage space where users store their own datasets and the results of their analyses. Additionally, the dashboard performs some tasks behind the scenes: it is responsible for triggering data synchronisation mechanisms and ensures user authorization for the various services through a token-based authorization system. The dashboard is a web application based on the PHP framework Laravel, which includes a user management system and implements state-of-the-art security measures. The OAuth authentication/authorization protocol is used to provide Single Sign-On authentication with Marine-ID, based on EUDAT's B2ACCESS, an authentication service used by several European projects and initiatives. The private workspace is an instance of the well-known NextCloud tool, an open-source software for cloud-based collaboration. It is customized for the SeaDataNet VRE and runs on the VRE's hardware. The file selector is a bridge between most VRE services and the user's private workspace, allowing users to select data from their private workspace to be handed over to the service instances by the back end. Its front end is based on hummingbird-treeview, a JavaScript tool for displaying lists or file/folder structures as hierarchical trees. Services can customize the treeview to their needs.

Two types of processing services:
The actual processing tools - the services - are what researchers use to perform their analyses. Two types of services are deployed in the VRE. The first type operates like a traditional web server, where one central instance serves all users' requests; the BioQC tool, the VIZ visualisation tool, and webODV are examples of this type. For these services, one Docker container runs continuously, waiting for users, and the work of different users is kept separate using HTTP sessions.

The second type of service is for the more computation-intensive tools, or those whose backing software is not optimised for separating multiple users' sessions. The latter frequently occurs when porting software that was developed for desktop machines to the cloud. This type of service requires one instance for every user. As services are packaged as Docker containers, this means that one Docker container per user is deployed. JupyterHub is used to spawn the containers upon the user's request. JupyterHub was initially developed to serve Jupyter notebooks, but - slightly diverging from its intended use - it can be used for any tool that is packaged as a Docker container and interacts with the user via HTTP. JupyterHub's benefits include instance management, authentication/authorization, and solid web security measures such as reverse proxying and SSL termination. This solution is used for the DIVAnd service, which actually runs as Jupyter notebooks, and for the ERDDAP subsetting service.
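A minimal sketch of what such a JupyterHub configuration could look like is given below, assuming the commonly used DockerSpawner. The image name, network, volume and certificate paths are illustrative assumptions, not the actual VRE settings.

```python
# jupyterhub_config.py - minimal sketch of spawning one Docker container per user.
# Image name, network, volume and certificate paths are hypothetical placeholders.
c = get_config()  # noqa: F821 - provided by JupyterHub when the config is loaded

# Spawn each user's service instance as a Docker container
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "seadatanet/divand-notebooks:latest"  # assumed image name
c.DockerSpawner.network_name = "vre-net"                      # assumed Docker network
c.DockerSpawner.remove = True  # clean up containers when they stop

# Persist each user's work in a named volume mounted into the container
c.DockerSpawner.volumes = {"vre-user-{username}": "/home/jovyan/work"}

# JupyterHub terminates SSL and reverse-proxies HTTP traffic to the containers
c.JupyterHub.bind_url = "https://0.0.0.0:443"
c.JupyterHub.ssl_cert = "/etc/ssl/certs/vre.crt"   # assumed certificate paths
c.JupyterHub.ssl_key = "/etc/ssl/private/vre.key"
```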

The DIVAnd tool is provided to users through a Docker image that includes the installation of the Julia language (currently version 1.4.0) along with packages such as PyPlot (plotting library) and NCDatasets (manipulation of netCDF files), the required libraries and tools (e.g. netCDF, unzip, git), the master version of DIVAnd.jl and the master version of the DIVAnd notebooks.

The VRE's subsetting service embeds an instance of ERDDAP, configured dynamically and started on the fly according to the dataset selected by the user. The service provides a consistent way to create and download subsets of scientific datasets in netCDF format. The subsetting service's Docker image includes the installation of Java and Python for the web services, a web server to run applications (Tomcat), the ERDDAP web application, a visualisation front end using the Vue.js framework, plus some further required libraries and tools (e.g. netCDF).

Future perspective

In the final year of the project, the SeaDataNet VRE prototype will be further tested by user communities to better suit their needs and to fix performance issues. The VRE prototype will also be brought to operational level and actively monitored. After that it will be released on the SeaDataNet website for wider user tests, with sufficient instructions about what users can expect from the tools and with a request to provide feedback and suggestions for improvement. This feedback and these suggestions will be used as a basis for further development in future projects.

Latest developments on SeaDataNet Products

In the framework of the SeaDataCloud project, the WP11 team is mainly dedicated to generating data products such as temperature and salinity data collections and climatologies for the EU marginal seas (Mediterranean, Black Sea, North Atlantic, Baltic Sea, North Sea, Arctic Ocean) and the global ocean. The goal is to provide the best data products from SeaDataNet at regional and global scales and to serve diverse user communities (operational oceanography, climate, marine environment, institutional, academia). The regional aggregated data sets for all EU marginal seas contain all temperature and salinity data harvested from the SeaDataNet infrastructure. The data sets are then validated and elaborated by regional experts. The resulting datasets are used to produce the climatologies: gridded fields obtained through the DIVAnd mapping tool and representing the climate of the ocean at regional and global scales. In addition to these standard products, new data products are planned by the end of the project, in which the WP11 team will explore a multi-platform and multi-disciplinary approach, combining in-situ observations (e.g. gliders, Argo, ships, drifters, fixed platforms), remotely sensed observations, and Ocean Monitoring Indicators for tracking ocean mechanisms and/or climate modes and trends. The WP11 team makes full use of the data sets available in SeaDataNet and of many of the services and tools provided by the SeaDataNet infrastructure for discovery, access, validation, quality control, visualisation, interpolation, and gridding of retrieved data sets. In this way, the WP11 team contributes to verifying the quality of the SeaDataNet infrastructure content and services. As such, it is also involved in the SeaDataNet VRE development as a test team.

Quality Assurance Strategy:
The SeaDataNet partnership has worked jointly to implement and progressively refine a unique Quality Assurance Strategy (QAS), shown in the image below, aimed at continuously improving the quality of the database content and creating the best data products. The QAS consists of four main phases:

  1. all data and metadata are harvested from the central Common Data Index (CDI) service, using an internal buffer system configured for specific subsettings;
  2. files and parameters are aggregated to generate a metadata-enriched ODV (Ocean Data View) collection;
  3. Quality Check (QC) analysis by regional experts on the regional ODV collections further validates (secondary QC) the data and the Quality Flags assigned by data providers (primary QC);
  4. the detected data anomalies are analysed and corrected by the NODCs (National Oceanographic Data Centres), which finally update the corrected data within the infrastructure. The approach is iterative, to continuously upgrade the quality of the database content and data products.

Phases 2 and 3 rely on the ODV software for the aggregation and quality check of the data, further developing the guidelines introduced within the SeaDataNet projects. The WP11 team also provides recommendations for the development of the ODV software and its webODV version. The phase 3 output consists of validated aggregated data sets for all EU marginal seas in ODV collection format. These data sets are used to compute temperature and salinity climatologies through the DIVAnd software tool, which allows observations to be spatially interpolated onto a regular grid in an optimal way. The WP11 team also contributes to the DIVAnd software through intensive testing and debugging.

seadatacloud-strategy.png (88.8 K)  

Image: SeaDataCloud Quality Assurance Strategy.

The first version of the aggregated and validated temperature and salinity data collections (SDC_DATA_TS_V1) was released in June 2018. The collections are accessible through the SeaDataNet products catalogue together with their Product Information Document (PIDoc), containing all specifications about the product's generation, quality assessment and technical details, to increase the user's confidence and facilitate product uptake. Digital Object Identifiers (DOIs) are assigned to the products and the related PIDocs following the linked data approach, to foster transparency of the production chain and to acknowledge all actors involved, from data originators to product generators. An example, the Mediterranean Sea - Temperature and salinity Historical Data Collection and its PIDoc, is displayed below.

access seadatacloud products 

Image: How to access SeaDataCloud products and the relative Product Information Document (PIDoc) from the SeaDataNet products catalogue

The main outcomes from the analysis of the SDC_DATA_TS_V1 regional collections were:

  • Data population statistics per sea basin showed a progressive increase of available data (see next image)
  • Data quality also improved thanks to the introduction of additional checks by regional experts
  • QF statistics after QC show very high percentages of good data
  • New metadata statistics about data distributors/originators highlighted some systematic (format, flagging) errors but also made it possible to include fair acknowledgement of all data providers
  • New instrument type statistics made it possible to detect some data and metadata omissions within SeaDataNet.

 statistics seadatanet progress

Image: Summary statistics of SeaDataNet database progress from SeaDataNet2 to SeaDataCloud in terms of number of cruises, stations and measurements per sea basin.

The first version of the temperature and salinity climatologies (SDC_CLIM_TS_V1) was released in July 2019. The regional climatologies were designed with a harmonised initial approach: all of them cover the period after 1955, when marine data started to be sufficient for mapping at regional scale, and adopt the World Ocean Atlas 2018 vertical standard levels. All regional products are characterised by a monthly climatological field covering at least the entire time span 1955-2014 (some of them reach 2017) and by decadal climatologies at seasonal temporal resolution (monthly for the Baltic region) over six decades (1955-1964, 1965-1974, 1975-1984, 1985-1994, 1995-2004, 2005-2017). Moreover, all of them have been created by integrating, for the first time, the SDC_DATA_TS_V1 aggregated datasets (both restricted and unrestricted) with external sources (the World Ocean Database and the Coriolis Ocean Dataset for Reanalysis), which greatly increased the data coverage.

A global climatology (SDC_GLO_CLIM_TS_V1) has also been created for the first time. It contains two monthly climatologies for temperature and salinity with different time coverages, SDC_GLO_CLIM_TS_V1_1 (1900-2017) and SDC_GLO_CLIM_TS_V1_2 (2003-2017), computed from data from the World Ocean Database (WOD2013). This choice was made because the spatial coverage of SeaDataNet data at global scale is still too sparse.

The following image contains example plots of all the SDC_CLIM_TS_V1 climatologies, extracted from the respective Product Information Documents (PIDocs).

 plot sdc_clim_ts_v1

plot sdc_clim_ts_v1

Image: Example plots of the SDC_CLIM_TS_V1 climatologies.

A consistency analysis of all SeaDataCloud climatologies against the World Ocean Atlas 2018 has been performed to demonstrate the differences and the added value of the SeaDataNet data products. Both qualitative and quantitative analyses suggested a good consistency of the SDN products with WOA2018 and highlighted the added value of the regional products compared to the global WOA, which has a lower spatial resolution. Some issues with the quality of their products have been identified by the regional leaders and reported in the PIDoc section named "Product Usability". Some example plots from the PIDocs are given below.

 plot sdc_clim_ts_v1 consistency analysis

Image: Example plots of the consistency analysis performed on the SDC_CLIM_TS_V1 climatologies.

The SDC_CLIM_TS_V1 climatologies have been produced with a major effort by all partners in the WP11 team. Several gridded fields at different spatial and/or temporal resolutions have been generated for the first time, all presenting increased vertical resolution. Decadal fields for the regional marginal seas have been created for the first time as well.

The major achievements have been:

  • The uptake of the new DIVAnd software, with the team working jointly to debug and improve it;
  • The integration of external sources, which included the analysis of how much the data sets overlap or complement each other and the detection of data anomalies and duplicates.

The WP11 team identified possible improvements for the second version of the climatologies, the results of which are expected by the end of the project. The main issues to be addressed are:

  • The optimization of the production chain according to the best solutions developed by the WP11 partners;
  • The optimization of the data integration process;
  • The standardization of the duplicate detection process with external data sources.

A second version of the aggregated datasets (SDC_DATA_TS_V2) will be released very soon in the SeaDataNet products catalogue, while the new climatologies (SDC_CLIM_TS_V2) will become available by the end of the project.

SeaDataNet contributing to Ocean Data Standards and Best Practices

In 2013, the IODE Programme of IOC/UNESCO, at its 22nd Session and through Recommendation IODE-XXII.6, in cooperation with the WMO-IOC Joint Technical Commission for Oceanography and Marine Meteorology (JCOMM), established the "Ocean Data Standards and Best Practices" (ODSBP) Project. The aim of the project was to achieve broad agreement on, and commitment to adopt, a number of standards and best practices related to ocean data management and exchange, and to facilitate interoperability. The ODSBP Project was the continuation of the "Ocean Data Standards" (ODS) Pilot Project, also established and implemented jointly by IODE and JCOMM. The ODSBP Project extended the activities of its predecessor to include dissemination and promotion of "best practices" in addition to "standards". Proposed standards go through a review process, which includes expert and community review; when accepted, they are recommended for wide use and are published in the IOC Manuals and Guides series (see Recommended Standards).

For the dissemination of best practices, IODE established a document repository in 2014, the "OceanDataPractices" repository (ODPr) (Recommendation IODE-XXII.19). The repository contained a wide variety of "practices", such as manuals and guides related to oceanographic data and information management. It aimed to provide a platform for organisations to work on common standards and avoid duplication, to allow individual researchers from all around the world to find and follow practices approved by specialised expert bodies and organisations, and to enable research groups that wish to start a new research project and prepare a data management plan to search for and find existing methodologies and "best practices". The service was also intended for users to submit their own documents that they wish to share with the community. ODPr was a joint effort by several participating institutions, including ICES, IOC/IODE, JCOMM, SCOR and WMO.

The communication and use of "best practices" by the marine community is a challenging process. There are diverse oceanographic data that address multiple needs, and the way these data are collected, managed, documented and distributed may be "best" for some users, while the same specifications may not be the best solution for others. The need for FAIR, global and sustainable ocean best practice management was recognised by several national and international organisations and projects. The EU AtlantOS Project and its Best Practices Working Group, with the support of the ODIP (Ocean Data Interoperability Platform) Project and the NSF Ocean Observation Research Coordination Network, developed the Ocean Best Practices System (OBPS) concept. The long-term objective of the OBPS is to provide the ocean research, observing and application communities with a mechanism to discover, review, agree upon, adopt and support the widest possible dissemination of ocean best practices. The existing IOC/UNESCO IODE "OceanDataPractices" repository (ODPr) was identified as a permanent, sustainable repository, and in 2017 its name was changed to "OceanBestPractices" repository to reflect the broader spectrum of "all ocean-related" best practices. The new System's development was centred around this repository, which addresses the best practices management challenges and is today one of its elements. In 2019, the OBPS was adopted by the IOC as an international project co-sponsored by the Global Ocean Observing System (GOOS) and the International Ocean Data and Information Exchange (IODE). The new System includes the following elements: 1) a permanent repository, the "OceanBestPracticesSystem Repository" (OBPS-R), hosted by IODE, offering the scientific community a platform to publish their ocean-related best practices and to find the practices of others using innovative search and access technology; 2) a peer-reviewed Research Topic in Frontiers in Marine Science; 3) web-based and in-person training and capacity building based on the IODE OceanTeacher Global Academy; and 4) community outreach and engagement activities to help users integrate the OBPS into their routine work.

SeaDataNet contributions:

SeaDataNet has been actively contributing since the early steps of this initiative and of the international activities for the development of "standards" and "best practices". In 2014, four SeaDataNet standards proposals were submitted to the ODSBP Project:

  • SeaDataNet Cruise Summary Report (CSR) Data Model proposal
  • SeaDataNet Cruise Summary Report (CSR) XML Encoding proposal
  • SeaDataNet Common Data Index (CDI) Data Model proposal
  • SeaDataNet Common Data Index (CDI) XML Encoding proposal

The aim of the proposals was to acknowledge the SeaDataNet CDI and Cruise Summary Report (CSR) metadata profiles as standard metadata models for the documentation of marine and oceanographic datasets and cruise reports, and their XML encodings as the reference XML implementation. In particular, the proposals aimed to promote CDI and CSR as regional (i.e. European) standards.

In 2015, two joint proposals by the SeaDataNet and ODIP projects were also submitted to the IODE Standards Process:

  • SeaDataNet Controlled Vocabularies, and
  • SeaDataNet NetCDF (CF) data format,

aiming to acknowledge the SeaDataNet Controlled Vocabularies and NetCDF (CF) as standards for describing metadata and data formats and as a data transport model for processing and sharing marine and oceanographic datasets, and to promote them as regional (i.e. European) standards.

The controlled vocabularies have been adopted and published as a recommended standard in IOC Manuals and Guides No. 54 and on the Ocean Data Standards site. The review of the CDI and CSR metadata models and XML schemas following the expert comments has been finalised by the developers of the proposals, including the updating of the technical reference documentation. The next step of the process is the community review.

As the NetCDF (CF) data format elements were intended as a "regional" rather than a global standard, the document was submitted as a "best practice" to the "OceanBestPracticesSystem Repository" (OBPS-R). In 2019, SeaDataNet submitted an additional ten documents to the repository (including its standards), describing its data management manuals and protocols and its data and metadata format specifications:

  • Flow Cytometry data: format and examples of SeaDataNet ODV data and CDI xml metadata files.
  • Guidelines and forms for gathering marine litter data. [Updated version: 26/03/2019].
  • Ingesting, validating, long-term storage and access of Flow Cytometer data.
  • Ocean Data Standards, Vol.4: SeaDataNet Controlled Vocabularies for describing Marine and Oceanographic Datasets - A joint Proposal by SeaDataNet and ODIP projects.
  • Proposal for gathering and managing data sets on marine micro-litter on a European scale. [Updated version: 07/06/2019].
  • Proposal for gathering and managing data sets on marine micro-litter on a European scale. [Updated version: 19/04/2019] [Superseded by DOI: http://dx.doi.org/10.25607/OBP-495].
  • SeaDataNet Cruise Summary Report (CSR) metadata profile of ISO 19115-2 - XML encoding, Version 4.0.0.
  • SeaDataNet Cruise Summary Report (CSR) metadata profile, Version 4.0.0.
  • SeaDataNet metadata profile of ISO 19115, Version 11.0.0.
  • SeaDataNet metadata profile of ISO 19115-XML encoding, Version 11.0.0.
  • SeaDataNet NetCDF format definition. Version 1.21.

The SeaDataNet community "best practices" can be found in the OceanBestPractices repository.

SeaDataNet SWE Toolkit for receiving, managing, and streaming autonomous observatory data

When dealing with (in-situ) observation data, there is a very large number of different sensor data encodings, data models and interfaces. This heterogeneity makes the integration of sensor data a very cumbersome task. For example, without a common standardised approach, it would be necessary to customise each application that consumes sensor data to the individual data formats and interfaces of all the sensing devices that deliver data.

To address this issue, the Open Geospatial Consortium (OGC), an international de facto standardisation organisation in the field of spatial information infrastructures, has developed the Sensor Web Enablement (SWE) framework of standards. The OGC SWE architecture comprises several specifications facilitating the sharing of observation data and metadata via the Web. Important building blocks are standards for observation data models (Observations and Measurements, O&M), for the corresponding metadata about measurement processes (Sensor Model Language, SensorML), and for interfaces providing sensor-related functionality (e.g. data access) via the World Wide Web (Sensor Observation Service, SOS).
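As a hedged illustration of how a client can retrieve observations from an SOS server using the standard SOS 2.0 key-value pair (KVP) binding, consider the sketch below; the server URL, offering and observed property identifiers are hypothetical placeholders.

```python
# Minimal sketch of fetching observations from an OGC SOS 2.0 server via its
# standard KVP (key-value pair) binding. Server URL, offering and observed
# property URIs are hypothetical placeholders for illustration only.
import requests

SOS_ENDPOINT = "https://example.org/52n-sos/service"  # hypothetical SOS server

params = {
    "service": "SOS",
    "version": "2.0.0",
    "request": "GetObservation",
    "offering": "http://example.org/offering/buoy-1",                      # assumed
    "observedProperty": "http://vocab.example.org/sea_water_temperature",  # assumed
    # Temporal filter: observations whose phenomenon time lies in this interval
    "temporalFilter": "om:phenomenonTime,2020-06-01T00:00:00Z/2020-06-02T00:00:00Z",
}

response = requests.get(SOS_ENDPOINT, params=params, timeout=60)
response.raise_for_status()

# The default response is an O&M (Observations and Measurements) XML document
print(response.text[:500])
```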

 no-standards_swe-standards.png (48.8 K)

Image: Left: no standards used, requiring individual integration of all data sources; Right: reduced integration effort by using SWE standards, common interfaces and encodings

SeaDataNet strives for a common standardised approach for describing and giving discovery and access to marine data from different marine disciplines. Next to delayed mode data sets by means of the CDI data discovery and access service, standardising efforts are also directed towards (near) real-time data streams as collected by operational sensors and platforms. For this application, the SeaDataCloud project has built upon these SWE standards to support the interoperable sharing of (near) real-time observation data streams. A SeaDataCloud team, led by partner 52°North, has developed the open source SeaDataNet SWE Toolkit which comprises the following components:

  • SWE Ingestion Service: this component supports sensor operators in receiving and ingesting marine observation data from platforms and sensors into a local storage database. From there, (selected) data can be published as streams of (near) real-time observation data by means of SOS servers. As a first step after installing the Ingestion Service, the structure of the data stream has to be described in the local database, specifying platforms and sensors with SWE metadata profiles, supported by the SeaDataNet SWE vocabularies, and using the 52°North SMLE editor;
  • SWE Viewing Services: this component, which is based on the 52°North Helgoland Sensor Web Viewer, is an application for exploring and visualising the data streams retrieved through the SOS services. The viewer supports different types of observation data. It is capable of visualising data measured along trajectories (e.g. by research vessels) as well as profile data, besides time series data showing the historic variations of one or more parameters at fixed locations (e.g. fixed buoys and sensor stations).

 components-swe-toolkit.jpg (36.2 K)

Image: Components of SeaDataNet SWE toolkit deployed in a possible configuration

At the SeaDataNet portal homepage a link is given to an SWE demonstrator of the deployed SeaDataNet SWE Toolkit.    

 sensor-web-viewer.jpg (33.9 K)

Image: Sensor Web Viewer demo at the SeaDataNet home page

All of the software in the SeaDataNet SWE Toolkit is published as open source software. To assist uptake by operators of observing platforms, the Demonstrator also includes a web page where all relevant documentation and GitHub resources can be found.

Latest status of SeaDataNet tools

SeaDataNet has developed and maintains a set of tools to be used by each data centre, freely available from the SeaDataNet portal. The set includes documentation and common software tools for metadata and data, for statistical analysis and grid interpolation, and a versatile software package for data analysis, QA/QC and presentation. As part of the SeaDataCloud project, upgrades are undertaken taking new requirements into account. The following software versions are current:

MIKADO, developed by IFREMER, is used to generate the XML metadata entries for the CDI, CSR, EDMED, EDMERP and EDIOS SeaDataNet catalogues. The latest version (3.6.2) was released in May 2020 with major updates: the CDI ISO 19139 format has been updated to v12.2.0 for INSPIRE compliance and the horizontal datum is now mandatory (var03); the CSR ISO 19139 format has been updated to v5.2.1 for INSPIRE compliance; the protocol has changed from http to https for the BODC web services, SeaDataNet schemas and URN resolver; and CDI and CSR publications are now free links, allowing linking e.g. by DOIs.

NEMO, developed by IFREMER, enables conversion of ASCII files of vertical profiles, time series or trajectories to SeaDataNet format files, which can be either text files in Ocean Data View (ODV) or MedAtlas format, or binary files in netCDF format. The latest version (1.7.0) was released in April 2020 with several bug fixes and with support for additional formats specifically designed for biological, microlitter and flow cytometry data. Note that NEMO 1.7.0 and later versions generate files which can only be managed with RM version >= 1.0.45.

EndsAndBends, developed by IFREMER, is used to generate spatial objects from raw navigation (ship routes). Typical navigation log files record more than one location per 10 seconds (e.g. GPS outputs), and the size of these navigation log files makes them impractical to manage or visualise using standard GIS software or services (WMS, WFS and GML). EndsAndBends enables subsetting of the navigation files, keeping the geographical shape of the vessel route while significantly reducing the number of geographical locations, to preserve response times. The latest version of EndsAndBends (2.1.0) was released in April 2014; since then no new version has been developed.

OCTOPUS, developed by IFREMER, is a SeaDataNet format conversion tool and also a format checker for SeaDataNet ODV, netCDF and MedAtlas files. The latest version (1.5.3.0) was released in April 2020 with several bug fixes and additional functionality: updated URLs for CDIs, CSRs and NERC vocabularies; the DPSF P09 parameter is allowed as depth below sea surface in MedAtlas files (e.g. core data); an exception is allowed for ODV files with an empty vertical reference (e.g. navigation, contaminants in biota files); and in the MGD to ODV conversion the order of the depth parameter has been changed. All data providers are strongly advised to use OCTOPUS to check their ODV, netCDF and MedAtlas files before initiating submissions, in order to stick carefully to the agreed formats and rules. OCTOPUS already includes more than 500 checks. When using this version of OCTOPUS, the latest MIKADO version should also be used. Files generated with OCTOPUS 1.5.3 and later versions are only compatible with Replication Manager >= 1.0.45.

Replication Manager (RM), developed by IFREMER, handles the replication of the local data sets managed at a SeaDataNet data centre into the central cloud. The most recent version of the Replication Manager (1.0.45) was released in April 2020 for the connection of all data centres. The Replication Manager is a web application which allows data providers to submit new and updated CDI files to the new CDI Import Manager at MARIS and at the same time to publish the corresponding unrestricted data files in the cloud for user downloading. The restricted data files are kept locally, in a specific directory, for authorised downloading. In the latest release, several fixes have been made to improve the import and the retrieval of restricted data requests. All data providers are strongly advised to upgrade to this latest version for their connections to the CDI service, in dialogue with cdi-support@maris.nl.

Ocean Data View (ODV), developed by AWI, provides interactive exploration, analysis and visualization of oceanographic and other geo-referenced profile or sequence data. It is available for all major computer platforms and currently has more than 70,000 registered users. ODV has a very rich set of interactive capabilities and supports a very wide range of plot types. This makes ODV ideal for visual and automated quality control. The latest release is Version 5.3.0, released in June 2020. The ODV software is also being used for producing generic SeaDataNet data products - see the article on SeaDataNet data products.  Moreover, an online version webODV has been developed in the framework of the development of the SeaDataNet Virtual Research Environment (VRE) - see the separate article on the VRE. 

The Data-Interpolating Variational Analysis (DIVA) tool, developed by ULg, allows observations to be spatially interpolated (or analysed) onto a regular grid in an optimal way. As part of SeaDataNet, the DIVA method has been integrated into ODV. Features supported by the ODV/DIVA integration include proper treatment of domain separation due to land masses, undersea ridges or seamounts, and realistic estimation of water mass properties on both sides of the divides. This is important in areas with many islands separated by narrow channels, such as the Kattegat. An online version of DIVA has been developed in the framework of the SeaDataNet Virtual Research Environment (VRE) - see the separate article on the VRE.

All tools are freely made available for downloading through the SeaDataNet portal together with manuals for installation, configuration and use.

Progress with population of SeaDataNet directories

The SeaDataNet infrastructure comprises a network of interconnected data centres that perform marine data management at national and local levels and that together make their information and data resources discoverable and accessible in a harmonized way. The SeaDataNet directory services provide overviews of marine organisations in Europe, and their engagement in marine research projects, managing large datasets, and data acquisition by research vessels and monitoring programmes for the European seas and global oceans:   

  • European Directory of Marine Organisations (EDMO) (> 4.250 entries)
  • European Directory of Marine Environmental Data (EDMED) (> 4.200 entries)
  • European Directory of Marine Environmental Research Projects (EDMERP) (> 3.200 entries)
  • European Directory of Cruise Summary Reports (CSR) (> 58.300 entries)
  • European Directory of the Ocean Observing Systems (EDIOS) (> 350 programmes and > 16.500 series entries)
  • Common Data Index Data Discovery and Access service (CDI) (>2.39 million entries)

progress-directories-2012.png (55.1 K)

Image: the monthly progress of each of the directories since September 2012.  

coverage-cdi-data.jpg (92.8 K)

Image: the coverage of the CDI data service with > 2.39 million entries from 110 data centres and > 750 data originators.

Users can follow this monthly progress at the SeaDataNet portal.

Acronyms as used in this Newsletter

This newsletter contains many acronyms which are described in the following list:  

API: Application Programming Interface
AtlantOS: Optimising and Enhancing the Integrated Atlantic Ocean Observing Systems, an EU/H2020 Project
CDI: Common Data Index
CF: Climate and Forecast
CMEMS: Copernicus Marine Environmental Monitoring Service
CSR: Cruise Summary Reports
CSV: Comma Separated Values
CS-W: Catalogue Service for the Web
DIVA: Data-Interpolating Variational Analysis software
DOI: Digital Object Identifier
DTM: Digital Terrain Model
EDIOS: European Directory of Oceanographic Observing Systems
EDMED:  European Directory of Marine Environmental Data
EDMERP: European Directory of Marine Environmental Research Projects
EDMO: European Directory of Marine Organisations
EMODnet: European Marine Observation and Data Network
EOSC: European Open Science Cloud
EuroGOOS: European Global Ocean Observing System
FAIR: Findable, Accessible, Interoperable, and Reusable
GEBCO: General Bathymetric Chart of the Oceans
GEOSS: Global Earth Observation System of Systems
GOOS: Global Ocean Observing System
HFR: High Frequency Radar
HPC: High Performance Computing
ICES: International Council for the Exploration of the Sea
ICT: Information and Communication Technologies
IMDIS: International Conference on Marine Data and Information Systems
IOC: Intergovernmental Oceanographic Commission
IODE: International Oceanographic Data and Information Exchange
JCOMM: Joint Technical Commission for Oceanography and Marine Meteorology (now Joint WMO-IOC Collaborative Board)
ISO: International Organization for Standardization
JSON: JavaScript Object Notation
MSFD: Marine Strategy Framework Directive
NetCDF: Network Common Data Form
NSF: National Science Foundation (USA)
NODC: National Oceanographic Data Centre
NRT: Near Real Time
NVS: NERC Vocabulary Services
OBIS: Ocean Biogeographic Information System
ODIP: Ocean Data Interoperability Platform (EU-FP7 and EU-H2020 Project)
ODV: Ocean Data View software
ODSBP: Ocean Data Standards and Best Practices project
OGC: Open Geospatial Consortium
QA: Quality Assurance
QC: Quality Control
RDA: Research Data Alliance
RSM: Request Status Manager
RTD: Research and Technological Development
SCOR: Scientific Committee on Oceanic Research
SDB: Satellite Derived Bathymetry
SDC: SeaDataCloud
SDN: SeaDataNet
UNESCO: United Nations Educational, Scientific and Cultural Organization
URL: Uniform Resource Locator
VRE: Virtual Research Environment
W3C: World Wide Web Consortium
WCS: Web Coverage Service
WebODV: online version of Ocean Data View software
WFS: Web Feature Service
WMO: World Meteorological Organization
WMS: Web Map Service
XML: Extensible Markup Language