Destination Earth - The Explainer Series
THE DIGITAL TWIN ENGINE
Destination Earth (DestinE), the EU initiative to create a digital replica of Earth, will provide a new information system with many innovative features. This system will boost Europe's efforts to respond to, and adapt to, the environmental challenges posed by climate change and extreme events, in support of the European Green Deal.
To achieve its aims, DestinE builds on decades of European investment and expertise, harnesses the latest advances in science, digital technologies and machine learning, and exploits the world-leading supercomputing facilities of the European High Performance Computing Joint Undertaking (EuroHPC). As such, the resources and components comprising the DestinE system originate from very diverse environments.
One specific element playing a critical role in ensuring all these components operate smoothly together is the Digital Twin engine (DTE). It deploys a series of software components and services at the different levels of the DestinE system, thus playing several essential roles.
"The DTE is the 'glue' that orchestrates the digital twins through a unified approach and offers novel data handling and interactivity capabilities so that users can fully exploit DestinE's potential."
Tiago Quintino, Head of Development Section, ECMWF
The DTE within DestinE
DestinE's major building blocks are: the Digital Twins of the Earth system and the Digital Twin Engine (both developed by ECMWF), the Core Service Platform (ESA) and the Data Lake (EUMETSAT).
All of these include workflows, data handling capabilities and services that help exploit the most advanced accelerator hardware available on European supercomputers. These building blocks support the concerted effort in highly accurate physical and machine-learning-driven modelling that underpins the digital twins.
See more in our explainer on the Digital Twins here.
An innovative, modular and performant software framework was specifically designed to create a unified orchestration of the digital twins and to allow exploitation of their data.
The Digital Twin Engine enables the efficient deployment and running of the digital twins on the EuroHPC supercomputing systems, handles their multiple workflows and data management, and provides different levels of interactivity for users. It thus operates at multiple levels of the system, making the different components of DestinE interoperable through this unified orchestration.
See more in our explainer on Supercomputers here.
Roles
The Digital Twin Engine's mission can be boiled down to five specific functions.
Tools
The DTE is composed of a series of digital tools, in the form of software, services and APIs (Application Programming Interfaces) developed at ECMWF in collaboration with many partners. All software is fully open source and co-developed with the community. This growing list of components can be put together to adapt to different situations and the evolving digital landscape.
A framework for Earth System Model Workflows
The DTE deploys these tools in different areas of the DestinE system, allowing them to operate and serve their different missions.
As DestinE is an agile and bespoke system, a wide variety of needs can be addressed, from performing multi-decadal climate projections to requesting digital twins' data over a specific region.
By activating some or all of its tools, the Digital Twin Engine allows orchestration of several important components of DestinE.
I. PERFORMANCE
DestinE's digital twins will perform simulations at very high resolutions to represent processes more directly, and this requires a substantial increase in computing resources. For this reason, the new generation of 'pre-exascale' supercomputers, approaching exascale performance of one million teraFLOPS (10^18 floating-point operations per second), enters the picture.
An agreement with the European High Performance Computing Joint Undertaking (EuroHPC) has granted DestinE special access to some of the most powerful supercomputers in Europe: LUMI based in Finland (fifth in the TOP500 list of the world's fastest supercomputers), Leonardo in Italy (sixth in the TOP500), MareNostrum5 in Barcelona (eighth in the TOP500), and MeluXina in Luxembourg (71st in the TOP500).
These new installations offer crucial resources for computing and big data production at scale, allowing complex digital twin simulations to be performed at unprecedented spatial resolutions. But this comes with its share of challenges, as each of these new-generation EuroHPC supercomputers has its own specific architecture and a different geographical location. Tackling these challenges is one of the Digital Twin Engine's primary tasks.
Adapting codes
To enhance their computing power, these new supercomputers include novel processor architectures and components, notably accelerator technologies such as graphics processing units (GPUs), which speed up compute-intensive operations while improving energy efficiency.
So far, weather and climate models have relied on traditional central processing units (CPUs). The scale of DestinE's digital twins requires continuous adaptation, porting and highly targeted optimisation of the existing codes to make full use of current and future accelerator technologies.
The Digital Twin Engine supports this effort by providing sustainable solutions for porting and optimising the digital twin codes and for maximising their performance on each EuroHPC platform.
Loki: for Performance Portability
For this specific purpose, the DTE uses Loki as part of the toolset deployed within the DestinE system.
Loki is a programmable source-to-source translation package written in Python (a high-level scripting language) that allows HPC experts to adapt large amounts of Fortran (the compiled language in which the models are written) automatically.
It provides rich functionality for analysing and transforming source code, letting users build highly customisable code transformations and pipelines. Experts in the field can freely extend it with new recipes, such as adaptations for particular accelerators.
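As a purely illustrative sketch of the idea behind source-to-source translation (this is not Loki's actual API, and real transformations work on a proper internal representation of the code rather than on raw text), the toy example below reads a Fortran routine, finds its loops and annotates them with OpenACC directives so they could be offloaded to a GPU:

```python
# Toy source-to-source transformation in the spirit of Loki
# (illustrative only; Loki parses Fortran into an internal
# representation instead of using regular expressions).
import re

FORTRAN_IN = """\
subroutine saxpy(n, a, x, y)
  integer, intent(in) :: n
  real, intent(in) :: a, x(n)
  real, intent(inout) :: y(n)
  integer :: i
  do i = 1, n
    y(i) = a * x(i) + y(i)
  end do
end subroutine saxpy
"""

def add_openacc_directives(source: str) -> str:
    """Insert an OpenACC directive before each counted 'do' loop."""
    out = []
    for line in source.splitlines():
        if re.match(r"\s*do\s+\w+\s*=", line, flags=re.IGNORECASE):
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}!$acc parallel loop")
        out.append(line)
    return "\n".join(out) + "\n"

print(add_openacc_directives(FORTRAN_IN))
```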
II. INTEGRATION
Using the power of these different supercomputers, DestinE's digital twins will produce high volumes of data. But with the EuroHPC installations located in different places, and with the technical novelties of the digital twins, the data sources are diversifying, with important consequences for the way data movement is controlled and coordinated. Traditional workflow management systems need to adapt to fit this new environment.
With DestinE, we move towards a decentralised configuration of separate, geographically distributed installations, much as modern source code management with git is decentralised.
Solutions are needed to minimise data movement and to encourage data-centric workflows. This means redefining workflows and data management systems for a decentralised environment.
The Digital Twin Engine tackles this new challenge by defining the language and providing common components to establish a complete and robust end-to-end workflow, from Earth observations and models to creation of products and delivery to end-users at scale. The Engine orchestrates the digital twins, their simulations and data, and their interactions with users.
Data Bridges
To facilitate interaction with the digital twins and their data, elements of the Digital Twin Engine are also deployed on geographically distributed data bridges near the EuroHPC supercomputing sites. They are part of the DestinE Data Lake and serve data to users of the DestinE Core Service Platform.
These are auxiliary DestinE edge infrastructures deployed next to an HPC site to facilitate access to the data produced by the digital twins and to enable interactivity. Because the supercomputers differ from each other, the data bridges 'bridge' to users: they provide a common interface across sites that hides the specific environment of each supercomputer, giving external DestinE users a seamless and transparent experience.
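As a minimal sketch of what such a common interface could look like (the class and method names below are hypothetical, chosen for illustration, and do not describe the actual data bridge software), each site-specific backend hides its own storage layout and access mechanisms behind the same call:

```python
# Hypothetical sketch of a common interface across heterogeneous sites
# (not the actual data bridge software).
from abc import ABC, abstractmethod

class DataBridge(ABC):
    """Interface exposed to DestinE users, whatever the HPC site."""

    @abstractmethod
    def retrieve(self, request: dict) -> bytes:
        """Return the data matching a semantic request."""

class LumiBridge(DataBridge):
    def retrieve(self, request: dict) -> bytes:
        # Site-specific logic (file systems, object stores, schedulers)
        # would live here; the user never has to see it.
        return b"...data served near LUMI..."

class LeonardoBridge(DataBridge):
    def retrieve(self, request: dict) -> bytes:
        return b"...data served near Leonardo..."

def fetch(bridge: DataBridge, request: dict) -> bytes:
    # User code is written once against the common interface.
    return bridge.retrieve(request)

print(fetch(LumiBridge(), {"param": "2t", "date": "20300101"}))
```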
DTE services for integration
To integrate DestinE's decentralised elements and develop end-to-end workflows, the DTE deploys a variety of services. For example, the Fields DataBase (FDB) is a domain-specific object store that efficiently stores and indexes the digital twin outputs according to semantic metadata; Aviso is a data availability notification service, designed to notify users when digital twin data become available; and ecFlow is a workflow manager package designed especially for large-scale operational and time-critical workflows. Other workflow orchestration tools, such as Autosubmit, can be integrated and are employed within the DestinE environment to control the production of the Climate Change Adaptation Digital Twin.
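To illustrate what indexing by semantic metadata means in practice, here is a small, self-contained sketch; it does not use the real FDB interface, and the metadata keys shown are only examples:

```python
# Minimal illustration of indexing fields by semantic metadata,
# in the spirit of FDB (not its real interface).
from typing import Iterator

class FieldStore:
    def __init__(self):
        self._fields = []  # list of (metadata, payload) pairs

    def archive(self, metadata: dict, payload: bytes) -> None:
        """Store a field together with the metadata describing it."""
        self._fields.append((dict(metadata), payload))

    def retrieve(self, request: dict) -> Iterator[bytes]:
        """Yield every field whose metadata matches the request."""
        for metadata, payload in self._fields:
            if all(metadata.get(k) == v for k, v in request.items()):
                yield payload

store = FieldStore()
store.archive({"param": "2t", "date": "20300101", "step": 0}, b"...")
store.archive({"param": "10u", "date": "20300101", "step": 0}, b"...")

# Users describe *what* they want, not *where* it is stored.
temperature_fields = list(store.retrieve({"param": "2t"}))
```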
III. DATA HANDLING
Once assembled, DestinE's digital twins describe our planet's processes with high accuracy, producing vast amounts of data at the scales where the impacts of climate change and extreme events are felt. The volume of data produced in the context of DestinE is growing rapidly and is estimated to reach an unprecedented throughput of 1 PB per day.
How can users easily retrieve the 'needle in the haystack' they typically need from such large data volumes?
The Digital Twin Engine provides the supporting services to handle this big data stream efficiently. For this, it employs novel algorithms that allow users to find the information they require within it.
Polytope, a revolutionary tool
In addition to the previously mentioned FDB service, the DTE also deploys additional software for this specific mission: Polytope, a feature extraction service for improved access to petabyte-scale datacubes, which implements the latest standards in data accessibility.
Currently, when users request a specific set of data for a certain region, such as spatio-temporal data or a time series, their requests are expressed as 'bounding boxes' cut from all the available data. These boxes are rectangular and unspecific, and include a large amount of information the user does not necessarily need. With higher-resolution simulations and ever-growing data volumes, this becomes a barrier to smooth, rapid and agile use of the system.
A polytope is the mathematical term for an n-dimensional polygon. It gives this service its name and is used to represent the user's 'region' of interest in the multidimensional data space, addressing their specific need and providing a tailor-made data request beyond conventional limits.
[Figure: current bounding-box data extraction compared with Polytope data extraction]
This allows the Polytope service to extract only the information users need. By improving the efficiency of data extraction, unnecessary manipulations are avoided, leading to time and energy savings. Polytope will integrate the different FDB systems, so that the data of all DestinE digital twins will be accessible through it.
This is extremely useful for many different cases. For example, users may require data with irregular shapes, such as islands or coastal areas, or data along paths such as shipping or flight routes.
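As a conceptual sketch of polygon-based extraction (using numpy and shapely, not the actual Polytope service or its API; the grid and region are invented for the example), the snippet below keeps only the grid points that fall inside a user-defined region, instead of returning the whole bounding box around it:

```python
# Conceptual sketch of polygon-based extraction (not the Polytope API):
# keep only the grid points inside a user-defined region instead of
# the full bounding box around it.
import numpy as np
from shapely.geometry import Point, Polygon

# A toy 1-degree lat/lon grid with one field value per grid point.
lats, lons = np.meshgrid(np.arange(30, 40), np.arange(0, 10), indexing="ij")
field = np.random.rand(*lats.shape)

# Hypothetical region of interest, e.g. an irregular coastal area (lon, lat).
region = Polygon([(2, 31), (7, 32), (8, 37), (4, 38), (1, 35)])

inside = np.array(
    [region.contains(Point(lon, lat))
     for lon, lat in zip(lons.ravel(), lats.ravel())]
).reshape(lats.shape)

extracted = field[inside]  # only the values the user actually needs
print(f"{extracted.size} of {field.size} grid points returned")
```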
IV. INTERACTIVITY
The Digital Twin Engine also brings new ways to interact with the workflows, the data and ultimately the simulations. These features will gradually be augmented through the different phases of DestinE, and ultimately will empower users to interact with DestinE's digital twins in multiple ways.
Direct access to data stream
The first aspect of this role is to allow users to interact with the data stream as it is being generated.
This can serve many sectors to act more sustainably. For example, offshore wind farm operators will be able to capture specific information related to potential wind storms and adapt their activities accordingly. In these cases, the Digital Twin Engine allows the DestinE system to issue notifications that users may employ to trigger their own analytics. This makes a direct link from digital twin data to the information required for an impact-based user decision.
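A minimal sketch of this notification-driven pattern is shown below; the classes, metadata keys and callback are hypothetical stand-ins for illustration only, not the actual DestinE or Aviso interfaces:

```python
# Hypothetical sketch of notification-driven analytics (not the real
# DestinE/Aviso interfaces): a user callback runs as soon as matching
# digital twin data are announced.
from typing import Callable

class NotificationBus:
    """Toy in-memory stand-in for a data availability notification service."""

    def __init__(self):
        self._listeners = []  # (trigger, callback) pairs

    def listen(self, trigger: dict, callback: Callable[[dict], None]) -> None:
        self._listeners.append((trigger, callback))

    def notify(self, event: dict) -> None:
        for trigger, callback in self._listeners:
            if all(event.get(k) == v for k, v in trigger.items()):
                callback(event)

def wind_farm_analytics(event: dict) -> None:
    # The user's own analysis, e.g. assessing storm risk for a wind farm.
    print(f"New data available: {event} -> running wind farm analysis")

bus = NotificationBus()
bus.listen({"param": "wind", "region": "north_sea"}, wind_farm_analytics)

# Later, the system announces that a new field has been produced.
bus.notify({"param": "wind", "region": "north_sea", "step": 6})
```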
Fusing complementary information sources
Users can also integrate their own models into the digital twins' workflow.
Taking the same example as above, the wind farm operator could insert their own model within the system to estimate the power generated by their offshore wind farm. By incorporating their own algorithms and applications that intersect with the digital twins' data, the user becomes an active part of the product pipeline.
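As a simple illustration of such a user-supplied model (the turbine parameters below are invented for the example), a power curve can convert wind speeds taken from digital twin output into an estimate of generated power:

```python
# Illustrative user-supplied model: a simplified wind turbine power
# curve applied to wind speeds from digital twin output (the turbine
# parameters are invented for this example).
import numpy as np

def turbine_power(wind_speed_ms,
                  cut_in=3.0,          # m/s: turbine starts producing
                  rated_speed=12.0,    # m/s: rated power is reached
                  cut_out=25.0,        # m/s: shutdown for safety
                  rated_power_mw=8.0):
    """Return estimated power (MW) per turbine for the given wind speeds."""
    v = np.asarray(wind_speed_ms, dtype=float)
    power = np.zeros_like(v)
    ramp = (cut_in <= v) & (v < rated_speed)
    # Power grows roughly with the cube of wind speed up to rated speed.
    power[ramp] = rated_power_mw * ((v[ramp]**3 - cut_in**3)
                                    / (rated_speed**3 - cut_in**3))
    power[(rated_speed <= v) & (v < cut_out)] = rated_power_mw
    return power

# Example: hourly wind speeds (m/s) over the farm, e.g. from the twin.
wind = np.array([2.5, 6.0, 11.0, 14.0, 27.0])
print(turbine_power(wind))  # estimated MW per turbine for each hour
```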
Flexibility
Finally, the Digital Twin Engine facilitates flexible exploitation of data and computing power across the DestinE blocks. Different users will be able to simultaneously connect to different parts of the system, depending on their needs and application readiness, accessing data stored at different locations within the distributed components of DestinE.
V. ENABLING MACHINE LEARNING
The Digital Twin Engine is also key to supporting the application of Machine Learning (ML) techniques in the DestinE system to complement and enhance DestinE's features.
In particular, it supports developing workflows towards a foundation Earth-system model, which will complement the physically based Earth system simulations to quantify uncertainties in the digital twins and enhance interactivity.
The Digital Twin Engine enables machine learning to scale up: training requires repeated, intensive access to high-quality data and extensive computing resources.
Once trained, machine learning workflows can provide enhanced interactivity by inference, allowing users to run ML models and generate bespoke new data.
The Digital Twin Engine supports the deployment of ML workflows on the EuroHPC supercomputers and helps prepare the relevant digital twin data sets so that this process runs more efficiently, automating the learning process based on DestinE data.
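As a minimal sketch of the inference step (the model architecture, variable names and shapes below are hypothetical, purely for illustration), once a model has been trained on digital twin data it can be run cheaply to generate new, bespoke output:

```python
# Hypothetical inference sketch (model, names and shapes are invented):
# a trained ML emulator is loaded and run on prepared digital twin fields.
import torch
import torch.nn as nn

class TinyEmulator(nn.Module):
    """Stand-in for an ML model trained on digital twin data."""
    def __init__(self, n_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, n_channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyEmulator()
model.eval()  # in practice, trained weights would be loaded here

# One batch of gridded input fields: (batch, channels, lat, lon).
fields = torch.randn(1, 4, 180, 360)

with torch.no_grad():           # inference only, no gradients needed
    new_fields = model(fields)  # bespoke new data generated on demand

print(new_fields.shape)
```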