by CMS

Have you ever felt that your analyses take a long time, and seem to take longer with each passing year? Well, you are not alone. With increasing computational capacity, everyone is looking at running climate simulations at higher resolution and in more detail. But this comes with a hidden cost: the datasets being analysed are growing significantly in size. Unfortunately, traditional analysis methods are not keeping pace. New approaches are required.

This reflects the sentiment from the Collaborative Conference on Computational and Data Intensive Science 2019 (C3DIS 2019), held in May in Canberra and attended by Claire Carouge and Aidan Heerdegen from the CMS team. This conference focuses on the practices used by scientists and by computing, data and information management specialists to carry out complex computations and analyses over massive datasets. This year’s theme was Understanding the Earth, and as such there were lots of contributions from the geosciences, notably the UK Met Office, the National Center for Atmospheric Research (NCAR) and the Decadal Predictions team at CSIRO.

A major theme that came back again and again at the conference was the place of cloud computing in the (near) future. Everyone at the conference agreed that cloud computing should play a major role. The UK Met Office is currently moving all its analysis services to the cloud, and it appears their current tender for a high-performance computer (HPC) replacement will be the last of its kind as well. We don’t know what the next system to support climate simulations will be, but we do know it will not be a traditional HPC. NCAR and the Decadal Predictions team are also moving towards cloud solutions.

What is cloud computing? If you have used the VDI at NCI, you have used some cloud computing. But the VDI is a very simplified version of cloud computing from the user’s point of view as it just looks like a Linux desktop. Everything looks quite familiar. The cloud computing we are discussing here can be quite different.

The file systems, for a start, are totally unlike a desktop file system. Parallel access to files is better, but users have to learn the appropriate access patterns. And that could be the easy part, because there is now what is called serverless cloud computing. In traditional cloud computing, you request “a machine” by specifying the computational resources you need. You then work on this “machine”, run as many programs as you want, and when you “log out” the resources are released. With serverless cloud computing, you do not have a “machine”. You submit programs which encapsulate information about the resources they need. The server (yes, there is still real hardware behind it, despite the name) then gives those resources to your program and releases them when the program finishes. This means resource allocation is more reactive with serverless than with traditional cloud computing, and you only ever use the resources you need. But it comes at the cost of learning how to describe your program’s resource needs alongside the program itself.
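To make the idea concrete, here is a minimal sketch of the serverless pattern in Python. None of it is the API of a real provider: submit_job, the job dictionary and the resource fields are hypothetical, purely to show that the resource request travels with the program rather than with a pre-booked machine.

```python
# Minimal sketch of the serverless idea. "submit_job" and the job dictionary
# below are hypothetical, not the API of any real cloud provider.

def count_hot_days(inputs):
    """The analysis itself, packaged as a self-contained function."""
    import xarray as xr  # imports live inside so the function ships as one unit
    ds = xr.open_dataset(inputs["path"])
    return (ds["tasmax"] > 35.0).sum("time")

# The resource description is attached to the program, not to a "machine".
job = {
    "run": count_hot_days,
    "inputs": {"path": "tasmax_daily.nc"},  # hypothetical input file
    "memory": "8 GB",                       # hypothetical resource declaration
    "cpus": 2,
    "timeout_minutes": 30,
}

def submit_job(job):
    """Stand-in for the provider: allocate the declared resources, run, release."""
    return job["run"](job["inputs"])

# result = submit_job(job)  # resources exist only while count_hot_days runs
```

The job dictionary is the important part: with serverless computing, that description is all you book, and the resources are only held for the seconds your function actually runs.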

The advantages of cloud computing are first and foremost a reduction in data transfers. Instead of working locally and moving the data from a remote location (or accessing the data remotely), you work remotely, where the data is stored in the first place. Data access is then faster. This is already true of systems like the VDI when using data stored at NCI rather than moving that data to your own machine. Second is scaling: cloud computing gives access to more computational resources than your desktop and can scale easily, which allows for the analysis of larger datasets. Finally, cloud computing, and especially serverless cloud computing, is also cheaper than supporting your own in-house computational resources.

A second theme from the conference was the use of parallelisation in data analysis workflows. Until recently, parallelisation was mainly used for model simulations in climate science; they were the only problems large enough to warrant it. This is no longer the case, as quite a few CLEx researchers can attest.

We are often asked by CLEx researchers how long they should expect their analyses to take. Well, the presenters at C3DIS were unanimous: their aim is for analyses to be so short that you don’t have to go and do something else while waiting for the results. At most, just the time required to make or drink a coffee. The exact time might vary for each person, but we are speaking of less than 30 minutes here. Obviously, we are talking about individual analysis steps here, not a whole workflow! Unfortunately, the steps needed to analyse several terabytes of data in under 30 minutes are still complex, despite Python packages such as Xarray, Iris and Dask. This means researchers need to pool their expertise around analysis workflows more than before. This is beginning to happen, with initiatives like Pangeo and major organisations like NCAR committing resources to contribute directly to Xarray and Dask. It also means it is more important than ever for everyone to share their cool analysis workflows. And it is easier to do than you think: just talk to CMS about it.
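To give an idea of what one of those parallel analysis steps can look like, here is a minimal sketch using Xarray and Dask. The file pattern, variable name and chunk size are invented for the example; the pattern of opening data lazily and only triggering the parallel computation at the end is the part that matters.

```python
# Minimal sketch of a parallel analysis step with Xarray and Dask.
# The file pattern "tas_day_*.nc", the variable "tas" and the chunk size are
# placeholders; adapt them to your own dataset.

import xarray as xr
from dask.distributed import Client

if __name__ == "__main__":
    # Start a small local Dask cluster; on a shared system you would connect
    # the Client to an existing cluster instead.
    client = Client(n_workers=4)

    # Open the multi-file dataset lazily, one Dask chunk per year of daily data.
    ds = xr.open_mfdataset("tas_day_*.nc", chunks={"time": 365}, parallel=True)

    # Build the computation lazily, then run it in parallel with .compute().
    annual_max = ds["tas"].resample(time="1Y").max()
    result = annual_max.compute()

    print(result)
    client.close()
```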