Efficiency in the great big data cloud
Every minute, video material amounting to roughly 400 hours of viewing time is uploaded to YouTube. Users watch one billion hours’ worth of videos every single day, according to the platform’s own statistics. This data traffic between the YouTube “cloud” and the terminal devices, more than half of which are mobile devices, requires efficient organisation. The computer scientist Radu Prodan specialises in the efficiency aspects of these distributed and parallel systems. In the following interview, he discusses the possibilities and impossibilities that still lie ahead and that present enormous challenges for technology, humankind, and nature.
In June, the world’s fastest supercomputer commenced operation in the USA, boasting a performance capacity of 200 petaflops (200 quadrillion floating point operations per second). You examine how well these vast systems function. What challenges do you perceive?
These computers consist of millions of heterogeneous components, combined in a highly complex and hierarchical structure. Programming supercomputers of this size efficiently represents a considerable challenge, as the necessary communication between the components, the synchronisation, and many other inevitable overheads tend to result in inefficiency. Many of these computers achieve an actual utilisation of only 15 to 20 per cent of the peak performance they could theoretically deliver. This translates into a performance problem not only for numerous applications, but also for the operator, who is not making full use of a computing infrastructure that was acquired at great cost.
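As a rough back-of-the-envelope illustration of the utilisation gap described here, the sketch below simply takes the ratio of sustained to theoretical peak performance; both figures are assumed for the example and are not measurements from any specific machine.

```python
# Back-of-the-envelope utilisation figure (all numbers are assumed, illustrative values).
peak_flops = 200e15        # assumed theoretical peak: 200 petaflops
sustained_flops = 35e15    # assumed sustained rate of a real application on the same machine

efficiency = sustained_flops / peak_flops
print(f"Utilisation: {efficiency:.0%}")   # prints "Utilisation: 18%", within the 15-20% range
```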
What is the main problem?
In many cases, it is the inability to reach the theoretical peak performance. Supercomputers are steadily growing in size in the attempt to achieve speeds beyond one quintillion operations per second. But if you gain mass, you sacrifice velocity, as happens when the communication between the individual components is too frequent or involves excessively high data volumes. “Extreme data” is the term that takes the big data concept a step further: vast volumes of data that need to be retrieved, communicated and analysed at close to real-time speed. This is the issue we are currently addressing in the context of a new European research project.
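The trade-off between size and speed can be made concrete with a deliberately simplified cost model; the compute time and the linear communication term below are assumptions chosen for clarity, not a model of any real machine.

```python
# Toy cost model: runtime on p nodes = compute time shared across the nodes plus a
# communication overhead that grows with the node count. All figures are assumed.
def runtime(p, compute=1000.0, comm_per_node=0.05):
    return compute / p + comm_per_node * p

for p in (10, 100, 1000, 10_000):
    print(f"{p:>6} nodes -> {runtime(p):8.1f} time units")
# Runtime drops at first, bottoms out, then rises again once communication dominates.
```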
Why is this necessary?
Nowadays, there are many scientific and commercial applications that need to generate, store, filter, and analyse data at a rate of hundreds of gigabits per second. One contemporary example is the simultaneous analysis of millions of images each day, which involves a real-time database scan of one billion social media posts. Conventional hard drives and commercial storage systems are not up to this task. What we are trying to do is to improve existing concepts and technologies, and our particular focus is on data-intensive applications running on systems that consist of millions of computing elements. These are the so-called exascale computing systems, which can manage one quintillion operations per second.
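To get a feel for why conventional storage cannot keep pace, here is an order-of-magnitude comparison; the ingest rate and the drive’s write speed are assumed, illustrative values rather than benchmark results.

```python
# Order-of-magnitude comparison (assumed figures, not benchmark results).
stream_gbit_s = 400                        # assumed ingest rate: "hundreds of gigabits per second"
stream_bytes_s = stream_gbit_s * 1e9 / 8   # = 50 GB per second

hdd_bytes_s = 200e6                        # assumed sequential write rate of one conventional HDD

print(f"{stream_bytes_s / hdd_bytes_s:.0f} drives writing in parallel just to keep up")  # -> 250
```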
What is your contribution, and how will the new project help to improve these exascale systems?
We aim to develop new programming paradigms, interfaces, runtime tools, and methods to deploy data-intensive tasks efficiently on exascale systems, paving the way for the future exploitation of massive parallelism through a simplified model of the system architecture. This should deliver better performance and efficiency, and provide powerful operations and mechanisms for processing extremely large data sources at high speed and/or in real time.
Many of the numbers mentioned sound like superlatives. Nonetheless, it seems that we are still far from achieving genuine superlatives in terms of the demand for computing capacity. What is your view?
Gordon Moore, co-founder of Intel, formulated Moore’s Law in 1965; in its popular form, it states that the performance of computing systems doubles roughly every 18 months to two years, and it still holds true today. Yet the rates of increase are nowhere near adequate to cope with the growing volume of data. Some estimates suggest that by 2020 every human being on earth will be generating around 1.5 to 2 megabytes of data per second. We can neither store nor process such vast volumes of data. That is why it is important to interpret and filter the data in such a way that only the important information is used for further processing.
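A quick order-of-magnitude check shows why such volumes cannot simply be stored wholesale; the population figure below is an assumption made for the example.

```python
# Order-of-magnitude check of the quoted estimate (population figure is an assumption).
people = 7.7e9        # assumed world population around 2020
rate_mb_s = 1.5       # lower end of the quoted 1.5-2 MB per person per second

bytes_per_day = people * rate_mb_s * 1e6 * 86_400
print(f"Roughly {bytes_per_day / 1e21:.1f} zettabytes of data per day")  # about 1 ZB every day
```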
Which approaches might be able to deliver a solution?
That depends very much on the application. The trend today is towards edge computing. On one side, we have a vast cloud, where numerous parallel computers form a common unit that processes data in a centralised manner. We now know that the cloud alone cannot cope with the sheer volume of data. The distance between the terminal device and the server farm – which might be located on the other side of the world – leads to latency issues. Even if it is only a matter of a few milliseconds, humans are very sensitive when it comes to waiting for data to be retrieved. This is especially critical in the case of highly interactive user interfaces such as computer games. It is important that we manage to bring the cloud closer to the end user, which means processing the data at the edge of the Internet. That is the fundamental idea of edge computing.
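Physical distance alone puts a floor under latency, which is the motivation for moving computation to the edge; the distances in the following sketch are assumed example values.

```python
# Lower bound on round-trip time imposed by distance alone (distances are assumed examples).
def min_rtt_ms(distance_km, fibre_speed_km_s=200_000):   # light in fibre: roughly 2/3 of c
    return 2 * distance_km / fibre_speed_km_s * 1000

print(f"Server farm 10,000 km away: {min_rtt_ms(10_000):.0f} ms minimum")  # -> 100 ms
print(f"Edge node 50 km away:       {min_rtt_ms(50):.1f} ms minimum")      # -> 0.5 ms
```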
Can you describe how this might work in practice?
You plug a small network-compatible computer into an electrical socket in the vicinity of the application, dynamically and adaptively as required. This gives every individual their own small “cloud”, which manages the data and permits significantly faster communication, at a speed much closer to real time. The management of such distributed edge/cloud computers still needs a lot of development effort to ensure that they are automated, transparent, adaptive, and flexible. Still, we can expect to see them on the market in the next few years.
Does this mean that the advantages of the cloud are lost, i.e. the unlimited resources, access to data anytime and anywhere, scalability, outsourcing and with it the confidence that the data are securely stored elsewhere?
No, not at all, because it basically supplements the existing cloud. Security will certainly remain a big issue, as long as our data are stored on an ever-increasing number of third-party devices, particularly in view of today’s new European data protection regulations.
All these giant computers and server farms also consume energy. To what extent do they endanger our environment?
This is the dark side of the cloud: these facilities consume vast amounts of resources. Even a few years ago, the world’s data processing centres already consumed around three per cent of the global energy supply, and that figure is rising sharply. There is also a massive effect on greenhouse gas emissions. As cloud computing expands, be it in the shape of large clouds or in the form of many small individual clouds, we urgently have to consider the environmental issues as well. That is why our work on energy efficiency is so vital, both in terms of computer design and in relation to the hardware technology.
Can the concept of edge computing offer assistance here?
Yes, the idea is to take the large red hotspot that greedily devours energy and to cool it down using numerous small green units distributed across the globe. This follows precisely the same line as the notion of harnessing the many terminal devices in use throughout the world, which are now very powerful, by involving them more intensively in data processing. This concept of peer-to-peer computing first emerged in the 1990s. At the time, it was mainly used for file sharing, for instance to upload and download music or movies, often drifting into illegal practices. Today, we can take this one step further by considering, in particular, how to harness the computing power of these many devices.
Does anyone ever think about “tidying up” within this mountain of data, performing a thorough clean-up and discarding data?
Storage space is inexpensive nowadays, especially if the speed of reading and writing is of no great concern. No-one here is thinking about a clean-up. Once data have been uploaded to the Internet, or have leaked online inadvertently, it is practically impossible to delete them.
Considering the susceptibility to errors, what is better: one big cloud or countless small clouds?
With one big, centralised system we have a “single point of failure”, as we say in computer science. There, it is possible to implement very strict security measures and to ensure that the core is well guarded. Where we have many decentralised units, the damage to any one entity in the event of a security issue is naturally much less severe. At the same time, however, it is far harder to implement the requisite security measures. Numerous small devices also introduce the problem of cheating: state and trust are distributed from a single point across many participants. If everyone behaved properly, this would not present a problem. But that is not how the world works.
You are 44 years old. What problem would you like to see solved in your field of work by the time you retire? Or, to put it differently: If you were to achieve fame as a scientist, what would you like to be famous for?
The programming languages are still very primitive, especially in the case of high-performance computers. The way we programme today has hardly changed since the 1970s, and we have not yet managed to establish a truly higher-level programming language.
Why not?
It’s a translation problem. Translating from a higher-level programming language down to a running application involves so many layers and steps that the problem seems insurmountable for now. We are also still struggling with high levels of inefficiency.
Will it ever be possible to use a natural language to programme computers?
There are many students who express that wish during the early semesters of their studies. (laughs) The holy grail of programming is to be able to use natural language – German, English, or Romanian – to do more than merely issue commands. That much has been achieved already: “Call Thomas!” But programming also means developing new, innovative programmes, and from today’s perspective it is not yet conceivable how we could manage this with natural language. What might be feasible is a simpler language that is also accessible to a greater number of people.
There are some voices that say that programming – along with writing and arithmetic – will soon be one of the fundamental skills of human beings. Do you agree?
It depends what is meant by “programming”. If the interface is easily accessible, a person can programme without realising it. However, I do not believe that everyone is capable of algorithmic thinking, and neither do I believe that everyone has to be. After all, not everyone needs to be able to paint at the level of an accomplished artist. Thinking back to my university days, there was a definite moment when things simply clicked into place, and I understood what it means to think, structure and develop in this way. From that point on, things were much easier.
for ad astra: Romy Müller
About the person
Radu Prodan joined Alpen-Adria-Universität Klagenfurt in March 2018 as Professor for Distributed Systems at the Department of Information Technology. Born in Romania, he completed his engineering degree at the Technical University of Cluj-Napoca. Having gained his doctoral degree at the Vienna University of Technology (TU Wien), he was granted the venia docendi for Informatics by the University of Innsbruck in 2009. He has worked at ETH Zurich, the University of Basel, and the Swiss Scientific Computing Centre. From 2004 until his appointment as professor at AAU, he lectured at the Department of Informatics at the University of Innsbruck and participated as lead scientist in several FWF, FFG, and EU projects. The EU H2020 FET project “ASPIDE”, led by Radu Prodan, was recently approved, with the aim of improving exascale systems. His key research areas are: parallel and distributed systems, cloud computing, high-performance scientific computing, performance analysis and tools, scheduling and optimisation, compiler technology, and energy efficiency.
Behind the scenes
The photo shoot for the ad astra cover story took place in the Data Center of the University of Klagenfurt. The Data Center is managed by Central Computing Services (ZID) and is run by the head of the Department of Server and Communication Systems, Gerald Hochegger (pictured here). The Data Center hosts the complete range of the university’s IT services (such as Moodle, the online staff portal, or the university’s website) along with the dedicated servers of numerous departments.