Many-Core Platforms in the Real-Time Embedded Computing Domain
Ref: CISTER-TR-150603       Publication Date: 24, Apr, 2015

Over the past few decades, technological advancements have made our lives increasingly permeated by and dependent on embedded systems. Today, these devices account for more than 98% of all produced computing systems, with applications that span a wide range of areas, from medicine to avionics. Some embedded systems interact with the physical environment and have to guarantee not only that a certain action will be performed correctly, but also that the action will complete within a certain time. These devices are called real-time embedded systems, and some notable examples are medical pacemakers, airbags in cars and autopilots in airplanes.
The process of analysing the temporal behaviour of a real-time embedded system is called real-time analysis. In many cases, the purpose of the analysis is to derive guarantees that a device will perform its functions correctly, while at the same time meeting all timing requirements. Real-time analysis is mostly performed at design time, so its efficiency depends heavily on the predictability of the entire system: any non-deterministic aspect of the system behaviour has to be accounted for in the analysis with a certain degree of pessimism. A pessimistic analysis may cause significant resource over-provisioning in the design phase, and consequently lead to severe underutilisation of available resources at runtime. Therefore, reducing the analysis pessimism is one of the ever-present objectives in the real-time embedded computing domain.
The first real-time embedded systems were predominantly single-core devices with limited sets of functionalities. However, constantly increasing demands for more advanced and sophisticated functionalities required more powerful computational devices. When faced with the same challenge, other computing areas (e.g. general-purpose or high-performance computing) opted for platforms consisting of several cores (multi-cores) and of more than a dozen cores (many-cores). It comes as no surprise that the same trends, although with an offset, are noticeable in the evolution of real-time embedded systems, where many-core platforms represent the new frontier technology.
Besides making it possible to implement more advanced functionalities, many-core platforms offer other benefits as well. For instance, multiple functionalities that were previously implemented on a set of single-core devices can be integrated within fewer many-core platforms, with significant design-cost reductions. Moreover, the abundance of available cores makes it possible to implement efficient thermal and power management strategies by deliberately performing temporary shutdowns of idle cores. At the same time, the existence of idle cores, which can be used if necessary, makes these devices more resilient to hardware failures. Yet, despite the aforementioned benefits, the integration of many-cores into the real-time embedded domain is a big challenge. The most notable reasons are (i) increasingly complex designs of hardware components, which promote performance, often at the expense of predictability, and (ii) more significant and hard-to-analyse contention patterns for accesses to shared resources. These facts may contribute to non-deterministic system behaviour, and, as explained above, every non-deterministic aspect of the system behaviour has to be accounted for in the real-time analysis with a certain degree of pessimism.
In this dissertation, the focus is on the analysis of real-time embedded systems deployed on many-core platforms. Specifically, a comprehensive collection of techniques and design choices is presented, with the common objective of making many-cores more amenable to real-time analysis, and consequently more suitable for and applicable to the real-time embedded domain. The proposed methods achieve this end in several ways: (i) by extending the state-of-the-art approaches in order to reduce the analysis pessimism, (ii) by exploiting novel hardware features, as well as enforcing constraints which cause a more deterministic and analysable system behaviour, and (iii) by elaborating on promising OS and workload paradigms, which have not been previously considered in the real-time embedded computing domain.
The contributions of this dissertation can be classified into two groups. In the first set of contributions the focus is on the interconnect medium, which is one of the most complex-to-analyse resources in many-core platforms. Initially, the target interconnect is the network-on-chip (NoC) with a 2-D mesh topology, which utilises the wormhole switching mechanism and the XY routing technique. For such a generic model, which is present in most contemporary many-cores, a novel worst-case communication delay analysis is proposed, and subsequently compared with the state-of-the-art method. Then, assuming additional hardware support in the form of virtual channels, improvements over the state-of-the-art approaches are proposed, which not only reduce the analysis pessimism, but also significantly reduce the requirements for hardware resources. Finally, a novel arbitration policy for NoC routers is proposed.
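As an illustration of the interconnect model only, XY routing on a 2-D mesh can be sketched as follows: a packet first travels along the X dimension until it reaches the destination column, and then along the Y dimension. This is a minimal Python sketch and not the delay analysis proposed in the dissertation; the function names (`xy_route`, `links`, `share_link`) and the representation of routers as `(x, y)` coordinate pairs are assumptions made for the example. Checking whether two flows share a directed link is a typical first step when reasoning about contention, since flows that never share a link cannot directly block each other.

```python
def xy_route(src, dst):
    """Return the routers (x, y) visited from src to dst under XY routing.

    Assumption: routers are identified by integer mesh coordinates.
    """
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: move along the X dimension until the destination column.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: move along the Y dimension until the destination row.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def links(path):
    """Return the set of directed links (router, next_router) on a path."""
    return set(zip(path, path[1:]))


def share_link(flow_a, flow_b):
    """Return True if two flows, each a (src, dst) pair, share a directed link."""
    return bool(links(xy_route(*flow_a)) & links(xy_route(*flow_b)))


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(share_link(((0, 0), (2, 1)), ((1, 0), (2, 2))))
```

Because XY routing is deterministic, the route of every flow is known at design time, which is what makes this kind of static contention check possible.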
In the second set of contributions the focus is on a novel paradigm in the real-time embedded domain, called the Limited Migrative Model. This model is inspired by the latest trends in high-performance and general-purpose computing. First, the model is introduced and the cost of maintaining it is analytically estimated, in terms of both computational and interconnect resources, where, for the latter aspect, the findings from the first set of contributions are used (see the previous paragraph). Then, three aspects of the application workload are studied, namely: (i) communication requirements, (ii) memory requirements, and (iii) computation requirements. The first aspect is addressed by imposing several constraints, which make the communication patterns more predictable, and subsequently allow a communication delay analysis to be derived. Moreover, the workload assignment to computational resources is investigated, but only from the communication perspective, with the objective of spatially distributing the workload in such a way that all timing constraints posed on communication delays are met. Then, the focus is shifted towards the memory requirements, and a set of analysis techniques is proposed, which can be used to check whether the memory traffic requirements are also fulfilled. In the final part, the computation requirements of the application workload are studied. However, for this aspect only a coarse-grained analysis with several simplifying assumptions is presented. The proposed method represents an initial step towards a complete analysis of the computation requirements. Subsequently, assuming this initial analysis, the problem of the workload assignment to computational resources is revisited, but this time with an orthogonal objective, which is to ensure that the computation requirements of the workload are fulfilled.
The findings suggest that the first set of contributions significantly improves over the state-of-the-art methods in the real-time analysis of interconnects. The improvements are manifested in reduced analysis pessimism, as well as in reduced hardware requirements. Both of these aspects are essential for mitigating the resource over-provisioning effects when designing a new system. Additionally, the findings suggest that the Limited Migrative Model has considerable potential, and represents a promising step towards the application of many-core platforms in the real-time embedded computing domain.

Borislav Nikolic

PhD Thesis, Faculdade de Engenharia, Universidade do Porto.

Record Date: 12, Jun, 2015