



# Experiences and Results of Parallelisation of Industrial Hard Real-time Applications for the parMERASA Multi-core

Prof. Dr. Theo Ungerer, parMERASA Project Coordinator, University of Augsburg




# Joint paper on parMERASA Project Results

Theo Ungerer<sup>1</sup>, Christian Bradatsch<sup>1</sup>, Martin Frieb<sup>1</sup>, Florian Kluge<sup>1</sup>, Jörg Mische<sup>1</sup>, Alexander Stegmeier<sup>1</sup>, Ralf Jahr<sup>1</sup>, Mike Gerdes<sup>1\*</sup>, Pavel Zaykov<sup>2</sup>, Lucie Matusova<sup>2</sup>, Zai Jian Jia Li<sup>2</sup>, Zlatko Petrov<sup>3</sup>, Bert Böddeker<sup>4</sup>, Sebastian Kehr<sup>4</sup>, Hans Regler<sup>5</sup>, Andreas Hugl<sup>5</sup>,
Christine Rochange<sup>6</sup>, Haluk Ozaktas<sup>6</sup>, Hugues Cassé<sup>6</sup>, Armelle Bonenfant<sup>6</sup>, Pascal Sainrat<sup>6</sup>, Nick Lay<sup>7</sup>, David George<sup>7</sup>, Ian Broster<sup>7</sup>, Eduardo Quiñones<sup>8</sup>, Milos Panic<sup>8,9</sup>, Jaume Abella<sup>8</sup>, Carles Hernandez<sup>8</sup>, Francisco Cazorla<sup>8,10</sup>, Sascha Uhrig<sup>11</sup>, Mathias Rohde<sup>11</sup>, and Arthur Pyka<sup>11</sup>

<sup>1</sup> University of Augsburg, Augsburg, Germany
 <sup>2</sup> Honeywell International s.r.o., Brno, Czech Republic
 <sup>3</sup> Honeywell EOOD, Sofia, Bulgaria
 <sup>4</sup> DENSO AUTOMOTIVE Deutschland GmbH, Eching, Germany
 <sup>5</sup> BAUER Maschinen GmbH, Schrobenhausen, Germany
 <sup>6</sup> Université Paul Sabatier, Toulouse, France
 <sup>7</sup> Rapita Systems Ltd., York, UK
 <sup>8</sup> Barcelona Supercomputing Center, Barcelona, Spain
 <sup>9</sup> Technical University of Catalunya, Barcelona, Spain
 <sup>10</sup> Spanish National Research Council, Barcelona, Spain
 <sup>11</sup> Technical University of Dortmund, Dortmund, Germany





# parMERASA

Multi-Core Execution of *parallelised* Hard Real-Time Applications Supporting Analysability

- EC FP-7 project: Oct. 1, 2011 – Sept. 30, 2014
- 3.3 Mio EC contribution
- Project webpage: http://www.parmerasa.eu

#### parMERASA Project Partners and IAB
















#### Industrial Advisory Board:

- Airbus, Toulouse, France
- Infineon Technologies UK Ltd, Bristol, UK
- Infineon Technologies AG, Munich, Germany
- BMW Group, Munich, Germany
- DELPHI, Sweden
- Elektrobit Automotive GmbH, Erlangen, Germany
- Daimler AG, Germany

### parMERASA Overview



- Motivation and Principal Objective
- Principal Developments and Results
- Parallelisation Results
- Conclusions

#### parMERASA Hard Real-time Systems



#### Hard real-time:

A deadline must never be missed; a missed deadline may cause harm to humans or equipment.

#### Mixed criticality in multi-cores:

combining functionalities with different levels of criticality within multi-core systems

e.g. sub-systems to be combined that have different automotive safety integrity levels (ASIL)



parMERASA goes one step beyond mixed criticality demands:

We target future complex control algorithms by parallelising hard real-time programs to run on predictable multi-core processors.

#### parMERASA Project Layout





#### parMERASA Principal Results (1)

 Pattern-based approach to efficiently parallelise industrial applications for embedded real-time systems.




#### parMERASA Principal Results (1)

Pattern-based approach: Activity and Pattern Diagram (APD).




#### parMERASA Principal Results (2): Tools



- WCET analysis and verification tools for multi-cores, all extended for parallel programs:
  - static WCET tool OTAWA (University of Toulouse)
  - measurement-based WCET tool RapiTime
- Further tools developed or extended for parallel program analysis by Rapita Systems Ltd:
  - RapiTask trace viewer
  - RapiCheck constraint checker
  - RapiCover for code coverage
  - RapiTime dependency analysis tool

#### parMERASA Principal Results (3)


Parallelisation of four industrial hard real-time applications:

- Stereo navigation (Honeywell International s.r.o.)
- 3D path planning (Honeywell International s.r.o.)
- Diesel engine management system (DENSO AUTOMOTIVE Deutschland GmbH)
- Dynamic compaction machine (BAUER Maschinen GmbH)



3D path planning: Laplacian multi-grid algorithm parallelised by obstacle map partitioning
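To illustrate what partitioning an obstacle map can look like, the following is a minimal sketch and not the Honeywell implementation. It assumes a single-grid Jacobi relaxation of Laplace's equation (a simplification of the multi-grid algorithm named above) and assumes the map is split into contiguous row strips. All names (`partition_rows`, `relax_block`, `jacobi_sweep`, `fixed`) are hypothetical, and the loop over strips stands in for work that would be distributed over cores.

```python
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def partition_rows(n_rows: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split rows 0..n_rows-1 into n_parts contiguous, near-equal [start, end) strips."""
    base, extra = divmod(n_rows, n_parts)
    strips, start = [], 0
    for p in range(n_parts):
        end = start + base + (1 if p < extra else 0)
        strips.append((start, end))
        start = end
    return strips


def relax_block(old: Grid, fixed: Mask, rows: Tuple[int, int]) -> Grid:
    """One Jacobi sweep over one strip.

    Every free interior cell becomes the mean of its four neighbours in `old`.
    Cells marked in `fixed` (obstacles, goal, map border) keep their potential.
    Only `old` is read, so strips do not depend on each other within a sweep.
    """
    start, end = rows
    n_rows, n_cols = len(old), len(old[0])
    out = []
    for r in range(start, end):
        new_row = list(old[r])
        if 0 < r < n_rows - 1:
            for c in range(1, n_cols - 1):
                if not fixed[r][c]:
                    new_row[c] = 0.25 * (
                        old[r - 1][c] + old[r + 1][c] + old[r][c - 1] + old[r][c + 1]
                    )
        out.append(new_row)
    return out


def jacobi_sweep(old: Grid, fixed: Mask, n_parts: int) -> Grid:
    """Relax the whole map strip by strip; each strip could run on its own core."""
    new: Grid = []
    for strip in partition_rows(len(old), n_parts):
        new.extend(relax_block(old, fixed, strip))
    return new


# Hypothetical 6x6 map: border and one obstacle held at potential 1.0, goal held at 0.0.
N = 6
grid = [[1.0] * N for _ in range(N)]
fixed = [[r in (0, N - 1) or c in (0, N - 1) for c in range(N)] for r in range(N)]
fixed[2][2] = True                      # obstacle cell, potential stays 1.0
grid[4][4], fixed[4][4] = 0.0, True     # goal cell, potential stays 0.0

# The result does not depend on the number of strips.
assert jacobi_sweep(grid, fixed, 1) == jacobi_sweep(grid, fixed, 3)
```

Because each strip reads only the previous iteration's grid, the strips need to synchronise only between sweeps. That property is what makes this kind of partitioning attractive for timing analysis.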

#### parMERASA Principal Results (4)

- Hard real-time support in system software
  - Common Kernel Lib
  - Tiny automotive, Tiny avionics, BIOS for crawler crane RTEs




#### parMERASA Principal Results (5)

- HW platform: cluster-based multi-core architecture (by BSC)
- Parallel Software Partitions (pSWPs) and Guaranteed Resource Partitions (GRPs)
- Predictable new cache: ODC<sup>2</sup> (by Tech. Uni. Dortmund)
- All integrated into a single multi-core simulator

Contributions to AUTOSAR and ARINC standards and to open source software.


## parMERASA Results of Parallelisations (1)

**Observed speed-up** =  $\frac{\text{execution time of the sequential program}}{\text{execution time of the parallelised version}}$ 

Execution times measured by parMERASA simulator

**WCET speed-up** =  $\frac{\text{WCET estimate of the sequential program}}{\text{WCET estimate of the parallelised version}}$ 

#### Two types of WCET speed-ups:

- based on static WCET bounds computed by OTAWA
- based on measurement-based WCET estimates from RapiTime

## parMERASA Results of Parallelisations (2)

## parMERASA Results of Parallelisations (3)





Observed execution time with ODC<sup>2</sup> (2.51 million cycles) is very close to a cache with perfect coherency protocol (2.42 million cycles for 8 cores)
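As a quick check on "very close", the relative overhead of ODC<sup>2</sup> over the perfect-coherency reference follows from the two cycle counts above, assuming both refer to the same 8-core configuration:

$$\frac{2.51 - 2.42}{2.42} = \frac{0.09}{2.42} \approx 0.037$$

That is roughly 3.7 % more cycles than the idealised reference.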


## parMERASA Results of Parallelisations (4)

#### Compaction machine application



## parMERASA Results of Parallelisations (5)

- Diesel engine management system
  - 1200 runnables
  - 11 time-driven tasks and
  - 1 crank-angle task (interrupt from the camshaft sensor)
  - (1) Inter-task level: parallel execution of tasks
  - (2) Intra-task level: parallel execution of runnables of the same task
  - (3) Intra-runnable level: parallel execution of instruction blocks or function calls of a runnable

## parMERASA Results of Parallelisations (6)

Diesel engine management system: static WCET speed-up of inter-task parallelisation

## parMERASA Results of Parallelisations (6)

#### Diesel engine management system: static WCET speed-up of intra-task parallelization

| Task           | Sequential | Parallel | Speed-up |
|----------------|------------|----------|----------|
| $\tau_1$       | 104260     | 95845    | 1.09     |
| $\tau_4$       | 371453     | 210032   | 1.77     |
| $\tau_5$       | 12426      | 12426    | 1.00     |
| $\tau_8$       | 249165     | 74842    | 3.33     |
| $\tau_{16}$    | 840580     | 422562   | 1.99     |
| $\tau_{20}$    | 65412      | 32749    | 2.00     |
| $\tau_{32}$    | 612863     | 322600   | 1.90     |
| $\tau_{64}$    | 132771     | 84462    | 1.57     |
| $\tau_{96}$    | 102593     | 82300    | 1.25     |
| $\tau_{128}$   | 391206     | 342665   | 1.14     |
| $\tau_{1024}$  | 469303     | 379605   | 1.24     |
| $\tau_{crBas}$ | 833225     | 437248   | 1.91     |
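The speed-up column can be recomputed from the two WCET columns using the WCET speed-up definition given earlier. The sketch below copies the table values by hand. The names `wcet_bounds` and `wcet_speedup` are illustrative and not part of any project tooling.

```python
# Static WCET bounds per task as (sequential, parallel), copied from the table above.
wcet_bounds = {
    "tau_1": (104260, 95845),
    "tau_4": (371453, 210032),
    "tau_5": (12426, 12426),
    "tau_8": (249165, 74842),
    "tau_16": (840580, 422562),
    "tau_20": (65412, 32749),
    "tau_32": (612863, 322600),
    "tau_64": (132771, 84462),
    "tau_96": (102593, 82300),
    "tau_128": (391206, 342665),
    "tau_1024": (469303, 379605),
    "tau_crBas": (833225, 437248),
}


def wcet_speedup(sequential: int, parallel: int) -> float:
    """WCET speed-up = WCET estimate (sequential) / WCET estimate (parallel)."""
    return sequential / parallel


# Round to two decimals, matching the precision reported in the table.
speedups = {
    task: round(wcet_speedup(seq, par), 2) for task, (seq, par) in wcet_bounds.items()
}

for task, s in speedups.items():
    print(f"{task:>10}: {s:.2f}")
```

The recomputed values match the reported column: the gains range from none for $\tau_5$ (1.00) to 3.33 for $\tau_8$.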

## parMERASA Results of Parallelisations (7)

#### **Diesel engine management system:**

Intra- and inter-task parallelism were combined.

Longest running task was distributed over 2 cores.

Static WCET speedup estimate increased to **5.97 on 8 cores**.



Figure: per-core task allocation with the combined parallelisation approach.

## parMERASA Results of Parallelisations (8)

Diesel engine management system: measurement-based WCET speed-up of intra-runnable parallelisation





## parMERASA Conclusions

- The EC FP-7 parMERASA project (Oct. 1, 2011 – Sept. 30, 2014) targeted future complex control algorithms by parallelising hard real-time programs.
- Reasonable WCET speed-ups can be reached with a low number of cores, e.g. 5.97 on 8 cores for the diesel EMS.
- The scalability of real-world hard real-time applications that run successfully on a single core is limited.
- Static WCET speed-ups are limited for high core numbers because of the pessimism caused by potentially conflicting global memory accesses in a multi-core.
- The parMERASA project paved the way for future high-performance embedded systems applications.
  - More complex control algorithms than today can be applied.
  - Such algorithms should be designed to be scalable.



Thanks to the collaborators in the parMERASA project:

- CHRISTIAN BRADATSCH, MARTIN FRIEB, FLORIAN KLUGE, JÖRG MISCHE, ALEXANDER STEGMEIER, RALF JAHR, MIKE GERDES, University of Augsburg
- PAVEL ZAYKOV, LUCIE MATUSOVA, ZAI JIAN JIA LI, Honeywell International s.r.o.
- ZLATKO PETROV, Honeywell EOOD
- BERT BÖDDEKER, SEBASTIAN KEHR, DENSO AUTOMOTIVE Deutschland
- HANS REGLER and ANDREAS HUGL, BAUER Maschinen GmbH
- CHRISTINE ROCHANGE, HALUK OZAKTAS, HUGUES CASSÉ, ARMELLE BONENFANT, and PASCAL SAINRAT, Université Paul Sabatier, Toulouse
- NICK LAY, DAVID GEORGE, and IAN BROSTER, Rapita Systems Ltd.
- EDUARDO QUIÑONES, MILOS PANIC, JAUME ABELLA, CARLES HERNANDEZ, FRANCISCO CAZORLA, Barcelona Supercomputing Center
- SASCHA UHRIG, MATHIAS ROHDE, ARTHUR PYKA, Technical University of Dortmund