Architecture of Computing Systems

Porto, Portugal
24-27 March 2015


Tutorial: CUDA tuning and new GPU trends


Date and Time: 9:00 AM-12:30 PM, March 25, 2015





Manuel Ujaldon

CUDA Fellow @ Nvidia, Prof. @ Computer Architecture Department, University of Malaga, Spain




After a decade of use as hardware accelerators, GPUs are now a solid and affordable alternative for high-performance computing. The growing volumes of data handled by large-scale applications make GPUs very attractive for scientific computing, where they exploit SIMD parallelism at an unprecedented scale.

This tutorial will review the current achievements of many-core GPUs, recent and upcoming hardware enhancements, and the emerging challenges of using GPUs as accelerators in general-purpose exascale computing. Examples and case studies will cover software features (dynamic parallelism, unified memory), hardware features (Hyper-Q, 3D-DRAM) and disruptive low-power devices (Tegra, Jetson and Denver from NVIDIA).

Main objectives:

Show the latest software mechanisms and hardware enhancements in CUDA.

Target audience: 

CUDA C programmers who are already familiar with the basic mechanisms of CUDA and want to learn about optimizations, advanced features and future trends in CUDA.

The tutorial will cover:

1. The road to maximizing GPU performance: how to get close to the peak performance of a given GPU model.

2. A case study based on a simple irregular kernel for sparse matrices: sequence of optimizations, productivity, scalability, trade-offs, performance results and analysis.

3. Advanced CUDA features: dynamic parallelism, Hyper-Q (Kepler), unified memory (Maxwell).

4. Future GPU hardware enhancements: Denver, NVLINK, 3D-DRAM (Pascal).

Short Speaker's Bio:

Manuel Ujaldon is an Associate Professor (accredited as Full Professor by ANECA) at the Computer Architecture Department, University of Malaga (Spain) and Conjoint Senior Lecturer at the School of Electrical Engineering and Computer Science of the University of Newcastle (Australia).

In the 1990s he worked on parallelizing compilers, completing his PhD thesis in 1996 with the development of a data-parallel compiler for sparse matrix and irregular applications. During this period he took part in the HPF and MPI Forums and worked as a postdoc in the Computer Science Department of the University of Maryland, College Park.

He joined the GPGPU movement early, in 2003, using Cg, and wrote the first book in Spanish on programming GPUs for general-purpose computing, in which he described how to map irregular applications and linear algebra algorithms onto GPUs. He adopted CUDA when it was first released and has since focused on image processing and biomedical applications. Over the past five years, he has authored more than 50 papers in journals and international conferences in these two areas.

Dr. Ujaldon has been awarded an NVIDIA Academic Partnership (2008-2011), NVIDIA Teaching Center (since 2011), NVIDIA Research Center (since 2012) and, most recently, CUDA Fellow. Over the past three years, he has taught more than 60 courses on CUDA programming worldwide sponsored by NVIDIA, including tutorials at ACM/IEEE conferences and academic programs at European and North American universities. Lately, he has fostered activities in the Southern Hemisphere, in countries such as Argentina, Australia, Brazil, Chile, New Zealand, Peru, South Africa and Uruguay.