Conflicts in Cache Behavior Prediction

Introduction

Memory transfers are the performance bottleneck of many applications due to poor data locality and limited memory bandwidth. Code refactoring for better data locality can improve cache behavior, leading to significant performance boosts.

Benefits

The ability to systematically identify and resolve performance bottlenecks is a highly sought-after skill.
Performance matters! During this thesis you can:

  • Gain valuable hands-on experience in high-performance computing and computer architecture
  • Have the opportunity to work with state-of-the-art hardware architectures
  • Deepen your knowledge of the C and C++ programming languages
  • Get in touch with the Future Computing Group at the Leibniz Supercomputing Centre (LRZ)

Motivation and Background

Modeling cache behavior can yield valuable insights into potential performance bottlenecks. Reuse distance, a measure of data locality, is useful for identifying and optimizing hot code regions that exhibit poor data locality. It is defined as the number of unique memory locations referenced between two references to the same memory location. At the granularity of cache lines, reuse distance captures both spatial and temporal locality and can thus be used to assess the cache behavior of applications.
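The definition above can be sketched in a few lines of C++. This is a deliberately naive O(n)-per-access illustration (not the algorithm used in the thesis, which would use an efficient data structure): an LRU stack holds each address once, ordered by recency, and the reuse distance of an access is the number of distinct addresses touched since the previous access to the same address.

```cpp
#include <cstdint>
#include <cstddef>
#include <limits>
#include <vector>
#include <algorithm>

// Reuse distance of one access: the number of *distinct* addresses
// referenced since the last access to `addr`. SIZE_MAX marks a cold
// (first-time) access. Naive O(n) per access, for illustration only.
std::size_t reuse_distance(std::vector<std::uint64_t>& lru_stack,
                           std::uint64_t addr) {
    auto it = std::find(lru_stack.begin(), lru_stack.end(), addr);
    std::size_t dist;
    if (it == lru_stack.end()) {
        dist = std::numeric_limits<std::size_t>::max();  // cold access
    } else {
        // Elements behind `it` are exactly the distinct addresses
        // accessed since the previous reference to `addr`.
        dist = static_cast<std::size_t>(lru_stack.end() - it) - 1;
        lru_stack.erase(it);
    }
    lru_stack.push_back(addr);  // most recently used at the back
    return dist;
}
```

For the trace a, b, c, a the last access to a has reuse distance 2 (b and c were touched in between). Under the fully associative LRU assumption, an access hits exactly when its reuse distance is smaller than the number of cache lines.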

Assuming a fully associative cache with a least recently used (LRU) replacement policy, cache behavior prediction with reuse distance is exact: an access hits if and only if its reuse distance is smaller than the number of cache lines. However, reuse distance omits several cache-specific details:

  • Limited cache associativity (conflict misses)
  • Deviations from true LRU replacement policy in real hardware
  • Hardware prefetching

Tasks & Goals

In this thesis, you will explore the accuracy of cache behavior prediction with reuse distance in irregular applications. As an example application, you will use sequential sparse matrix-vector multiplication (SpMV), a ubiquitous kernel used, for instance, in physical simulations and graph algorithms.

In the context of this thesis, you will:

  • Adjust an existing C++ reuse distance algorithm to reflect cache associativity
  • Implement a simple pseudo-LRU (bit- / tree-based) cache simulator (optional)
  • Design and conduct experiments with sequential (optional: parallel) SpMV on several state-of-the-art hardware architectures of the BEAST system
  • Compare different cache-behavior prediction approaches with cache-related hardware performance events obtained from real SpMV execution:

1. Reuse distance
2. Associativity-aware reuse distance
3. Pseudo-LRU (optional)
4. Cachegrind (a Valgrind tool, optional)
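To illustrate what the optional pseudo-LRU simulator involves, here is a sketch of the common tree-based pseudo-LRU policy for a single cache set. This is one textbook variant, not a model of any specific CPU: one bit per internal node of a binary tree over the ways, where each bit records which half of the set currently holds the pseudo-LRU victim.

```cpp
#include <array>
#include <cstddef>

// Tree-based pseudo-LRU for one set with WAYS ways (power of two).
// bits_ uses heap layout: children of node i are 2*i+1 and 2*i+2;
// a bit of `true` means the victim lies in the right half.
template <std::size_t WAYS>
class TreePLRU {
    static_assert(WAYS >= 2 && (WAYS & (WAYS - 1)) == 0,
                  "WAYS must be a power of two");
    std::array<bool, WAYS - 1> bits_{};

public:
    // Record an access to `way`: along the root-to-leaf path, make
    // every bit point away from the accessed way, so it will be the
    // last candidate for eviction.
    void touch(std::size_t way) {
        std::size_t node = 0, lo = 0, hi = WAYS;
        while (hi - lo > 1) {
            std::size_t mid = (lo + hi) / 2;
            bool right = way >= mid;
            bits_[node] = !right;  // victim search goes to the other half
            node = 2 * node + 1 + (right ? 1 : 0);
            if (right) lo = mid; else hi = mid;
        }
    }

    // Follow the bits from the root; leaves WAYS-1 .. 2*WAYS-2 map
    // to ways 0 .. WAYS-1.
    std::size_t victim() const {
        std::size_t node = 0;
        while (node < WAYS - 1)
            node = 2 * node + 1 + (bits_[node] ? 1 : 0);
        return node - (WAYS - 1);
    }
};
```

After the access sequence 0, 1, 2, 3 on a 4-way set, both true LRU and tree-PLRU evict way 0; after 0, 1, 2, 3, 0, true LRU would evict way 1 but tree-PLRU picks way 2. Such divergences are exactly what the comparison against real hardware events is meant to quantify.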

Prerequisites

  • Basic understanding and interest in computer architecture and multi-core memory hierarchies
  • Basic knowledge in high-performance and parallel computing (from the lecture Parallel and High-Performance Computing or equivalent)
  • Proficiency in Linux and C/C++ (from the practical course Systempraktikum or equivalent)

Organizational Details

Thesis advisor
Prof. Dr. D. Kranzlmüller
Duration
3 months
Number of students
1
Supervisor
Sergej Breiter
Language
German or English

Get in touch

If you are interested, please send a short e-mail (in German or English) to Sergej Breiter describing your motivation and which of the prerequisites you fulfill.

Please do not send long e-mails generated by a large language model!