Hazem A. Abdelhafez

Senior GPU Compiler Engineer

Qualcomm

Biography

I am a senior GPU compiler performance engineer at Qualcomm. My responsibilities include characterizing the performance of GPU workloads (compute and graphics) to identify bottlenecks and optimization opportunities to incorporate into the compiler design.

I received my PhD from the Electrical and Computer Engineering department at the University of British Columbia (UBC). My research focused on promoting energy efficiency in heterogeneous computing devices through better utilization of microarchitectural features such as fine-grained frequency tuning and unified memory.

I received my BSc and MSc degrees in Electronics and Electrical Communications Engineering from Cairo University in 2012 and 2016, respectively. Before starting my PhD program at UBC in 2016, I worked as a Research & Development Engineer at Intel Labs for two years (2012-2014), then as a Software Engineer at AvidBeam for another two years (2014-2016).

I have led and contributed to several projects over the past ten years, spanning cognitive wireless communications systems, large-scale video analytics, compilers, and modeling the performance and power consumption of different computing systems, including supercomputers and edge devices. I have also had multiple successful collaborations with research organizations and industry partners, and I had the honor of receiving several awards (scholarships and grants) at UBC.

Interests

Compilers
GPU Computing
Heterogeneous Computing
Performance Engineering
Machine Learning

Education

PhD in Electrical and Computer Engineering, 2023

University of British Columbia
MSc in Electronics and Electrical Communications, 2016

Cairo University
BSc in Electronics and Electrical Communications, 2012

Cairo University

Industry Experience

Senior GPU Compiler Engineer

Qualcomm

Jun 2023 – Present San Jose, United States

Characterize the performance behavior of GPU compute workloads to identify bottlenecks and optimization opportunities.
Prototype compiler optimizations that improve the performance of the target GPU workloads.

Research and Development Intern

Huawei, Heterogeneous Compilers Lab

Jan 2020 – Jun 2020 Toronto, Canada

Developed and implemented a proof of concept for an efficient compiler-based data-dependency management algorithm for in-order-issue processors, which improved end-to-end latency by up to 15% (5% on average) across several benchmarks.
Wrote and submitted a patent application for the algorithm, which was accepted (currently in the filing process).
Evaluated the impact of reordering the operands of associative instructions as a compiler optimization phase across several benchmarks, which allowed management to make a well-informed decision before allocating resources and time to a full-scale implementation.
Implemented and contributed several components of the SPIR-V specification to the MLIR project (part of the open-source LLVM project), which earned me contributor status in the LLVM project.

Astro Program Participant (Research Intern)

Oak Ridge National Lab, Leadership Computing Facility

May 2018 – Aug 2018 Oak Ridge, USA

Built an open-source profiling-based tool that allows developers and system designers to analyze the performance of GPU-accelerated applications in fields such as high-performance computing.
Developed an analytical model that uses data sheet information to project the impact of intra-node connectivity on data transfer time in GPU-accelerated systems. The model projects data transfer times on next-generation nodes without actual deployment, with an error ranging from 19% to 23%.

Software Engineer

Avidbeam Technologies

Sep 2014 – Sep 2016 Cairo, Egypt

Took a leadership role in the development of a large-scale video analytics platform (ATUN). This platform currently serves as a base for scalable computer vision algorithms shipped by Avidbeam.
Contributed to a joint research and development effort between Avidbeam and Intel Corporation to create a real-time, large-scale video analytics platform, and participated in a Network Function Virtualization project proposal, which led to a collaborative proof of concept between an international telecommunication service provider and Avidbeam.

Research and Development Engineer

Intel, Wireless Communications Lab

Jul 2012 – Apr 2014 Cairo, Egypt

Developed several demos for Licensed Shared Access systems that were shown at various Intel technology events, such as the Intel Developer Forum (IDF) and Research at Intel (R@I).
Created algorithms for dynamically allocating radio spectrum using predictive machine learning, which led to four publications and three granted patents.

Academic Experience

Research Assistant

University of British Columbia, Electrical and Computer Engineering

Sep 2016 – May 2023 Vancouver, Canada

Characterized and modeled deep-learning workloads on heterogeneous computing platforms to identify performance bottlenecks and optimize power consumption, promoting energy-efficient computing.
Developed a statistical-analysis-based methodology to study the inter- and intra-node variability in performance and power consumption among identical edge computing devices.
Studied and instrumented the PyTorch framework's Virtual Machine stack interpreter (part of the TorchScript module) to insert code that gathers performance and kernel-level information, allowing us to study the runtime behavior and hot kernels of deployed deep learning networks on CPU/GPU-based systems.

Teaching Assistant

University of British Columbia, Electrical and Computer Engineering

Jan 2017 – Apr 2023 Vancouver, Canada

Worked as a teaching assistant for four spring semesters for the undergraduate course Design of Distributed Software Applications (CPEN431).
Provided guidance and assistance to students during lab hours and online to help them build an end-to-end distributed key-value store application.
Built automated grading software to reduce marking effort and time, and developed a software-based management tool for PlanetLab distributed computing resources. Managed the AWS cloud infrastructure that serves the course's web-based UI services and testing software.

Publications

Amirhossein Ahmadi, Hazem A. Abdelhafez, Shashwat Jaiswal, Karthik Pattabiraman, Matei Ripeanu (2023). Hot Under the Hood: An Analysis of Ambient Temperature Impact on Heterogeneous Edge Platforms. In EdgeSys ‘23.

Hazem A. Abdelhafez, Amr Almoallim, Hassan Halawa, Amirhossein Ahmadi, Karthik Pattabiraman, Matei Ripeanu (2022). Characterizing Variability in Heterogeneous Edge Systems: A Methodology & Case Study. In SEC ‘22.

Hazem A. Abdelhafez, Ning Xie, Ahmed Mohammed ElShafiey Mohammed Eltantawy (2022). Devices, Methods, and Media for Efficient Data Dependency Management for In-order Issue Processors. In USPTO and WIPO.

Hazem A. Abdelhafez, Hassan Halawa, Mohamed Osama Ahmed, Karthik Pattabiraman, Matei Ripeanu (2022). MIRAGE: Machine Learning-based Modeling of Identical Replicas of the Jetson AGX Embedded Platform. In SEC ‘21.

Hazem A. Abdelhafez, Hassan Halawa, Karthik Pattabiraman, Matei Ripeanu (2021). Snowflakes at the Edge: A Study of Variability among NVIDIA Jetson AGX Xavier Boards. In EdgeSys ‘21.

Hazem A. Abdelhafez, Matei Ripeanu (2019). Studying the Impact of CPU and Memory Controller Frequencies on Power Consumption of the Jetson TX1. In FMEC ‘19.

Hazem A. Abdelhafez, Christopher Zimmer, Sudharshan S. Vazhkudai, Matei Ripeanu (2019). AHEAD: A Tool for Projecting Next-Generation Hardware Enhancements on GPU-Accelerated Systems. In IPDPSW ‘19.
