Hazem A. Abdelhafez

Senior GPU Compiler Engineer

Qualcomm

Biography

I am a senior GPU compiler performance engineer at Qualcomm. My responsibilities include characterizing the performance of GPU workloads (compute and graphics) to identify bottlenecks and optimization opportunities that can be incorporated into the compiler design.

I received my PhD from the Department of Electrical and Computer Engineering at the University of British Columbia (UBC). My research focused on promoting energy efficiency in heterogeneous computing devices through better utilization of microarchitectural features such as fine-grained frequency tuning and unified memory.

I received my BSc and MSc degrees in Electronics and Electrical Communications Engineering from Cairo University in 2012 and 2016, respectively. Before starting my PhD at UBC in 2016, I worked as a Research & Development Engineer at Intel Labs for two years (2012–2014), and then as a Software Engineer at AvidBeam for another two years (2014–2016).

Over the past ten years, I have led and contributed to several projects spanning cognitive wireless communication systems, large-scale video analytics, compilers, and modeling the performance and power consumption of different computing systems, including supercomputers and edge devices. I have also collaborated successfully with several research organizations and industry partners, and I was honored to receive several awards (scholarships and grants) at UBC.

Interests
  • Compilers
  • GPU Computing
  • Heterogeneous Computing
  • Performance Engineering
  • Machine Learning
Education
  • PhD in Electrical and Computer Engineering, 2023

    University of British Columbia

  • MSc in Electronics and Electrical Communications, 2016

    Cairo University

  • BSc in Electronics and Electrical Communications, 2012

    Cairo University

Industry Experience

Senior GPU Compiler Engineer
Jun 2023 – Present | San Jose, United States
  • Characterize the performance behavior of GPU compute workloads to identify bottlenecks and optimization opportunities.
  • Prototype compiler optimizations that improve the performance of the target GPU workloads.

Research and Development Intern
Jan 2020 – Jun 2020 | Toronto, Canada
  • Developed a proof-of-concept implementation of an efficient compiler-based data-dependency management algorithm for in-order-issue processors, improving end-to-end latency by up to 15% (5% on average) across several benchmarks.
  • Created and submitted a patent application for the algorithm; the application was accepted and is currently in the filing process.
  • Evaluated, across several benchmarks, the impact of reordering the operands of associative instructions as a compiler optimization phase, which allowed management to make a well-informed decision before allocating time and resources to a full-scale implementation.
  • Implemented and contributed several components of the SPIR-V specification to the MLIR project (part of the open-source LLVM project), which earned me contributor status in the LLVM project.

Astro Program Participant (Research Intern)
May 2018 – Aug 2018 | Oak Ridge, USA
  • Built an open-source profiling-based tool that allows developers and system designers to analyze the performance of GPU-accelerated applications in fields such as high-performance computing.
  • Created an analytical model that uses datasheet information to project the impact of intra-node connectivity on data transfer time in GPU-accelerated systems. The model projects data transfer times on next-generation nodes without actual deployment, with errors ranging from 19% to 23%.

Software Engineer
Sep 2014 – Sep 2016 | Cairo, Egypt
  • Took a leadership role in the development of a large-scale video analytics platform (ATUN), which currently serves as the base for the scalable computer vision algorithms shipped by AvidBeam.
  • Contributed to a joint research and development effort between AvidBeam and Intel Corporation to create a real-time, large-scale video analytics platform. Also participated in a Network Function Virtualization (NFV) project proposal that led to a collaborative proof-of-concept between an international telecommunications service provider and AvidBeam.

Research and Development Engineer
Jul 2012 – Apr 2014 | Cairo, Egypt
  • Developed several Licensed Shared Access (LSA) system demos that were presented at Intel technology events such as Intel Developer Forum (IDF) and Research at Intel (R@I).
  • Created predictive machine learning algorithms for dynamic radio spectrum allocation, which led to four publications and three granted patents.

Academic Experience

Research Assistant
Sep 2016 – May 2023 | Vancouver, Canada
  • Characterized and modeled deep learning workloads on heterogeneous computing platforms to identify performance bottlenecks and optimize power consumption, promoting energy-efficient computing.
  • Developed a statistical-analysis-based methodology to study inter- and intra-node variability in performance and power consumption among identical edge computing devices.
  • Studied and instrumented the PyTorch virtual machine stack interpreter (part of the TorchScript module), inserting code that gathers performance and kernel-level information so that we could study the runtime behavior and hot kernels of deployed deep learning networks on CPU/GPU-based systems.

Teaching Assistant
Jan 2017 – Apr 2023 | Vancouver, Canada
  • Served as a teaching assistant for four spring terms of the undergraduate course Design of Distributed Software Applications (CPEN431).
  • Provided guidance and assistance to students during lab hours and online to help them build an end-to-end distributed key-value store application.
  • Built automated grading software to reduce marking time and effort, and developed a software-based management tool for PlanetLab distributed computing resources. Managed the AWS cloud infrastructure that hosts the course’s web-based UI services and testing software.

Publications

(2023). Hot Under the Hood: An Analysis of Ambient Temperature Impact on Heterogeneous Edge Platforms. In EdgeSys ’23.

(2022). Characterizing Variability in Heterogeneous Edge Systems: A Methodology & Case Study. In SEC ’22.

(2022). Devices, Methods, and Media for Efficient Data Dependency Management for In-order Issue Processors. In USPTO and WIPO.

(2022). MIRAGE: Machine Learning-based Modeling of Identical Replicas of the Jetson AGX Embedded Platform. In SEC ’21.

(2021). Snowflakes at the Edge: A Study of Variability among NVIDIA Jetson AGX Xavier Boards. In EdgeSys ’21.

(2019). AHEAD: A Tool for Projecting Next-Generation Hardware Enhancements on GPU-Accelerated Systems. In IPDPSW ’19.

(2017). NVIDIA Jetson Platform Characterization. In Euro-Par ’17.

(2014). Cloud-Assisted Spectrum Management System with Trading Engine. In IWCMC ’14.