
senior software engineer — amd rocm

Amir Shetaia

I build the layers most software never sees: GPU drivers, deterministic solvers, and the systems underneath.

Developing GPU drivers for ML and data center workloads at AMD on the ROCm platform. Background in HPC optimization, formal methods, and verification systems.

open to interesting problems & collaborations
based in Toronto, Ontario, Canada · 43.65° N, 79.38° W · Eastern time (UTC−5)

worked at · studied at

AMD · HUAWEI · Critlab at Queen's University · Queen's University · Packt · Valeo · Tekomoro · UCCD Mansoura Engineering · Siemens EDA · Mansoura Robotics Club · Mansoura University
01

About

The short version of a longer story — from Mansoura to Toronto, always drawn to the layer just below the surface.

I work where correctness, performance, and hardware meet — and I like it most when the problem is hard enough that the answer has to be measured, not guessed.

Today I'm at AMD, building Linux GPU drivers for the ROCm platform — the layer that lets machine-learning and data-center workloads actually reach the hardware. That's kernel and driver work in C/C++: shipping features for current and next-gen GPUs and chasing down the complex issues customers and QA surface.

Before AMD I made large-scale optimization solvers deterministic and fast at Huawei (C++, OpenMP, HPC), researched formal methods and LLM-assisted verification at Queen's, and spent years in embedded — automotive, robotics, and board bring-up. The constant: low-level systems where correctness and performance both have to hold.

Amir Shetaia · Toronto, Canada · 43.65° N
Global 1st · Huawei ICT Competition, Cloud Track (Shenzhen, 2024)
4.3/4.3 MASc GPA
0+
people reached
9 roles across systems, HPC, ML & embedded
02

Capabilities

The tools I reach for, grouped by where they live in the stack — from silicon up to models.


Languages (10)
C/C++ · Python · C# · Java · Rust · Go · SQL · JavaScript · Assembly · MATLAB

Firmware and Embedded (05)
Zephyr · AUTOSAR OS · Embedded Linux · MCU Debugging · Buildroot

Protocols and Standards (11)
MCTP · PLDM · SPDM · FRU · CAN · LIN · I2C · SPI · SMBus · I3C · Sensor Management

Cloud and DevOps (05)
AWS · Docker · Kubernetes · Terraform · CI/CD

System Design (05)
Distributed Systems · Multithreading · OpenMP · HPC · Computer Architecture

Debug and Tools (04)
GDB · WinDbg · Wireshark · Oscilloscope/Logic Analyzer

AI and ML (07)
TensorFlow · PyTorch · Hugging Face · Scikit-Learn · YOLO · NLP · LLMs
03

Selected Work

A few projects that show how I think — making a parallel solver reproducible, taming messy logs, and putting a car's senses on the edge.

P-01 · featured

OptVerse: Deterministic Sparse Linear Solver

R&D Software Engineer · Huawei

Made Huawei's OptVerse Cholesky solver bit-for-bit reproducible across runs in C++/OpenMP without giving up HPC throughput.

bit-exact run-to-run reproducibility · C++/OpenMP parallel HPC · validated on Mittelmann benchmarks
problem

Parallel sparse Cholesky factorization produced run-to-run variation — non-associative floating-point reductions and non-deterministic OpenMP scheduling gave slightly different results each run, blocking reproducible benchmarking and root-cause debugging.

approach

Traced the non-determinism to parallel reduction order and floating-point accumulation order; introduced deterministic reduction/scheduling strategies with controlled accumulation, then profiled the hot paths to recover the throughput the determinism constraints cost.
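The run-to-run variation follows from floating-point addition not being associative: regrouping the same operands changes the rounding. A minimal Python sketch of one common remedy is shown here; it illustrates the general technique and is not the OptVerse code. It splits the input into blocks whose boundaries do not depend on the worker count, keeps each block's partial sum by block index, and combines the partials in a fixed pairwise tree. The block size and function names are hypothetical, and Python threads stand in for OpenMP workers only to show that completion order stops mattering (they give no real speedup under the GIL).

```python
from concurrent.futures import ThreadPoolExecutor

# Non-associativity in one line:
# (0.1 + 0.2) + 0.3 == 0.6000000000000001, but 0.1 + (0.2 + 0.3) == 0.6

BLOCK = 1024  # hypothetical block size; it must NOT depend on the worker count


def block_sum(values, start):
    """Sum one fixed block left to right, so its partial sum never changes."""
    total = 0.0
    for v in values[start:start + BLOCK]:
        total += v
    return total


def deterministic_sum(values, workers):
    """Return a sum that is bit-identical for any number of workers."""
    starts = range(0, len(values), BLOCK)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, not completion order,
        # so partials[i] is always the sum of block i.
        partials = list(pool.map(lambda s: block_sum(values, s), starts))
    # Combine partial sums in a fixed pairwise tree, independent of scheduling.
    while len(partials) > 1:
        paired = [partials[i] + partials[i + 1]
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            paired.append(partials[-1])  # odd one out is carried up unchanged
        partials = paired
    return partials[0] if partials else 0.0
```

With this structure, `deterministic_sum(data, 1)` and `deterministic_sum(data, 8)` return the same bits. In OpenMP terms, the same idea means not relying on a plain `reduction(+:x)` clause, whose combination order the standard leaves unspecified, and instead writing per-block partials to an array and combining them in a fixed order.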

result

Delivered a deterministic Cholesky solver with reproducible output across runs, validated on Hans Mittelmann's optimization benchmarks to keep OptVerse competitive on large-scale problems.

C++ · OpenMP · HPC · Sparse Linear Algebra · Profiling
P-02 · featured

DeepParse: LLM-Enhanced Log Parsing Framework

Project Lead

A hybrid log-parsing system combining DeepSeek-R1:8B with the Drain algorithm, achieving 97.6% accuracy across 16 datasets.

97.6% parse accuracy · 16 datasets · DeepSeek-R1 + Drain hybrid
problem

Log parsing is critical for debugging, but traditional deterministic methods struggle with variable formats and previously unseen log types.

approach

A hybrid approach that combines deterministic Drain parsing with LLM-driven template generation using DeepSeek-R1:8B, improving both accuracy and adaptability.
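For readers unfamiliar with Drain, this Python sketch shows its core idea in simplified form: mask obvious variables, bucket lines by token count, and merge a line into the most similar existing template when similarity clears a threshold, turning disagreeing positions into `<*>` wildcards. It is not DeepParse's implementation: the real Drain also routes lines through a fixed-depth prefix tree, the threshold and masking regex here are assumptions, and the LLM template-generation stage is omitted entirely.

```python
import re

SIM_THRESHOLD = 0.5  # hypothetical value; Drain's similarity threshold is tunable

# Mask hex literals and dotted numbers (e.g. IPs, versions) before grouping.
VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")


def tokenize(line):
    return VARIABLE.sub("<*>", line).split()


def similarity(template, tokens):
    """Fraction of positions where template and tokens agree (same length)."""
    if not tokens:
        return 1.0
    same = sum(1 for a, b in zip(template, tokens) if a == b)
    return same / len(tokens)


def parse(lines):
    """Return the final template assigned to each input line."""
    buckets = {}  # token count -> list of templates (each a list of tokens)
    assigned = []
    for line in lines:
        tokens = tokenize(line)
        bucket = buckets.setdefault(len(tokens), [])
        best, best_sim = None, 0.0
        for template in bucket:
            sim = similarity(template, tokens)
            if sim > best_sim:
                best, best_sim = template, sim
        if best is not None and best_sim >= SIM_THRESHOLD:
            # Merge: positions where this line disagrees become wildcards.
            for i, tok in enumerate(tokens):
                if best[i] != tok:
                    best[i] = "<*>"
        else:
            best = list(tokens)  # no close match: start a new template
            bucket.append(best)
        assigned.append(best)
    return [" ".join(t) for t in assigned]
```

Because templates are merged in place, earlier lines report their template as it stands after the last merge; for example, "User alice logged in" and "User bob logged in" both end up as `User <*> logged in`.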

result

Achieved 97.6% parsing accuracy on 16 diverse datasets, enabling more effective anomaly detection and faster root-cause analysis.

LLMs · DeepSeek-R1 · NLP · Python · Log Parsing
P-03 · featured

VehiPlus: Embedded Telematics & Driver Assistance Platform

Developer

A Raspberry Pi 4-based real-time diagnostics platform combining OBD-II telemetry, MQTT messaging, YOLO object detection, and ML for driver assistance.

<100 ms alert latency · RPi 4 edge inference · OTA update framework
problem

Vehicle diagnostics and driver-assistance systems need real-time processing within edge-compute constraints, along with reliable connectivity.

approach

Built an edge ML platform on a Raspberry Pi, with OBD-II integration for vehicle state, MQTT for cloud connectivity, YOLO for visual perception, and TensorFlow MobileNet for efficient inference.
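Reading vehicle state over OBD-II comes down to requesting standard mode 01 PIDs and decoding the response bytes. The Python sketch below decodes three standard SAE J1979 PIDs (coolant temperature, engine RPM, vehicle speed) from a response string in the space-separated hex format that ELM327-style adapters return. It is an illustration, not VehiPlus code; the function and table names are hypothetical, and the transport to the car (serial adapter, library, MQTT publishing) is left out.

```python
# Standard mode 01 PIDs: pid -> (name, unit, number of data bytes, decoder)
PIDS = {
    0x05: ("coolant_temp", "degC", 1, lambda a: a[0] - 40),
    0x0C: ("engine_rpm", "rpm", 2, lambda a: (256 * a[0] + a[1]) / 4),
    0x0D: ("vehicle_speed", "km/h", 1, lambda a: a[0]),
}


def decode_mode01(response):
    """Decode a mode 01 response such as '41 0C 1A F8' into (name, value, unit)."""
    data = bytes.fromhex(response)  # fromhex skips the separating spaces
    if len(data) < 2 or data[0] != 0x41:  # 0x41 = positive reply to mode 01
        raise ValueError(f"not a mode 01 response: {response!r}")
    pid = data[1]
    if pid not in PIDS:
        raise ValueError(f"unsupported PID 0x{pid:02X}")
    name, unit, size, decode = PIDS[pid]
    payload = data[2:2 + size]
    if len(payload) != size:
        raise ValueError(f"truncated payload for PID 0x{pid:02X}")
    return name, decode(payload), unit
```

For example, `decode_mode01("41 0C 1A F8")` yields an engine speed of (0x1A × 256 + 0xF8) / 4 = 1726 rpm. Extending the table with other standard mode 01 PIDs only requires adding their byte count and scaling formula.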

result

Implemented lane-departure detection and collision-avoidance alerts with sub-100 ms latency, and designed an OTA update mechanism as a proof of concept for software-defined vehicle (SDV) platform evolution.

Raspberry Pi · OBD-II · MQTT · YOLO · TensorFlow · MobileNet
04

Experience

The path here: robotics clubs and embedded benches, then cloud, HPC, and research — and now GPU drivers at AMD.

Nine roles, three countries, one throughline — moving steadily toward the layers underneath: from robotics benches in Cairo to data-center GPUs in Toronto.

Robotics → Embedded → Cloud → HPC · Research → GPU Drivers
  • Contributing to high-impact software projects supporting current and next-generation AMD GPUs.
  • Debugging and resolving complex Linux kernel and driver issues reported by customers and QA.
  • Designing and implementing new driver features, documenting technical decisions and trade-offs.
  • Collaborating with compute, machine learning, and hardware teams across AMD.
  • Engaging with the open-source community through upstream contributions and reviews.
  • Developed a deterministic version of the OptVerse Cholesky solver in C++, ensuring reproducibility across runs.
  • Optimized sparse linear solvers in C++ for large-scale optimization problems.
  • Applied parallel programming (OpenMP) and HPC techniques to accelerate solver modules.
  • Performed profiling and performance analysis to identify bottlenecks and guide optimizations.
  • Investigated sources of non-determinism (parallel execution, floating-point operations) and contributed efficient solutions.
  • Analyzed solver performance on Hans Mittelmann's benchmarks to provide insights that improved OptVerse competitiveness.
  • Monitored and maintained equipment and physical environment of the research lab.
  • Kept an up-to-date inventory of desks, workstation equipment, development hardware, test tools, and simulation devices.
  • Tracked equipment location and assignments, coordinating access for lab members and ensuring proper usage.
  • Identified high-use and low-use equipment to support purchasing decisions and resource planning.
  • Maintained a safe and organized lab space, addressing safety and usability concerns as they arose.
  • Member of CritLab, conducting research on integrating LLMs and NLP techniques into formal verification and anomaly detection systems.
  • Developed DeepParse, a hybrid LLM-enhanced log parsing framework that integrates large language models with rule-based parsers to improve accuracy and reduce manual configuration.
  • Conducted research on applying LLMs and NLP methods to support formal verification, anomaly detection, and safety-critical system analysis.
  • Investigated novel AI-assisted workflows for model checking, trace analysis, and automated reasoning in high-assurance software.
  • ELEC 471: Delivered tutorials on safety-critical development, requirements engineering, and hazard analysis (HARA, FMEA, STPA).
  • Supported students in using model checking tools and formal verification techniques to analyze system behaviour.
  • APSC 142: Supported weekly labs, grading, and mentoring for first-year engineering students learning C programming.
  • Taught C programming, problem-solving patterns, and computational thinking.
  • Served as a technical reviewer for multiple published books, ensuring accuracy, clarity, and relevance of technical content.
  • Created certification-style question banks and practice exams aligned with industry standards and learning objectives.
  • Reviewed and validated hands-on exercises, code samples, and explanations for emerging technologies.
  • Collaborated with authors and editors to improve structure, technical depth, and reader engagement across publications.
relocated · Egypt → Canada · 2024
  • Contributed to the deployment and optimization of high-availability, scalable cloud systems.
  • Assisted in configuring virtualized network functions (VNFs) and managing cloud-native workloads.
  • Gained hands-on experience with telecom-grade systems, cloud orchestration, and Huawei's proprietary platforms.
  • Added support for Saleae and PicoScope analyzers in the global integration testing tool.
  • Built a UI tool for Baby-LIN-II (LIN-bus simulation device) to view, record, and analyze LIN signals.
  • Developed CI automation tools and scripts with WPF, C#, and Python.
  • Improved performance and reliability of GUI tools for automotive testing.
  • Worked with CAN/LIN protocols, conducted validation, and ensured MISRA C compliance.
  • Developed autonomous driving software for Low-Speed Autonomous Vehicles (LSAVs) using LiDARs (mechanical and solid-state), cameras, and IMUs.
  • Contributed to perception, localization, and path planning modules.
  • Built and tested a LiDAR-based obstacle detection and tracking system.
  • Improved navigation accuracy in GPS-denied environments through sensor fusion.
  • Designed and taught project-based curricula using platforms like Arduino and STM32.
  • Guided students through practical labs and final projects to build industry-relevant skills.
  • Helped 100+ students gain foundational and advanced knowledge of embedded systems.
  • Gained hands-on experience with MCU fundamentals, including CPU architecture, memory management, startup processes, linker scripts, compilation flow, and interrupt handling.
  • Worked with debugging tools to analyze and troubleshoot embedded software.
  • Explored DevOps practices in embedded systems development, contributing to workflow automation and build processes.
  • Conducted a deep dive into bootloaders, RTOS fundamentals, and AUTOSAR OS/layered architecture.
  • Developed an understanding of functional safety standards (ISO 26262) and their application to embedded software.
  • Practiced Embedded Linux development using Buildroot and related tools, gaining exposure to kernel, drivers, and board bring-up workflows.
  • Co-founded a student-run robotics club at Mansoura University focused on innovation, hands-on learning, and community building.
  • Organized 4 major hackathons with a combined attendance of over 4,000 participants.
  • Established partnerships with multinational companies including MathWorks and Dassault Systèmes.
  • Led workshops, training sessions, and robotics competitions to promote STEM and practical engineering skills among students.
05

Education & Awards

Where the foundations were poured — and a few moments that told me I was on the right track.


MASc. Electrical & Computer Engineering

Queen's University · 2024 – 2025

Research focus: Formal Methods, Verification & Validation, Large Language Models, System Modeling, Real-Time Systems

GPA 4.3/4.3

BEng. Mechatronics Engineering

Mansoura University · 2019 – 2024

Excellence with Honours, Top 10 of class, Academic Excellence Scholarship

GPA 3.8/4

Awards & Honors

  • HUAWEI ICT Competition · 2024

    First Prize Global (Shenzhen, China) and Grand Prize North Africa Regional (Tunisia) in Cloud Track

  • Ideal Student Award · 2022

    Recognition for academic performance and leadership

Community

  • Co-Founder · 2022 – 2024
    Mansoura Robotics Club

    Organized 4 hackathons engaging 4,000+ participants in robotics and embedded systems.

  • HUAWEI ICT Academy Ambassador · 2024 – Present
    HUAWEI

    Promoted ICT education and professional development in emerging markets.

06

Live Signals

Not a static résumé — live proof that I'm still building, right now.

github · contribution graph
07

References

What the people I've built and shipped with have to say.

Hoda Saleh
Khaled Zoheir
Abdelrhman Mosad
Mohamed Khalil
Mahmoud Ebrahim
Mahmoud Labib

Hoda Saleh

Volunteer Colleague, Faculty of Engineering MU

During our volunteering activities at the Faculty of Engineering, I witnessed Amir's unwavering work ethic and dedication to excellence. He's an exceptional team player and proficient communicator who readily shares expertise and collaborates toward team objectives. A true tech enthusiast with contagious enthusiasm, and I wholeheartedly recommend him for any role. 

08

Off the Clock

When I'm not in the kernel, these are the worlds I get lost in — story-first, cinematic, the kind that stick with you.

PLAYER · NightBaRron1412 · library: 8 titles · genre: story-driven AAA · platforms: PC, PS, Xbox · now playing: Pragmata
09

Contact

If any of this resonates — a hard problem, a role, or just comparing notes — let's talk.

Have a hard systems problem?

Drivers, performance, reproducibility, verification — or just want to compare notes. I read every message and reply within a day or two.

based in Ontario, Canada · open to interesting work
amir.shetaia

Senior Software Engineer | Systems, HPC, and ML. Building reliable, fast, well-measured systems.

© 2026 Amir Shetaia · built with Next.js + Tailwind