Senior Software Engineer · AMD ROCm
Amir Shetaia
Developing GPU drivers for ML and data center workloads at AMD on the ROCm platform. Background in HPC optimization, formal methods, and verification systems.
About
The short version of a longer story — from Mansoura to Toronto, always drawn to the layer just below the surface.
I work where correctness, performance, and hardware meet — and I like it most when the problem is hard enough that the answer has to be measured, not guessed.
Today I'm at AMD, building Linux GPU drivers for the ROCm platform — the layer that lets machine-learning and data-center workloads actually reach the hardware. That's kernel and driver work in C/C++: shipping features for current and next-gen GPUs and chasing down the complex issues customers and QA surface.
Before AMD I made large-scale optimization solvers deterministic and fast at Huawei (C++, OpenMP, HPC), researched formal methods and LLM-assisted verification at Queen's, and spent years in embedded — automotive, robotics, and board bring-up. The constant: low-level systems where correctness and performance both have to hold.

Capabilities
The tools I reach for, grouped by where they live in the stack — from silicon up to models.
Languages (10)
Firmware and Embedded (5)
Protocols and Standards (11)
Cloud and DevOps (5)
System Design (5)
Debug and Tools (4)
AI and ML (7)
Selected Work
A few projects that show how I think — making a parallel solver reproducible, taming messy logs, and putting a car's senses on the edge.
OptVerse: Deterministic Sparse Linear Solver
R&D Software Engineer · Huawei
Made Huawei's OptVerse Cholesky solver bit-for-bit reproducible across runs in C++/OpenMP without giving up HPC throughput.
Parallel sparse Cholesky factorization produced run-to-run variation — non-associative floating-point reductions and non-deterministic OpenMP scheduling gave slightly different results each run, blocking reproducible benchmarking and root-cause debugging.
Traced the non-determinism to parallel reduction order and floating-point accumulation order; introduced deterministic reduction/scheduling strategies with controlled accumulation, then profiled the hot paths to recover the throughput the determinism constraints cost.
Delivered a deterministic Cholesky solver with reproducible output across runs, validated on Hans Mittelmann's optimization benchmarks to keep OptVerse competitive on large-scale problems.
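The core idea can be illustrated with a small sketch. This is not the OptVerse code: the function name `deterministic_sum` and the `block_size` parameter are hypothetical, chosen only for illustration. Floating-point addition is not associative, so `(a + b) + c` can differ from `a + (b + c)` in the last bits, and a parallel reduction whose grouping depends on thread count or scheduling can therefore change from run to run. One common remedy, assumed here, is to fix the grouping up front: split the data into fixed-size blocks, sum each block left to right, and combine the partial sums serially in block order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): sums `values` so that the result
// does not depend on the number of threads or on iteration scheduling.
double deterministic_sum(const std::vector<double>& values,
                         std::size_t block_size = 1024) {
    if (block_size == 0) block_size = 1;  // guard against division by zero
    const std::size_t n = values.size();
    const std::size_t num_blocks = (n + block_size - 1) / block_size;
    std::vector<double> partial(num_blocks, 0.0);

    // Block boundaries depend only on n and block_size, and each block is
    // summed left to right by exactly one thread, so every partial sum is
    // identical no matter which thread computes it. Without OpenMP enabled
    // the pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t b = 0; b < static_cast<std::ptrdiff_t>(num_blocks); ++b) {
        const std::size_t begin = static_cast<std::size_t>(b) * block_size;
        const std::size_t end = std::min(begin + block_size, n);
        double acc = 0.0;
        for (std::size_t i = begin; i < end; ++i) acc += values[i];
        partial[static_cast<std::size_t>(b)] = acc;
    }

    // Combine the partial sums serially, always in block order.
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

The trade-off in this sketch is the one described above: the final combine step is serial and the grouping is rigid, so some parallel efficiency is given up in exchange for a fixed accumulation order. Changing `block_size` changes the grouping and may change the last bits of the result, so it has to stay constant across the runs being compared.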
A hybrid log-parsing system that combines DeepSeek-R1:8B with the Drain algorithm, achieving 97.6% parsing accuracy across 16 datasets.
Log parsing is critical for debugging but traditional deterministic methods struggle with variable formats and new log types.
Hybrid approach combining deterministic Drain parsing with LLM-driven template generation using DeepSeek-R1:8B for improved accuracy and adaptability.
Achieved 97.6% parsing accuracy on 16 diverse datasets, enabling more effective anomaly detection and faster root cause analysis.
A Raspberry Pi 4-based real-time diagnostics platform combining OBD-II telemetry, MQTT messaging, YOLO object detection, and ML for driver assistance.
Vehicle diagnostics and driver assistance systems require real-time processing with edge compute constraints and reliable connectivity.
Built an edge ML platform on Raspberry Pi with OBD-II integration for vehicle state, MQTT for cloud connectivity, YOLO for visual perception, and TensorFlow MobileNet for efficient inference.
Implemented lane departure detection and collision avoidance alerts with sub-100 ms latency, and designed an OTA update mechanism as a proof of concept for SDV platform evolution.
Experience
The path here: robotics clubs and embedded benches, then cloud, HPC, and research — and now GPU drivers at AMD.
Nine roles, three countries, one throughline — moving steadily toward the layers underneath: from robotics benches in Cairo to data-center GPUs in Toronto.
- Contributing to high-impact software projects supporting current and next-generation AMD GPUs.
- Debugging and resolving complex Linux kernel and driver issues reported by customers and QA.
- Designing and implementing new driver features, documenting technical decisions and trade-offs.
- Collaborating with compute, machine learning, and hardware teams across AMD.
- Engaging with the open-source community through upstream contributions and reviews.
- Developed a deterministic version of the OptVerse Cholesky solver in C++, ensuring reproducibility across runs.
- Optimized sparse linear solvers in C++ for large-scale optimization problems.
- Applied parallel programming (OpenMP) and HPC techniques to accelerate solver modules.
- Performed profiling and performance analysis to identify bottlenecks and guide optimizations.
- Investigated sources of non-determinism (parallel execution, floating-point operations) and contributed efficient solutions.
- Analyzed solver performance on Hans Mittelmann's benchmarks to provide insights that improved OptVerse competitiveness.
- Monitored and maintained equipment and physical environment of the research lab.
- Kept an up-to-date inventory of desks, workstation equipment, development hardware, test tools, and simulation devices.
- Tracked equipment location and assignments, coordinating access for lab members and ensuring proper usage.
- Identified high-use and low-use equipment to support purchasing decisions and resource planning.
- Maintained a safe and organized lab space, addressing safety and usability concerns as they arose.
- Member of CritLab, conducting research on integrating LLMs and NLP techniques into formal verification and anomaly detection systems.
- Developed DeepParse, a hybrid LLM-enhanced log parsing framework that integrates large language models with rule-based parsers to improve accuracy and reduce manual configuration.
- Conducted research on applying LLMs and NLP methods to support formal verification, anomaly detection, and safety-critical system analysis.
- Investigated novel AI-assisted workflows for model checking, trace analysis, and automated reasoning in high-assurance software.
- ELEC 471: Delivered tutorials on safety-critical development, requirements engineering, hazard analysis, HARA, FMEA, STPA.
- Supported students in using model checking tools and formal verification techniques to analyze system behaviour.
- APSC 142: Supported weekly labs, grading, and mentoring for first-year engineering students learning C programming.
- Taught C programming, problem-solving patterns, and computational thinking.
- Served as a technical reviewer for multiple published books, ensuring accuracy, clarity, and relevance of technical content.
- Created certification-style question banks and practice exams aligned with industry standards and learning objectives.
- Reviewed and validated hands-on exercises, code samples, and explanations for emerging technologies.
- Collaborated with authors and editors to improve structure, technical depth, and reader engagement across publications.
- Contributed to the deployment and optimization of high-availability, scalable cloud systems.
- Assisted in configuring virtualized network functions (VNFs) and managing cloud-native workloads.
- Gained hands-on experience with telecom-grade systems, cloud orchestration, and Huawei's proprietary platforms.
- Added support for Saleae and PicoScope analyzers in the global integration testing tool.
- Built a UI tool for Baby-LIN-II (LIN-bus simulation device) to view, record, and analyze LIN signals.
- Developed CI automation tools and scripts with WPF, C#, and Python.
- Improved performance and reliability of GUI tools for automotive testing.
- Worked with CAN/LIN protocols, conducted validation, and ensured MISRA C compliance.
- Developed autonomous driving software for Low-Speed Autonomous Vehicles (LSAVs) using LiDARs (mechanical and solid-state), cameras, and IMUs.
- Contributed to perception, localization, and path planning modules.
- Built and tested a LiDAR-based obstacle detection and tracking system.
- Improved navigation accuracy in GPS-denied environments through sensor fusion.
- Designed and taught project-based curricula using platforms like Arduino and STM32.
- Guided students through practical labs and final projects to build industry-relevant skills.
- Helped more than 100 students gain foundational and advanced knowledge in embedded systems.
- Gained hands-on experience with MCU fundamentals, including CPU architecture, memory management, startup processes, linker scripts, compilation flow, and interrupt handling.
- Worked with debugging tools to analyze and troubleshoot embedded software.
- Explored DevOps practices in embedded systems development, contributing to workflow automation and build processes.
- Conducted a deep dive into bootloaders, RTOS fundamentals, and AUTOSAR OS/layered architecture.
- Developed understanding of functional safety standards (ISO 26262) and their application in embedded software.
- Practiced Embedded Linux development using Buildroot and related tools, gaining exposure to kernel, drivers, and board bring-up workflows.
- Co-founded a student-run robotics club at Mansoura University focused on innovation, hands-on learning, and community building.
- Organized 4 major hackathons with a combined attendance of over 4,000 participants.
- Established partnerships with multinational companies including MathWorks and Dassault Systèmes.
- Led workshops, training sessions, and robotics competitions to promote STEM and practical engineering skills among students.
Education & Awards
Where the foundations were poured — and a few moments that told me I was on the right track.

MASc. Electrical & Computer Engineering
Research focus: Formal Methods, Verification & Validation, Large Language Models, System Modeling, Real-Time Systems

BEng. Mechatronics Engineering
Excellence with Honours, Top 10 of class, Academic Excellence Scholarship
Awards & Honors
- HUAWEI ICT Competition (2024)
First Prize Global (Shenzhen, China) and Grand Prize North Africa Regional (Tunisia) in Cloud Track
- Ideal Student Award (2022)
Recognition for academic performance and leadership
Community
- Co-Founder, Mansoura Robotics Club (2022 – 2024)
Organized 4 hackathons engaging 4,000+ participants in robotics and embedded systems.
- HUAWEI ICT Academy Ambassador, HUAWEI (2024 – Present)
Promoted ICT education and professional development in emerging markets.
Live Signals
Not a static résumé — live proof that I'm still building, right now.
References
What the people I've built and shipped with have to say.
Hoda Saleh
Volunteer Colleague, Faculty of Engineering MU
During our volunteering activities at the Faculty of Engineering, I witnessed Amir's unwavering work ethic and dedication to excellence. He's an exceptional team player and proficient communicator who readily shares expertise and collaborates toward team objectives. A true tech enthusiast with contagious enthusiasm, and I wholeheartedly recommend him for any role.
Off the Clock
When I'm not in the kernel, these are the worlds I get lost in — story-first, cinematic, the kind that stick with you.
Contact
If any of this resonates — a hard problem, a role, or just comparing notes — let's talk.
Have a hard systems problem?
Drivers, performance, reproducibility, verification — or just want to compare notes. I read every message and reply within a day or two.







