About Me

I'm a Senior Site Reliability Engineer specializing in building and operating resilient, large-scale distributed systems. My expertise lies in platform engineering, infrastructure automation, and driving operational excellence through SRE best practices — from defining SLOs to building self-healing infrastructure.

I'm currently pursuing an M.S. in Computer Science at the Georgia Institute of Technology, combining academic research with hands-on industry experience to improve infrastructure reliability and performance at scale.

99.99% Platform Uptime Achieved
$18M+ Cost Savings Delivered
150+ Engineers Trained

Professional Experience

Senior Site Reliability Engineer

Merck & Co. | 2021 - Present

⚙️

Driving reliability and operational excellence across enterprise cloud platforms. Designing, building, and operating large-scale Kubernetes infrastructure while establishing SRE culture, incident management processes, and observability standards.

Key Achievements

  • Platform Reliability: Architected and maintained highly available EKS clusters serving mission-critical workloads, achieving 99.99% uptime through robust SLO-driven practices
  • Observability & Monitoring: Built comprehensive observability stacks with Prometheus, Grafana, and Datadog — enabling proactive alerting, SLI tracking, and faster incident resolution
  • Infrastructure as Code: Designed and implemented Terraform-based infrastructure automation achieving 90% IaC coverage, with automated drift detection and policy enforcement
  • Incident Management: Established a blameless postmortem culture and incident response frameworks, reducing MTTR by 40% through runbook automation and on-call best practices
  • Team Leadership: Led 10+ engineers across 20+ strategic projects, delivering $18M+ in cost savings through capacity optimization, right-sizing, and automation
  • Knowledge Sharing: Trained 150+ cloud engineers on SRE best practices, toil reduction strategies, and reliability-first development patterns

Technologies

AWS Kubernetes Terraform Docker Helm ArgoCD Prometheus Grafana Datadog Python GitHub Actions Splunk

Education

M.S. Computer Science - Georgia Institute of Technology (In Progress)

B.S. Information Technology - New Jersey Institute of Technology (2021)

Specialization: Network & Information Security / Web Application Development

Beyond Code

🏃

Running

Training for marathons and half-marathons. The discipline of distance running mirrors SRE — steady pacing, monitoring your metrics, and pushing through when things get tough.

🏓

Pickleball

Competitive player who thrives on quick decision-making and strategic play. The fast reflexes translate well to incident response and real-time problem solving.

🖥

Home Lab

Running a full home infrastructure lab with VLANs, firewalls, Proxmox clusters, and self-hosted monitoring stacks. It's my personal playground for testing SRE tools and running chaos engineering experiments.


Community Leadership

💬

Egyptian IT Community (Discord) - Co-founder

300+ Members | International Collaboration

Established and manage a thriving Discord community for IT and CS professionals and students, fostering networking, knowledge sharing, and collaboration on international projects. My focus is mentoring junior engineers in SRE practices, cloud architecture, and reliability engineering.

Certifications

AWS Certified Solutions Architect - Associate

Amazon Web Services

HashiCorp Certified: Terraform Associate

HashiCorp

🎓

SDI: Language of Leadership Program

Leadership Development