Building Reliable Infrastructure at Scale

Senior DevOps & Site Reliability Engineer specializing in AWS cloud environments, observability tooling, and infrastructure automation.

8+ Years Experience

AWS Cloud Expert

24/7 On-Call Ready


$ docker ps --format "table {{.Names}}\t{{.Status}}"
NAMES         STATUS
api-gateway   Up 3 days
redis-cache   Up 3 days
prometheus    Up 3 days

$ terraform plan
No changes. Infrastructure is up-to-date.

$ _

Technical Expertise

Skills & Technologies

Comprehensive expertise across the full DevOps and SRE stack, from infrastructure provisioning to observability.

☁

Cloud & Infrastructure

Designing and managing large-scale cloud environments with a focus on reliability and cost optimization.

AWS, EKS, ECS, Lambda, S3, CloudFront, RDS, Route 53, VPC, Azure, GCP

📈

Observability & Monitoring

Building comprehensive monitoring solutions for proactive incident detection and performance optimization.

Splunk, Prometheus, Grafana, Nagios, Enhanced Logging, Alerting

🛠

Infrastructure as Code

Automating infrastructure provisioning and configuration with modern IaC tools and best practices.

Terraform, Ansible, CloudFormation, Python, Bash Scripting

🚀

CI/CD & Containerization

Implementing robust deployment pipelines and container orchestration for seamless delivery.

Jenkins, GitLab CI/CD, GitHub Actions, Kubernetes, Docker

🔧

SRE Practices

Applying SRE principles to improve system reliability, reduce toil, and manage incidents effectively.

Incident Management, Root Cause Analysis, Post-Mortems, ServiceNow, Jira

🔒

Security & DevSecOps

Integrating security best practices throughout the development lifecycle and infrastructure.

Security Observability, DevSecOps, Compliance, Secure Deployments

Services

What I Do

Helping teams build and maintain reliable, scalable infrastructure.

☁

Cloud Architecture

Design and implement scalable AWS infrastructure with a focus on high availability, cost optimization, and security best practices.

📈

Observability Setup

Build comprehensive monitoring solutions with Prometheus, Grafana, and Splunk. Create dashboards that surface actionable insights.

🛠

Infrastructure as Code

Automate infrastructure provisioning with Terraform and Ansible. Version-controlled, repeatable, and auditable deployments.

🚀

CI/CD Pipelines

Design and implement deployment pipelines with Jenkins, GitLab CI/CD, or GitHub Actions. Faster releases with confidence.

🔧

Incident Response

Establish on-call processes, runbooks, and post-incident review workflows. Reduce MTTR and prevent recurring issues.

🔒

Platform Reliability

Improve system uptime through SRE practices, capacity planning, chaos engineering, and proactive performance tuning.

Get to Know Me

About Me

With over 8 years dedicated to designing and optimizing large-scale AWS cloud environments, I've built a proven track record of implementing and enhancing SRE frameworks that improve system reliability, scalability, and performance. My expertise spans the full observability stack, with particular depth in Splunk, and I drive automation through Python and scripting to eliminate toil and improve efficiency.

I thrive in high-pressure environments, participating in 24/7 on-call rotations and leading incident response with thorough root cause analysis. Working across diverse industries has given me a deep understanding of what it takes to keep mission-critical systems running at scale.

When I'm not ensuring systems are running smoothly, you'll find me watching aviation content (there's something fascinating about airport operations and aircraft), exploring new places, or connecting with people from around the world. I believe the best engineers are also lifelong learners, and I'm always eager to pick up new technologies and perspectives.

Location

Toronto, Canada

Focus

Platform Reliability

Specialization

AWS Cloud & Observability

Interests

Aviation, Travel, Technology

Let's Connect

Get In Touch

Looking for a reliable SRE to strengthen your platform? Let's talk.

✉

Email

et@dolbyto.dev

LinkedIn

Connect professionally

💻

GitHub

View my projects
