DevOps • Site Reliability • Cloud Engineer
Designing resilient cloud-native systems, improving reliability, and scaling infrastructure through automation
Jun 2024 – Jan 2026
Led cloud infrastructure migration and modernization initiatives, improving platform resilience and fault tolerance. Redesigned critical workloads, resulting in a 35% increase in system availability and a 40% reduction in unplanned downtime.
Designed and implemented Jenkins-based CI/CD pipelines integrated with Git workflows, reducing release cycles by 50% and minimizing deployment-related production incidents.
Managed containerized workloads using Docker and Kubernetes, optimizing resource utilization and deployment reliability across multiple environments.
Ran disaster recovery validation, including failover drills and chaos testing, reducing MTTR by 50% and exercising end-to-end recovery workflows.
Jan 2020 – Nov 2023
Designed, deployed, and operated AWS-based cloud infrastructure supporting more than 50 microservices across development, staging, and production environments, emphasizing high availability and horizontal scalability.
Built infrastructure automation workflows using Terraform and Ansible, eliminating approximately 70% of manual provisioning tasks and improving deployment consistency and reliability across environments.
Maintained 99.9% uptime by implementing proactive monitoring, alerting, and observability strategies using CloudWatch, Grafana, and log-driven diagnostic tooling.
Participated in 24×7 on-call rotations, performing production incident triage, mitigation, and root cause analysis. Implemented long-term fixes that improved platform stability and reduced recurrence of critical issues.
Led cost optimization and performance tuning initiatives, identifying inefficient resource utilization patterns and implementing improvements without compromising SLA targets.
Collaborated closely with engineering teams to improve deployment strategies, operational workflows, and reliability best practices across cloud-native services.
IEEE Published Research
Developed a real-time fatigue detection system using computer vision and Eye Aspect Ratio (EAR) tracking to identify drowsiness events.
Achieved 92% detection accuracy with alert triggering under 200 ms, ensuring responsiveness suitable for safety-critical scenarios.
Implemented a modular architecture, robust error handling, and performance optimizations to maintain stability under varying environmental conditions.
Validated model behavior across diverse lighting and head-movement scenarios to minimize false positives and improve real-world reliability.
Designed and implemented an end-to-end telemetry pipeline for continuous monitoring of gas concentration, temperature, and humidity sensors.
Built a secure MQTT → Python → InfluxDB ingestion pipeline capable of handling high-frequency time-series data streams with sub-second telemetry updates.
Optimized retention policies and storage strategies to maintain long-term historical visibility while keeping storage utilization efficient.
Developed Grafana dashboards with anomaly detection logic, enabling rapid identification of abnormal sensor behavior and environmental spikes.
Developed a deep learning pipeline using CNN and transfer learning methodologies to classify chest X-ray images for COVID-19 detection.
Achieved 95.5% classification accuracy using the VGG16 architecture while prioritizing high recall to minimize false-negative diagnoses.
Automated inference workflows supporting batch predictions across 10,000+ medical images, significantly reducing manual review effort.
Designed the model pipeline as a reusable inference module that can be integrated into future research or clinical validation systems.