SRE & Observability
AegisTickets - AWS EKS Reliability Platform
Engineered a reliability-focused production system that uses SLIs, SLOs, error budgets, and golden-signal monitoring to guide autoscaling and operational decisions. Built on AWS EKS with Terraform infrastructure as code, with a Prometheus and Grafana observability stack for real-time monitoring and alerting. The platform achieved 99.95% uptime through proactive incident prevention and SLO-driven automated scaling.
AWS EKS
Terraform
Prometheus
Grafana
SLIs/SLOs
Error Budgets
Observability
SRE
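To make the error-budget idea concrete, here is a minimal sketch in Python. It treats 99.95% as an illustrative availability target and assumes a 30-day window. The `ErrorBudget` class and its field names are hypothetical and are not taken from the project itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: availability error budget over a fixed window."""

    slo_target: float      # e.g. 0.9995 for a 99.95% availability SLO
    window_minutes: float  # length of the SLO window in minutes (assumed)

    @property
    def allowed_bad_minutes(self) -> float:
        # Total downtime the SLO tolerates within the window.
        return (1.0 - self.slo_target) * self.window_minutes

    def remaining_fraction(self, bad_minutes: float) -> float:
        # 1.0 means the budget is untouched; a negative value means
        # the SLO has been breached for this window.
        return 1.0 - bad_minutes / self.allowed_bad_minutes


# Assumed 30-day window with a 99.95% target.
budget = ErrorBudget(slo_target=0.9995, window_minutes=30 * 24 * 60)
print(round(budget.allowed_bad_minutes, 1))        # 21.6 minutes of downtime allowed
print(round(budget.remaining_fraction(10.8), 2))   # 0.5 of the budget left
```

A value like `remaining_fraction` could feed an alerting or scaling decision, for example slowing releases once less than half of the budget remains. That threshold is an illustration, not a documented policy of the platform.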
AI/ML SRE
Enterprise AI/ML Platform
Applied SRE principles to AI workloads, implementing observability, alerting, and cost-performance trade-off controls that reduced inference costs by approximately 60%. Built monitoring for AI/ML pipelines with custom metrics, drift detection, and automated scaling based on service level objectives. Integrated FinOps practices to optimize resource utilization and cost efficiency while maintaining model performance and reliability.
AI/ML
SRE
Observability
Alerting
FinOps
Cost Optimization
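The description does not say which drift-detection method was used. As one common option, the sketch below computes a Population Stability Index (PSI) between a baseline sample and a recent sample of a single numeric feature. The function name, bin count, and the 0.25 alert threshold are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between two samples, using equal-width bins over the baseline range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # degenerate baseline: every value lands in the first bin

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clip out-of-range values to edge bins
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in baseline]

# 0.25 is a frequently quoted rule-of-thumb threshold, used here as an assumption.
if population_stability_index(baseline, shifted) > 0.25:
    print("drift suspected")
```

Each PSI term is non-negative, so identical samples score exactly zero and any change in the binned distribution raises the score. In a monitoring pipeline the value would typically be exported as a custom metric and alerted on.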
Cloud-Native Application
Cloud-Native Job Platform
Built a resilient AWS application with health-checked deployments, automated recovery workflows, and zero-downtime delivery. Health monitoring, automated failover, and rolling updates keep the service continuously available, while automated remediation and resilient architecture patterns maintain continuity during deployments and infrastructure failures.
AWS
Health Checks
Automated Recovery
Zero Downtime
Resilience
Security & Compliance
AWS Cloud Risk Assessment
Identified reliability and security risks across IAM, storage, networking, and TLS configurations, assessed against CIS benchmarks. Reviewed AWS environments for misconfigurations, overly permissive policies, and security gaps. Provided detailed risk analysis and remediation recommendations to improve cloud security posture and alignment with industry standards.
AWS
IAM
Storage
Networking
TLS
CIS Benchmarks
Risk Assessment
Platform Engineering
FinBankOps: Secure, Multi-Region Kubernetes Infrastructure for Fintech
This project implements a production-grade, secure Kubernetes infrastructure for fintech using Amazon EKS. It supports multi-region deployment, blue/green releases, and GitOps-driven workflows via ArgoCD. Istio handles ingress traffic and internal service mesh routing, while security is reinforced using External Secrets Operator and kube-bench/kubescape audits. Observability is provided by Prometheus, Grafana, and CloudWatch. The platform supports PCI-DSS-aligned compliance while providing scalable deployment for containerized microservices whose images are stored in Amazon ECR.
AWS
EKS
ArgoCD
Istio
Secrets Mgmt
Prometheus
Grafana
kube-bench
ML & DevOps
DevOps-Enabled Real-Time ML Fraud Detection System
This project showcases the complete pipeline for a real-time fraud detection system using a containerized microservices architecture on AWS. Ingestion, inference, and action microservices are deployed to Amazon ECS (Fargate), and their Docker images are stored in ECR. Machine learning inference is based on a trained model that detects anomalous financial transactions in real-time. Infrastructure is managed with Terraform, CI/CD is orchestrated via GitHub Actions, and observability is achieved through Amazon CloudWatch. Fraud alerts are published via Amazon SNS, and the architecture is extensible to support compliance audit logging using Amazon RDS.
AWS
ECS Fargate
GitHub Actions
Amazon RDS
SNS
Terraform
CloudWatch
ML
Application Platform
Secure Three-Tier Web Application on Kubernetes
This project focused on deploying a secure, scalable three-tier web application using AWS and Kubernetes. I provisioned an EKS cluster and built Docker containers for both frontend and backend services, with images hosted in Amazon ECR. To route traffic, I configured an ALB Ingress Controller. For observability, I enabled CloudWatch control plane logging to capture API server, authenticator, and audit logs. The infrastructure was designed to scale dynamically, with IAM roles enforcing the principle of least privilege across services.
AWS
Docker
EKS
Terraform
CloudWatch
IAM
CI/CD & Infrastructure
Three-Tier Web App with GitHub Actions CI/CD
In this project, I built a fully automated, environment-aware deployment pipeline for a three-tier web application. The frontend was hosted on S3 while the backend (Node.js) ran on EC2 within a VPC. GitHub Actions orchestrated CI/CD pipelines across dev and prod branches. Infrastructure was provisioned with Terraform, including private/public subnets and NAT gateways. For monitoring, I installed the CloudWatch agent and configured AWS Managed Grafana dashboards with real-time CPU, memory, and disk usage metrics. Alerts were created for SLA-sensitive events. This setup exemplifies production-grade DevOps and cloud architecture.
AWS
EC2
Terraform
GitHub Actions
S3
CloudWatch
Managed Grafana
Full-Stack Application
Cloud-Native Recipe-Sharing Application
To modernize the way I share culinary recipes, I developed and deployed a cloud-native FastAPI application integrated with a React frontend hosted on S3. The backend API was containerized and deployed to EC2, exposed via API Gateway. CloudFormation handled infrastructure provisioning. To ensure performance visibility, I configured Prometheus to scrape FastAPI metrics and visualized real-time traffic using Grafana dashboards. I designed two access layers: a user interface and an admin portal, reflecting real-world content management workflows.
AWS
S3
React
FastAPI
EC2
CloudFormation
Prometheus
Grafana
Full DevOps Pipeline
End-to-End DevOps Pipeline with EKS & ELK Stack
This project implemented a full-stack DevOps solution using GitHub Actions for CI, Terraform for infrastructure automation, and Kubernetes on AWS EKS for orchestration. Dockerized applications were built and deployed with Kubernetes manifests. Logs were centralized using the ELK stack, while Prometheus and Grafana enabled detailed performance monitoring and alerting. Security was reinforced through IAM policies, encrypted storage, and TLS via ACM certificates.
AWS
Terraform
GitHub Actions
Docker
EKS
Prometheus
Grafana
ELK Stack
Disaster Recovery
Automated Cloud Disaster Recovery Solution
This disaster recovery project leveraged AWS infrastructure to build a resilient architecture that could handle regional failover, backup, and restoration. Using Terraform for reproducible infrastructure and GitHub Actions for automation, I integrated Datadog for system observability and alerting to ensure readiness in business continuity scenarios.
AWS
Terraform
GitHub Actions
EC2
S3
Route 53
Datadog
Containerization
Containerized WebApp with CI/CD & Monitoring
This project involved containerizing a Node.js web app, deploying it using a CI/CD pipeline built with GitHub Actions, and configuring Prometheus and Grafana to provide visibility into app health and performance. The goal was to streamline releases and provide real-time monitoring of container behavior and HTTP requests.
Docker
GitHub Actions
Node.js
Prometheus
Grafana
ML Deployment
ML Model Deployment with Flask on AWS
I deployed a Flask-based ML model as a production API on EC2 using Terraform and GitHub Actions. AWS CloudFormation and S3 were used for configuration and storage. Monitoring was integrated with Prometheus and Grafana, and AWS Security Hub was configured for compliance audits and vulnerability detection.
AWS
Flask
ML Model
CloudFormation
S3
EC2
Prometheus
Grafana
Security Hub
Serverless CI/CD
Scalable Web App CI/CD with AWS Amplify
This project centered on building a CI/CD pipeline for a React-based web application. The frontend was deployed using AWS Amplify, and backend logic was handled with AWS Lambda. CodePipeline and CodeBuild automated deployments, and CloudWatch monitored performance metrics and logs.
AWS Amplify
Terraform
AWS Lambda
RDS
CodePipeline
CloudWatch
GCP Platform
Full-Stack Application CI/CD on Google Cloud
I deployed a full-stack application on GCP using Docker containers, Terraform for infra provisioning, and GitHub Actions for CI/CD. Monitoring and alerting were set up using the Google Cloud Operations Suite, providing clear visibility into deployments and runtime behavior.
GCP
Docker
Terraform
GitHub Actions
Cloud Run
Monitoring
Jenkins Pipeline
Node.js CI/CD with Jenkins & S3 Artifacts
This project focused on implementing an efficient Jenkins-based CI/CD pipeline for a Node.js application. Artifacts were managed and stored using S3. GitHub served as the version control system, and automated builds ensured fast feedback loops.
Node.js
GitHub
Jenkins
Amazon S3
Compliance Automation
AWS Infrastructure Compliance Audit System
This compliance audit system used AWS Config to evaluate resource conformance across services. Lambda functions were triggered when a rule reported a resource as non-compliant, enabling automated remediation and alerting via SNS.
AWS Config
Lambda
Compliance
IAM
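The flow from a non-compliant Config evaluation to an SNS alert can be sketched as below. This is a minimal illustration, not the project's actual handler. It assumes the event arrives in the shape of an EventBridge "Config Rules Compliance Change" notification, and the function names, sample rule, and sample bucket are hypothetical. The publisher is injected so the logic runs without AWS credentials.

```python
import json
from typing import Callable, Optional


def build_alert(event: dict) -> Optional[dict]:
    """Return an alert payload for NON_COMPLIANT results, otherwise None."""
    detail = event.get("detail", {})
    result = detail.get("newEvaluationResult", {})
    if result.get("complianceType") != "NON_COMPLIANT":
        return None
    return {
        "rule": detail.get("configRuleName"),
        "resource_type": detail.get("resourceType"),
        "resource_id": detail.get("resourceId"),
    }


def handle(event: dict, publish: Callable[[str], None]) -> bool:
    """Publish an alert if the event is non-compliant; report whether one was sent."""
    alert = build_alert(event)
    if alert is None:
        return False
    publish(json.dumps(alert, sort_keys=True))
    return True


# Hypothetical sample event, trimmed to the fields read above.
sample_event = {
    "detail": {
        "configRuleName": "s3-bucket-public-read-prohibited",
        "resourceType": "AWS::S3::Bucket",
        "resourceId": "example-bucket",
        "newEvaluationResult": {"complianceType": "NON_COMPLIANT"},
    }
}

handle(sample_event, print)  # print stands in for an SNS publisher here
```

In a deployed Lambda, `publish` would likely wrap the boto3 SNS client's `publish(TopicArn=..., Message=...)` call, with the topic ARN supplied through configuration. Keeping the decision logic separate from the AWS call makes it straightforward to unit test.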
Security Dashboard
AWS Cloud Security Dashboard
I designed a web-based dashboard to visualize and monitor key AWS security metrics, including IAM role usage, open security groups, and policy violations, offering centralized oversight for cloud posture management.
AWS
IAM
S3
Lambda
CloudWatch
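One of the metrics mentioned above, open security groups, could be derived roughly as follows. This sketch assumes input shaped like the `SecurityGroups` list returned by the EC2 `describe_security_groups` API. The function name and sample data are hypothetical, and the dashboard's real implementation is not shown in the description.

```python
from typing import Iterable, List

# CIDR blocks that mean "anyone on the internet" for IPv4 and IPv6.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(security_groups: Iterable[dict]) -> List[dict]:
    """List ingress rules that allow traffic from any address."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            for cidr in cidrs:
                if cidr in OPEN_CIDRS:
                    findings.append({
                        "group_id": group.get("GroupId"),
                        "protocol": perm.get("IpProtocol"),
                        "from_port": perm.get("FromPort"),
                        "to_port": perm.get("ToPort"),
                        "cidr": cidr,
                    })
    return findings


# Hypothetical sample data: SSH open to the world, HTTPS limited to a VPC range.
sample_groups = [
    {
        "GroupId": "sg-0123",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []},
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "10.0.0.0/16"}], "Ipv6Ranges": []},
        ],
    },
    {"GroupId": "sg-0456", "IpPermissions": []},
]

for finding in find_open_ingress(sample_groups):
    print(finding["group_id"], finding["from_port"], finding["cidr"])
```

A scheduled Lambda could run a check like this and push the count of findings to CloudWatch as a custom metric for the dashboard to chart. That wiring is an assumption about how the pieces might fit together.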