Enterprise-Grade Kubernetes CI/CD System

Overview

Built a production-ready Kubernetes CI/CD pipeline from scratch, following established practices for GitOps, validation, and security. The project shows the move from rapid prototyping to enterprise-grade infrastructure that is scalable, secure, and maintainable for multi-application deployments.

Project Philosophy

Unlike my previous two projects, which were rapid prototypes built around "build fast, break things, learn", this project follows a gradual, iterative approach that mirrors a professional working environment, with systematic improvements and best practices applied at each stage.

Key Highlights

  • 3-node bare-metal Kubernetes cluster (1 control plane, 2 workers)
  • GitOps workflow with ArgoCD for declarative continuous delivery
  • Multi-environment architecture (dev, staging, production)
  • Reusable GitHub Actions CI workflow templates
  • Comprehensive security policies and enforcement
  • Cloudflare Tunnel integration for secure public access
  • Complete infrastructure validation pipeline

Architecture Components

Infrastructure Layer

  • Kubernetes Cluster: 3-node bare-metal setup with Flannel CNI
  • Load Balancer: MetalLB for external IP allocation
  • Ingress Controller: Nginx for traffic routing
  • Container Runtime: Containerd with systemd cgroup driver
  • Cloudflared: Cloudflare Tunnel integration for secure public access
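
For the load balancer entry above, a MetalLB layer 2 setup is usually declared with two resources. The following is a minimal sketch rather than the cluster's actual configuration: the pool name `homelab-pool`, the address range, and the choice of L2 mode are all assumptions.

```yaml
# Hypothetical address pool; the range must be unused on the node LAN (assumed values).
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250
---
# Announce addresses from the pool on the local network (layer 2 mode).
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool
```

With something like this in place, the Nginx ingress controller's LoadBalancer Service would receive an external IP from the pool.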

CI/CD Layer

  • Source Control: GitHub with branch protection
  • CI Pipeline: GitHub Actions with reusable workflows
  • CD Pipeline: ArgoCD with automated sync policies
  • Container Registry: Docker Hub for image storage

Security Layer

  • Pod Security: Non-root containers, read-only filesystems
  • Network Policies: Restricted pod-to-pod communication
  • RBAC: Role-based access control for service accounts
  • Policy Enforcement: Conftest with custom OPA policies
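
The pod security settings listed above could look roughly like this inside a Deployment's pod template. This is a sketch under assumptions: the container name `app`, the UID `1001`, and the writable `/tmp` volume are illustrative and not taken from the project's manifests.

```yaml
# Fragment of spec.template.spec in a Deployment (illustrative values).
securityContext:
  runAsNonRoot: true              # refuse to start containers that run as root
  runAsUser: 1001                 # assumed non-root UID
  seccompProfile:
    type: RuntimeDefault
containers:
  - name: app                     # hypothetical container name
    image: docker.io/nishans0/next-portfolio
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    volumeMounts:
      - name: tmp                 # writable scratch space, since the root FS is read-only
        mountPath: /tmp
volumes:
  - name: tmp
    emptyDir: {}
```

A read-only root filesystem usually needs an `emptyDir` mount for any path the application writes to at runtime, which is why the sketch includes one.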

Deployment Strategy

  • Configuration Management: Kustomize overlays for environment-specific configs
  • Environment Isolation: One namespace per environment (dev, stage, prod)
  • Public Access: Cloudflare Tunnel (cloudflared sidecar)
  • Zero Downtime: Rolling updates with pod disruption budgets
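
As an illustration of the zero-downtime item above, a pod disruption budget and a rolling update strategy might be combined as follows. The resource name, the `app: next-portfolio` label, and the replica and surge numbers are assumptions chosen for the sketch.

```yaml
# Keep at least one pod running during voluntary disruptions such as node drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: next-portfolio            # hypothetical name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: next-portfolio         # assumed pod label
---
# Deployment fragment (only the fields relevant to rolling updates are shown).
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0           # never remove an old pod before a new one is ready
      maxSurge: 1                 # start one extra pod at a time
```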

Complete CI/CD Flow

[Figure: End-to-end CI/CD flow from code commit to production deployment]

CI Workflow Details

[Figure: GitHub Actions CI workflow with validation, testing, and deployment stages]

Intelligent CI Pipeline Orchestration

The CI workflow detects which files changed and runs only the jobs that apply to them. This app-specific workflow (dev-ci.yml) orchestrates calls to reusable workflows, which keeps resource usage low and feedback fast.

Complete Workflow Code (dev-ci.yml)

name: Dev CI (App + Manifests + GitOps Bump)

on:
  push:
    branches: [ "dev", "stage", "prod" ]
    paths:
      - "src/**"
      - "public/**"
      - "package.json"
      - "Dockerfile"
      - "manifests/**"
      - "policy/**"
      
  pull_request:
    branches: [ "stage", "prod" ]
    paths:
      - "manifests/**"
      - "policy/**"

permissions:
  contents: write
  packages: write
  security-events: write

jobs:
  # =============================================
  #  JOB 0: DETECT CHANGED FILES
  # =============================================
  changes:
    runs-on: ubuntu-latest
    outputs:
      app: ${{ steps.filter.outputs.app }}
      manifests: ${{ steps.filter.outputs.manifests }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
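            # Note: 'public/**' triggers this workflow (see on.push.paths) but is not
            # listed here, so asset-only changes do not start an app build.
            app:
              - 'src/**'
              - 'Dockerfile'
              - 'package.json'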
            manifests:
              - 'manifests/**'
              - 'policy/**'

  # =============================================
  #  JOB 1: RESOLVE TARGET BRANCH
  # =============================================
  resolve-branch:
    runs-on: ubuntu-latest
    outputs:
      target_branch: ${{ steps.resolve.outputs.target_branch }}
    steps:
      - id: resolve
        run: |
          TARGET_BRANCH="${{ github.event.pull_request.base.ref || github.ref_name }}"
          echo "target_branch=$TARGET_BRANCH" >> "$GITHUB_OUTPUT"

  # =============================================
  #  JOB 2: APP CI (only dev)
  # =============================================
  app-ci:
    needs: [changes, resolve-branch]
    if: ${{ needs.changes.outputs.app == 'true' && 
            needs.resolve-branch.outputs.target_branch == 'dev' }}
    uses: nishanau/ci-cd-templates/.github/workflows/ci-app.yml@main
    with:
      image_name: nishans0/next-portfolio
      context: .
      dockerfile: ./Dockerfile
      push_image: true
      run_tests: true
    secrets:
      DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }}
      DOCKERHUB_TOKEN: ${{ secrets.DOCKERHUB_TOKEN }}

  # =============================================
  #  JOB 3: MANIFESTS CI
  # =============================================
  manifests-ci:
    needs: [changes, resolve-branch]
    if: ${{ needs.changes.outputs.manifests == 'true' }}
    uses: nishanau/ci-cd-templates/.github/workflows/ci-manifests.yml@main
    with:
      overlay_path: manifests/overlays/${{ needs.resolve-branch.outputs.target_branch }}
      policies_path: policy
      kubeconform_flags: "--strict --ignore-missing-schemas"
    secrets: inherit

  # =============================================
  #  JOB 4: RESOLVE IMAGE TAG
  # =============================================
  resolve-tag:
    runs-on: ubuntu-latest
    needs: [app-ci, manifests-ci, resolve-branch]
    if: ${{ always() && (
      (needs.resolve-branch.outputs.target_branch == 'dev' && 
       needs.app-ci.result == 'success') ||
      (needs.resolve-branch.outputs.target_branch != 'dev' &&
       (needs.app-ci.result == 'success' || needs.app-ci.result == 'skipped') &&
       (needs.manifests-ci.result == 'success' || needs.manifests-ci.result == 'skipped'))
    ) }}
    outputs:
      tag: ${{ steps.resolve_tag.outputs.tag }}
    steps:
      - uses: actions/checkout@v4
      - id: resolve_tag
        run: |
          TARGET_BRANCH="${{ needs.resolve-branch.outputs.target_branch }}"
          if [[ "$TARGET_BRANCH" == "dev" ]]; then
            TAG="sha-${{ github.sha }}"
          elif [[ "$TARGET_BRANCH" == "stage" ]]; then
            TAG=$(yq e '.images[] | select(.name=="docker.io/nishans0/next-portfolio") | .newTag' \
              manifests/overlays/dev/kustomization.yaml)
          elif [[ "$TARGET_BRANCH" == "prod" ]]; then
            TAG=$(yq e '.images[] | select(.name=="docker.io/nishans0/next-portfolio") | .newTag' \
              manifests/overlays/stage/kustomization.yaml)
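          fi
          # Fail fast if no tag was resolved, rather than emitting an empty tag
          if [[ -z "${TAG:-}" ]]; then
            echo "::error::No image tag resolved for branch '$TARGET_BRANCH'"
            exit 1
          fi
          echo "tag=$TAG" >> "$GITHUB_OUTPUT"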

  # =============================================
  #  JOB 5: GITOPS BUMP
  # =============================================
  bump-gitops:
    needs: [app-ci, manifests-ci, resolve-tag, resolve-branch]
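    # Require a resolved tag and passing (or skipped) manifest validation on every
    # branch, so that a failed check can never lead to a tag bump.
    if: ${{ always() && github.event_name == 'push' &&
      needs.resolve-tag.result == 'success' &&
      (needs.manifests-ci.result == 'success' || needs.manifests-ci.result == 'skipped') && (
      (needs.resolve-branch.outputs.target_branch == 'dev' &&
       needs.app-ci.result == 'success') ||
      needs.resolve-branch.outputs.target_branch != 'dev'
    ) }}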
    uses: nishanau/ci-cd-templates/.github/workflows/ci-gitops-bump.yml@main
    with:
      gitops_repo: nishanau/NextJSPortfolioSite
      gitops_path: manifests/overlays/${{ needs.resolve-branch.outputs.target_branch }}/kustomization.yaml
      image_name: docker.io/nishans0/next-portfolio
      image_tag: ${{ needs.resolve-tag.outputs.tag }}
    secrets:
      gitops_pat: ${{ secrets.GITOPS_PAT }}

Workflow Jobs Breakdown

Job 0: Change Detection

The first job analyzes the changed files to determine which downstream jobs need to run:

  • App Changes: Detects modifications to src/**, Dockerfile, package.json
  • Manifest Changes: Detects changes to manifests/**, policy/**
  • Smart Execution: Only runs necessary workflows, skipping irrelevant checks

Job 1: Target Branch Resolution

Resolves the branch the run applies to: the base branch for pull requests, otherwise the pushed branch. Later jobs use it to select the overlay and the tag source.

Job 2: App CI (Build & Publish)

Triggered when app code changes on the dev branch; runs the full build pipeline:

  • Runs linting and unit tests for code quality
  • Builds multi-stage Docker image with optimizations
  • Performs security vulnerability scanning
  • Pushes image to Docker Hub with SHA-based tag (sha-{commit})
  • Environment: Runs only when the target branch is dev

Job 3: Manifests CI (Infrastructure Validation)

Triggered when manifest or policy files change:

  • YAML linting for syntax validation
  • Kubeconform schema validation against Kubernetes API
  • Conftest policy enforcement (custom OPA/Rego rules)
  • Kube-score best practices analysis
  • Checkov IaC security scanning with SARIF output
  • Environment: Runs on all branches (dev, stage, prod)

Jobs 4 and 5: Tag Resolution & GitOps Bump

Conditionally updates manifest image tags based on environment and results:

  • Dev: Uses current commit SHA after successful app build
  • Stage: Copies verified tag from dev overlay (promotion flow)
  • Prod: Copies validated tag from stage overlay (production promotion)
  • Commits tag change to trigger ArgoCD sync

Environment-Specific Behavior

Development Environment (dev branch)

  • Trigger: Push to dev branch
  • Build: Always rebuilds app on code changes
  • Tag Format: sha-{commit-hash}
  • Deployment: Immediate after successful CI
  • Purpose: Rapid iteration and testing

Staging Environment (stage branch)

  • Trigger: Merge from dev or direct push
  • Build: No rebuild - promotes dev image
  • Tag Source: Latest tag from dev overlay
  • Deployment: After manifest validation passes
  • Purpose: Pre-production testing with stable builds

Production Environment (prod branch)

  • Trigger: Merge from stage
  • Build: No rebuild - promotes stage image
  • Tag Source: Latest tag from stage overlay
  • Deployment: After all validations pass
  • Purpose: Production deployment with battle-tested images

Tag Bump Decision Matrix

The workflow implements intelligent logic to determine when tag updates should occur. This prevents unnecessary deployments and ensures environment integrity:

Development Branch Scenarios

  • App code only: ✅ Builds new image → ✅ Updates tag → ✅ Deploys
  • App + manifests: ✅ Builds + validates → ✅ Updates tag → ✅ Deploys
  • Manifests only: ⚠️ No build → ✅ Validates → ❌ No tag update
  • App build failed: ❌ Failed build → ❌ No tag update → ❌ No deploy
  • Manual rollback: Validates existing manifests → ArgoCD syncs to specified tag

Staging Branch Scenarios

  • Promotion from dev: ⚠️ No build → ✅ Copies dev tag → ✅ Deploys dev image
  • Manifest changes: ⚠️ No build → ✅ Validates → ✅ Uses dev tag → ✅ Deploys
  • Policy validation failed: ❌ Failed validation → ❌ No tag update → ❌ Blocks deploy
  • No changes: Uses last known good image from dev

Production Branch Scenarios

  • Promotion from stage: ⚠️ No build → ✅ Copies stage tag → ✅ Deploys stage image
  • Manifest validation: Must pass all checks before tag update
  • Emergency rollback: Manual tag edit → ArgoCD auto-syncs to previous version
  • Direct build (anti-pattern): Technically works but violates GitOps principles

Key Workflow Characteristics

  • Immutable Images: Once built in dev, same image promotes through environments
  • Environment Isolation: Each overlay maintains its own tag independently
  • Fail-Safe Design: Any validation failure blocks deployment
  • Audit Trail: Git history tracks all tag changes and deployments
  • Rollback Support: Manual tag edits enable instant rollbacks via ArgoCD
  • Zero Downtime: Rolling updates ensure continuous availability

GitOps Tag Propagation Flow

# Development → Staging → Production
1. Dev: Build new image
   - Tag: sha-abc123
   - Push to Docker Hub
   - Update: manifests/overlays/dev/kustomization.yaml

2. Stage: Promote verified build
   - Read tag from: manifests/overlays/dev/kustomization.yaml
   - Copy tag: sha-abc123
   - Update: manifests/overlays/stage/kustomization.yaml

3. Prod: Deploy battle-tested image
   - Read tag from: manifests/overlays/stage/kustomization.yaml
   - Copy tag: sha-abc123
   - Update: manifests/overlays/prod/kustomization.yaml
   
✨ Same image (sha-abc123) deployed across all environments
✨ Tested in dev, validated in stage, confident in prod

Implementation Details

Cluster Setup

Built from scratch following Kubernetes best practices:

  • Disabled swap and configured kernel parameters
  • Installed containerd as container runtime
  • Configured systemd cgroup driver for compatibility
  • Initialized control plane with custom pod network CIDR
  • Deployed Flannel CNI for pod networking
  • Joined worker nodes using secure tokens
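
The control plane settings described above can be captured in a kubeadm configuration file instead of command-line flags. The sketch below assumes Flannel's default pod CIDR (`10.244.0.0/16`) and the `v1beta3` kubeadm API; the actual CIDR and API version depend on the cluster and kubeadm release, and the file name is hypothetical.

```yaml
# kubeadm-config.yaml (hypothetical file name)
# Assumed usage: kubeadm init --config kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.244.0.0/16        # assumed: Flannel's default pod network CIDR
---
# Make the kubelet use the same cgroup driver as containerd.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
```

Keeping the kubelet and containerd on the same cgroup driver avoids the two components managing resources through different hierarchies.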

GitOps with ArgoCD

Declarative continuous delivery implementation:

  • Automated sync policies for hands-off deployments
  • Multi-environment management (dev/stage/prod namespaces)
  • Self-healing capabilities for drift detection
  • Rollback support for failed deployments
  • Health status monitoring and notifications
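
An ArgoCD Application with automated sync and self-healing, as described above, might be declared like this for the dev environment. This is a sketch: the Application name, the `argocd` namespace, the `default` project, the repository URL form, and the choice of `dev` as target revision are assumptions derived from the repository and overlay paths used in the workflow.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: next-portfolio-dev        # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/nishanau/NextJSPortfolioSite.git   # assumed URL form
    targetRevision: dev           # assumed: track the dev branch
    path: manifests/overlays/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: dev
  syncPolicy:
    automated:
      prune: true                 # delete resources that were removed from Git
      selfHeal: true              # revert manual changes made in the cluster
    syncOptions:
      - CreateNamespace=true
```

Stage and prod would follow the same shape with their own overlay path and namespace.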

Multi-Environment Architecture

Kustomize-based configuration management:

manifests/
├── base/              # Base resources
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── pdb.yaml
│   └── sa.yaml
└── overlays/
    ├── dev/          # Development environment
    ├── stage/        # Staging environment
    └── prod/         # Production environment
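
Each overlay carries its own `kustomization.yaml`, and the `images` entry in it is what the workflow's `yq` queries read and what the GitOps bump rewrites. A minimal sketch of the dev overlay, assuming it sets the `dev` namespace and pulls in the base directly:

```yaml
# manifests/overlays/dev/kustomization.yaml (sketch; namespace and resources are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: dev
resources:
  - ../../base
images:
  - name: docker.io/nishans0/next-portfolio
    newTag: sha-abc123            # example tag in the sha-{commit} format
```

Because the tag lives in the overlay rather than in the base, each environment can pin a different image version.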

Reusable CI Workflows

Modular GitHub Actions templates for consistency:

  • Validation: YAML linting, schema validation
  • Testing: Policy checks with Conftest
  • Building: Docker multi-stage builds
  • Publishing: Tagged images to Docker Hub
  • Deployment: GitOps tag bump commits that trigger ArgoCD sync
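
To show how such a template can be structured, here is a sketch of what a reusable tag bump workflow could look like. The input and secret names match the ones passed by dev-ci.yml; everything else (job name, steps, commit message, the `ref` choice) is an assumption and not the contents of the actual `ci-gitops-bump.yml`.

```yaml
# Sketch of a reusable workflow; not the project's actual template.
name: GitOps Bump

on:
  workflow_call:
    inputs:
      gitops_repo:
        type: string
        required: true
      gitops_path:
        type: string
        required: true
      image_name:
        type: string
        required: true
      image_tag:
        type: string
        required: true
    secrets:
      gitops_pat:
        required: true

jobs:
  bump:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          repository: ${{ inputs.gitops_repo }}
          ref: ${{ github.ref_name }}     # assumption: bump the branch that triggered the caller
          token: ${{ secrets.gitops_pat }}
      - name: Update image tag and push
        env:
          IMAGE_NAME: ${{ inputs.image_name }}
          IMAGE_TAG: ${{ inputs.image_tag }}
          GITOPS_PATH: ${{ inputs.gitops_path }}
        run: |
          # Rewrite newTag for the matching image entry in place
          yq -i '(.images[] | select(.name == strenv(IMAGE_NAME))).newTag = strenv(IMAGE_TAG)' "$GITOPS_PATH"
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          # Nothing to commit if the overlay already points at this tag
          if git diff --quiet; then
            echo "Tag already up to date"
            exit 0
          fi
          git commit -am "chore: bump ${IMAGE_NAME} to ${IMAGE_TAG}"
          git push
```

Passing inputs through `env` rather than interpolating them into the script keeps untrusted values out of the shell source.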

Security Implementation

Comprehensive security measures at every layer:

  • Pod security contexts (non-root, read-only FS)
  • Network policies for traffic control
  • RBAC with least-privilege service accounts
  • Automated policy enforcement with Conftest
  • Image scanning in CI pipeline
  • Secret management best practices
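
The network policy item above is commonly implemented as a default-deny rule plus narrow allow rules. The sketch below assumes the app pods carry an `app: next-portfolio` label, listen on port 3000, and are reached only through an ingress controller in the `ingress-nginx` namespace. Note that NetworkPolicy objects take effect only when the cluster's network plugin enforces them; Flannel on its own does not, so enforcement is assumed to come from an additional component.

```yaml
# Deny all ingress to pods in the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: dev
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow only the ingress controller namespace to reach the app pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-nginx
  namespace: dev
spec:
  podSelector:
    matchLabels:
      app: next-portfolio         # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
      ports:
        - protocol: TCP
          port: 3000              # assumed app port
```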

Validation Pipeline

Pre-deployment checks ensure quality:

# Run from the repository root

# YAML Lint
yamllint manifests/

# Schema Validation
kustomize build manifests/overlays/dev | kubeconform --strict

# Policy Testing
kustomize build manifests/overlays/dev | conftest test -

# Best Practices Check
kustomize build manifests/overlays/dev | kube-score score -

Cloudflare Tunnel Integration

Secure public access without port forwarding:

  • Cloudflared deployed as sidecar container
  • Automatic DNS management
  • Zero-trust security model
  • No firewall rule changes needed
  • DDoS protection included
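
A cloudflared sidecar of the kind described above could be added to the app's pod template roughly as follows. This is a sketch assuming a token-based tunnel; the Secret name `cloudflared-token` and its `token` key are hypothetical.

```yaml
# Extra container in the app pod template (fragment; names are assumptions).
containers:
  - name: cloudflared
    image: cloudflare/cloudflared:latest   # pin a specific version in practice
    args: ["tunnel", "--no-autoupdate", "run"]
    env:
      - name: TUNNEL_TOKEN                 # cloudflared reads the tunnel token from this variable
        valueFrom:
          secretKeyRef:
            name: cloudflared-token        # hypothetical Secret
            key: token
```

The tunnel makes only outbound connections to Cloudflare, which is why no inbound firewall rules or port forwarding are required.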

Key Learnings & Evolution

From Rapid Prototyping to Production

  • Before: Quick deployments, minimal validation
  • Now: Comprehensive testing, policy enforcement
  • Result: Confidence in production deployments

Infrastructure as Code

  • All infrastructure declaratively defined
  • Version controlled and reviewable
  • Reproducible across environments
  • Auditable change history

GitOps Principles

  • Git as single source of truth
  • Declarative infrastructure and applications
  • Automated synchronization
  • Observable and auditable

Technical Skills Demonstrated

  • Kubernetes cluster administration
  • Container orchestration and networking
  • GitOps and declarative deployment
  • CI/CD pipeline design and implementation
  • Security policy creation and enforcement
  • Infrastructure validation and testing
  • Multi-environment configuration management
  • Monitoring and observability setup

Future Enhancements

  • Prometheus & Grafana for monitoring
  • EFK Stack for centralized logging
  • Helm charts for package management
  • Terraform for infrastructure automation
  • Multi-cluster federation
  • Service mesh (Istio/Linkerd) integration