"Your pipeline is green. Your production is broken. Congratulations - you have automated deployments. That's not CI/CD."
The Scene Every DevOps Engineer Recognises
It's 11:47 PM on a Thursday.
The pipeline is green. All checks passed. The Slack notification fires: "Deploy to production: SUCCESS ✅"
Fifteen minutes later, your on‑call phone rings.
Production is broken. A downstream service is returning 500s. The feature flag you deployed fires in an environment it was never tested in. Your "automatic rollback" script hasn't been touched in four months and nobody is sure it still works.
You spend the next three hours debugging manually, coordinating across three teams on a Zoom call, and eventually rolling back by hand at 3 AM.
But in your CI/CD dashboard? Everything was green.
I've reviewed dozens of pipelines across engineering teams at scale. The pattern is almost universal. Most teams have automated deployments. Almost none have true CI/CD.
The difference is not a tool. It's not a YAML file. It's not whether you use Jenkins, CircleCI, GitHub Actions, or anything else. It's a fundamental misunderstanding of what CI/CD is supposed to do.
Part 1: What CI/CD Actually Is (And What It Isn't)
Before we talk about what's broken, we need a shared definition. Because "CI/CD" has been stretched so far by marketing that it has almost lost meaning.
The Textbook Definition (That Everyone Ignores)
Continuous Integration (CI) is the practice of merging code changes frequently - multiple times per day - into a shared mainline, with each merge automatically verified by a build and test suite. The key word is verified. Not just built. Verified against breakage.
Continuous Delivery (CD) is the practice of ensuring software can be released to production at any time. Every commit that passes CI should be deployable - not just buildable.
Continuous Deployment (the third "CD" most teams skip) goes further: every commit that passes all automated checks is automatically deployed to production, no human gate.
Most teams think they have CI/CD. What they actually have:
Pipeline reality check:

```
What they think they have:
  Commit → Build → Test → Deploy (automated) → Production ✅

What they actually have:
  Commit → Build (partial) → Test (some) → Manual approval → Deploy → Production 🤞
```
That second flow is automated release management. It is not CI/CD.
| Attribute | Automated Deployments | True CI/CD |
|---|---|---|
| Core Purpose | Move code to servers | Create a feedback loop |
| Test Confidence | Tests exist | Tests verify real behaviour |
| Deployment Frequency | Weekly / monthly | Daily / on‑demand |
| Rollback | Manual | Automatic, tested regularly |
| Staging Fidelity | Approximates production | Mirrors production exactly |
| Feedback Loop | Deployment outcome only | Metrics feed back into pipeline |
| Change Failure Rate | 15–45% | 0–15% (DORA Elite) |
| MTTR | Days | Under 1 hour |
What DORA 2025 Actually Says
The DORA (DevOps Research and Assessment) program has been running since 2014. Their four core metrics - lead time for changes, deployment frequency, change failure rate, and time to restore service - measure how efficiently teams deliver software.
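The four metrics are simple enough to compute yourself from deployment records. As a rough sketch (the record format and the sample data here are invented for illustration, not from any DORA tooling):

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: (commit_time, deploy_time, failed, restore_time)
deploys = [
    (datetime(2025, 6, 2, 9, 0),  datetime(2025, 6, 2, 10, 30), False, None),
    (datetime(2025, 6, 2, 11, 0), datetime(2025, 6, 2, 14, 0),  True,
     datetime(2025, 6, 2, 14, 45)),
    (datetime(2025, 6, 3, 8, 0),  datetime(2025, 6, 3, 9, 0),   False, None),
]

# Lead time for changes: commit -> running in production
lead_times = [(d - c) for c, d, _, _ in deploys]
avg_lead = sum(lead_times, timedelta()) / len(lead_times)

# Deployment frequency: deploys per distinct day in the window
days = {d.date() for _, d, _, _ in deploys}
freq_per_day = len(deploys) / len(days)

# Change failure rate: fraction of deploys that needed remediation
cfr = sum(1 for _, _, failed, _ in deploys if failed) / len(deploys)

# MTTR: mean time from failed deploy to service restored
restores = [(r - d) for _, d, failed, r in deploys if failed and r]
mttr = sum(restores, timedelta()) / len(restores)

print(avg_lead, freq_per_day, cfr, mttr)
```

The point of computing these yourself: they come straight from timestamps you already have in git and your deploy logs, so there is nothing to game.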
The 2025 report introduced something important: the old Elite/High/Medium/Low classification was replaced with seven new team archetypes that assess delivery performance alongside cultural and human signals. Too many teams were gaming the old metrics without actually improving delivery outcomes.
[Chart: lead time, commit → prod, for Automated Deployments vs DORA Elite - most of it spent waiting, lost to pipeline inefficiency]
Part 2: The 5 Lies Your Pipeline Is Telling You Right Now
These are not hypothetical. These are patterns I've seen repeatedly - in Jenkins shops, in CircleCI setups, in GitHub Actions workflows, in GitLab pipelines. The tool is different every time. The lie is always the same.
Lie #1: "Our Tests Are Passing"
Here's what a "passing" test suite actually contains in most production codebases:
What 87% coverage actually tests (Python):

```python
# The tests that give you that comforting 87% coverage:

def test_user_creation():
    user = User(name="test", email="test@test.com")
    assert user.name == "test"
    # Tests the constructor. Not the behaviour.

def test_payment_amount():
    result = calculate_total(100, 0.2)
    assert result == 120
    # Tests math. Not the payment gateway integration.

def test_api_response():
    mock_response = {"status": "ok"}
    assert mock_response["status"] == "ok"
    # Tests a dict literal. Not a real API.

def test_database_save():
    db = MockDB()
    db.save({"id": 1})
    assert db.count() == 1
    # Tests the mock. Not the real database.

# These tests pass. They ALWAYS pass.
# They would pass even if your entire database layer was broken,
# your auth service was returning 403s, and your payment integration
# had a bug that only surfaces with real transaction IDs.
```
Jenkins: add Pact contract tests to your Jenkinsfile build stage.

```groovy
// Jenkinsfile
pipeline {
    agent { docker { image 'node:18-alpine' } }
    environment {
        PACT_BROKER_URL   = credentials('pact-broker-url')
        PACT_BROKER_TOKEN = credentials('pact-broker-token')
    }
    stages {
        stage('Install') {
            steps { sh 'npm ci' }
        }
        stage('Test') {
            parallel {
                stage('Unit Tests') {
                    steps { sh 'npm run test:unit -- --coverage' }
                }
                stage('Contract Tests') {
                    steps {
                        sh 'npm run test:contracts'
                        // Publish pact to broker - fails if contract is broken
                        sh """
                            npx pact-broker publish ./pacts \\
                              --broker-base-url ${PACT_BROKER_URL} \\
                              --broker-token ${PACT_BROKER_TOKEN} \\
                              --consumer-app-version ${GIT_COMMIT} \\
                              --tag ${BRANCH_NAME}
                        """
                    }
                }
                stage('Integration Tests') {
                    steps { sh 'npm run test:integration' }
                }
            }
        }
        stage('Can I Deploy?') {
            steps {
                // Hard gate: fails if this service breaks a downstream contract
                sh """
                    npx pact-broker can-i-deploy \\
                      --pacticipant my-service \\
                      --version ${GIT_COMMIT} \\
                      --to-environment production \\
                      --broker-base-url ${PACT_BROKER_URL} \\
                      --broker-token ${PACT_BROKER_TOKEN}
                """
            }
        }
    }
}
```
CircleCI: the same contract gate wired into a workflow.

```yaml
# .circleci/config.yml
version: 2.1

jobs:
  test-contracts:
    docker:
      - image: cimg/node:18.20
    steps:
      - checkout
      - restore_cache:
          keys: ['deps-v1-{{ checksum "package-lock.json" }}']
      - run: npm ci
      - save_cache:
          key: 'deps-v1-{{ checksum "package-lock.json" }}'
          paths: [node_modules]
      - run:
          name: Run Pact contract tests
          command: npm run test:contracts
      - run:
          name: Publish pacts to broker
          command: |
            npx pact-broker publish ./pacts \
              --broker-base-url $PACT_BROKER_URL \
              --broker-token $PACT_BROKER_TOKEN \
              --consumer-app-version $CIRCLE_SHA1 \
              --tag $CIRCLE_BRANCH
      - run:
          name: Can-I-Deploy gate (hard fail if contract broken)
          command: |
            npx pact-broker can-i-deploy \
              --pacticipant my-service \
              --version $CIRCLE_SHA1 \
              --to-environment production \
              --broker-base-url $PACT_BROKER_URL \
              --broker-token $PACT_BROKER_TOKEN
```
| Test Type | Focus Area | Typical Teams | Elite Teams |
|---|---|---|---|
| Unit Tests | Logic in isolation | High (80-90%) | 80%+ ✅ |
| Integration Tests | Service-to-service calls | Low (20-30%) | 60%+ |
| Contract Tests | API shape agreements | Near zero | 100% of API boundaries |
| End-to-End Tests | Full user journey | Minimal, often broken | Critical paths only |
| Performance Tests | Latency under load | Rarely in pipeline | Every deploy |
| Chaos / Failure Tests | Behaviour under degradation | Almost never | Weekly |
Lie #2: "We Deploy to Staging First"
Staging drift timeline:

```
Day 1:   Staging = Production mirror ✅
Day 30:  New DB instance class in prod (manual change, not in IaC) ⚠️
Day 60:  New queue added to prod. Staging doesn't have it. ⚠️⚠️
Day 90:  Production DB has 2TB. Staging has 1GB. ⚠️⚠️⚠️
Day 120: Hotfix applied to production. Never replicated to staging. ⚠️⚠️⚠️⚠️
Day 150: New env var in prod, missing in staging. ⚠️⚠️⚠️⚠️⚠️
Day 180: Staging is a completely different system wearing production's name. ❌
```
Staging drift is not a discipline problem. It is an architecture problem. The only solution is ephemeral environments provisioned from code - every pipeline run gets a fresh environment, tested against it, then torn down.
Jenkins: ephemeral staging via Terraform in a Jenkinsfile.

```groovy
// Jenkinsfile
stage('Ephemeral Staging') {
    steps {
        // Provision a fresh, IaC-defined environment per build
        sh """
            terraform init -backend-config="key=staging-${BUILD_NUMBER}.tfstate"
            terraform apply -auto-approve \\
              -var="env_id=build-${BUILD_NUMBER}" \\
              -var="instance_type=t3.medium" \\
              -var="db_class=db.r6g.large"
        """
        // Run full integration + E2E tests against the fresh environment
        sh "npm run test:integration -- --env=build-${BUILD_NUMBER}"
        sh "npm run test:e2e -- --base-url=https://build-${BUILD_NUMBER}.staging.internal"
    }
    post {
        always {
            // Tear down REGARDLESS of test result - no drift, no museum
            sh "terraform destroy -auto-approve -var='env_id=build-${BUILD_NUMBER}' || true"
        }
    }
}
```
CircleCI: the same pattern using Docker service containers for a lightweight ephemeral approach.

```yaml
# .circleci/config.yml
jobs:
  integration-tests:
    docker:
      - image: cimg/node:18.20
      - image: cimg/postgres:15.6        # Real DB, not a mock
        environment:
          POSTGRES_DB: test_db
          POSTGRES_PASSWORD: testpass
      - image: cimg/redis:7.2            # Real Redis, not a mock
      - image: localstack/localstack     # AWS services emulated locally
        environment:
          SERVICES: s3,sqs,sns
    environment:
      DATABASE_URL: "postgresql://postgres:testpass@localhost:5432/test_db"
      REDIS_URL: "redis://localhost:6379"
      AWS_ENDPOINT: "http://localhost:4566"
    steps:
      - checkout
      - run: npm ci
      - run:
          name: Wait for services to be ready
          command: |
            dockerize -wait tcp://localhost:5432 -timeout 60s
            dockerize -wait tcp://localhost:6379 -timeout 30s
            dockerize -wait tcp://localhost:4566 -timeout 30s
      - run:
          name: Run integration tests against real services
          command: npm run test:integration
      # CircleCI tears down all service containers after the job - zero drift
```
Lie #3: "We Have Automatic Rollbacks"
The "automatic rollback" in most teams:

```bash
#!/bin/bash
# rollback.sh - last modified 8 months ago
# NOTE: this assumes the previous artifact is still in S3
# TODO: add error handling (from 2 years ago, never done)

kubectl rollout undo deployment/my-service
echo "Rollback initiated (probably)"
```
"Rollback initiated (probably)" is not a rollback system. A real automatic rollback is:

- Triggered by metrics, not humans.
- Tested regularly - rollback drills every sprint.
- Fast - under 5 minutes from alarm to stable.
- Verified - automated checks confirm health after rollback.
Jenkins: a health-check validation hook with real rollback logic.

```groovy
// Jenkinsfile - production deploy with health validation + rollback
stage('Production Deploy') {
    steps {
        script {
            def deploySuccess = false
            try {
                // Deploy new version (canary - 10% traffic first)
                sh "./scripts/canary-deploy.sh --image ${IMAGE_TAG} --weight 10"
                // Wait and check real metrics
                sh "./scripts/health-check.sh"   // exits non-zero if unhealthy
                echo "✅ Canary healthy. Promoting to 100%."
                sh "./scripts/canary-deploy.sh --image ${IMAGE_TAG} --weight 100"
                deploySuccess = true
            } catch (err) {
                echo "❌ Health check failed: ${err.message}"
                echo "   Initiating automatic rollback..."
                sh "./scripts/rollback.sh --to-previous"
                error("Deployment rolled back due to health check failure.")
            }
        }
    }
}

// scripts/health-check.sh (simplified)
// #!/bin/bash
// set -e
// MAX_RETRIES=10; SLEEP=5
// for i in $(seq 1 $MAX_RETRIES); do
//   HTTP=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health)
//   if [ "$HTTP" == "200" ]; then break; fi
//   if [ $i -eq $MAX_RETRIES ]; then exit 1; fi
//   sleep $SLEEP
// done
// ERROR_RATE=$(prometheus-query 'rate(http_requests_total{status=~"5.."}[2m])')
// if [ "$(echo "$ERROR_RATE > 1.0" | bc -l)" -eq 1 ]; then exit 1; fi
// P99=$(prometheus-query 'histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[2m]))')
// if [ "$(echo "$P99 > 2.0" | bc -l)" -eq 1 ]; then exit 1; fi
```
CircleCI: a health validation job with automatic rollback on failure.

```yaml
jobs:
  validate-and-promote:
    docker:
      - image: cimg/base:current
    steps:
      - checkout
      - run:
          name: Deploy canary (10% traffic)
          command: ./scripts/canary-deploy.sh --weight 10 --image $CIRCLE_SHA1
      - run:
          name: Validate canary health (error rate + p99)
          command: |
            for i in {1..10}; do
              HTTP=$(curl -s -o /dev/null -w "%{http_code}" https://app.example.com/health)
              if [ "$HTTP" == "200" ]; then break; fi
              if [ $i -eq 10 ]; then echo "❌ Health check failed"; exit 1; fi
              sleep 10
            done
            ERROR_RATE=$(./scripts/get-metric.sh error_rate_pct)
            P99=$(./scripts/get-metric.sh p99_latency_seconds)
            if [ "$(echo "$ERROR_RATE > 1.0" | bc -l)" -eq 1 ]; then exit 1; fi
            if [ "$(echo "$P99 > 2.0" | bc -l)" -eq 1 ]; then exit 1; fi
            echo "✅ Canary healthy"
      - run:
          name: Promote to 100%
          command: ./scripts/canary-deploy.sh --weight 100 --image $CIRCLE_SHA1
      - run:
          name: Auto-rollback on failure
          when: on_fail
          command: |
            echo "❌ Validation failed. Rolling back..."
            ./scripts/rollback.sh --to-previous
            ./scripts/notify-slack.sh "🚨 Auto-rollback triggered on $CIRCLE_SHA1"
```
If the validate-and-promote job fails in CircleCI, or health-check.sh exits non-zero in Jenkins, the pipeline catches it and invokes rollback immediately. No phone call at 3 AM required.
Lie #4: "Our Pipeline Is Fast"
Ask your team: how long does your commit‑to‑production take? Most say "about 20 minutes." When you actually measure it, it's 47 minutes. And that's if nothing goes wrong.
Where the time actually goes:

```
Developer pushes commit
  ↓ [3 min]    Webhook fires, pipeline triggers
  ↓ [5 min]    Jenkins agent spins up (no pre-warmed agents)
  ↓ [10 min]   npm install (no caching)
  ↓ [8 min]    Unit tests run SEQUENTIALLY
  ↓ [4 min]    Docker build (no layer cache)
  ↓ [2 min]    Manual approval notification sent
  ↓ [240 min]  WAITING for someone to click "Approve"
  ↓ [10 min]   Integration tests (sequential)
  ↓ [1440 min] WAITING for next deploy window
  ↓ [8 min]    Deploy to production

Total:               ~1,730 minutes (~28 hours)
Actual compute time: ~50 minutes
Time waiting:        ~1,680 minutes (97% of total lead time)
```
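You don't need tooling to expose this split; if you log per-stage durations and whether each stage was compute or waiting, a few lines tally it. A minimal sketch (stage names and durations taken from the timeline above; the data structure is invented for illustration):

```python
# Each stage: (name, minutes, is_waiting)
stages = [
    ("webhook + trigger", 3, False),
    ("agent spin-up", 5, False),
    ("npm install", 10, False),
    ("unit tests (sequential)", 8, False),
    ("docker build", 4, False),
    ("approval notification", 2, False),
    ("waiting for approval click", 240, True),
    ("integration tests", 10, False),
    ("waiting for deploy window", 1440, True),
    ("deploy", 8, False),
]

total = sum(m for _, m, _ in stages)
waiting = sum(m for _, m, w in stages if w)
compute = total - waiting

print(f"total={total} min, compute={compute} min, "
      f"waiting={waiting} min ({100 * waiting / total:.0f}%)")
# → total=1730 min, compute=50 min, waiting=1680 min (97%)
```

Run this against your own pipeline's timestamps before optimising anything: if waiting dominates, faster test suites won't move the needle.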
[Chart: pipeline time spent waiting, typical enterprise vs target]
Jenkins: the fix is to parallelise your stages and add proper caching.

```groovy
// Jenkinsfile - parallel stages + Docker layer cache
pipeline {
    agent { docker { image 'node:18-alpine' } }
    options { timestamps() }
    stages {
        stage('Install') {
            steps {
                // Use Jenkins workspace caching for node_modules
                cache(maxCacheSize: 500, caches: [
                    arbitraryFileCache(path: 'node_modules',
                                       cacheValidityDecidingFile: 'package-lock.json')
                ]) {
                    sh 'npm ci --prefer-offline'
                }
            }
        }
        // All suites run IN PARALLEL - not sequentially
        stage('Verify') {
            parallel {
                stage('Unit Tests') {
                    steps { sh 'npm run test:unit -- --coverage' }
                    post { always { junit 'test-results/unit/*.xml' } }
                }
                stage('Integration Tests') {
                    steps { sh 'npm run test:integration' }
                }
                stage('Contract Tests') {
                    steps { sh 'npm run test:contracts' }
                }
                stage('Docker Build') {
                    steps {
                        sh """
                            docker build \\
                              --cache-from my-registry/my-service:latest \\
                              --build-arg BUILDKIT_INLINE_CACHE=1 \\
                              -t my-registry/my-service:${GIT_COMMIT} \\
                              -t my-registry/my-service:latest .
                        """
                    }
                }
            }
        }
        stage('Push') {
            steps {
                sh "docker push my-registry/my-service:${GIT_COMMIT}"
                sh "docker push my-registry/my-service:latest"
            }
        }
    }
}
```
CircleCI: fan-out parallel jobs with dependency caching and Docker layer cache.

```yaml
version: 2.1
orbs:
  docker: circleci/docker@2.6

jobs:
  test-unit:
    docker: [{ image: cimg/node:18.20 }]
    steps:
      - checkout
      - restore_cache: { keys: ['deps-{{ checksum "package-lock.json" }}'] }
      - run: npm ci
      - save_cache: { key: 'deps-{{ checksum "package-lock.json" }}', paths: [node_modules] }
      - run: npm run test:unit -- --coverage
      - store_test_results: { path: test-results }

  test-integration:
    docker:
      - image: cimg/node:18.20
      - image: cimg/postgres:15.6
      - image: cimg/redis:7.2
    steps:
      - checkout
      - restore_cache: { keys: ['deps-{{ checksum "package-lock.json" }}'] }
      - run: npm ci
      - run: npm run test:integration

  test-contracts:
    docker: [{ image: cimg/node:18.20 }]
    steps:
      - checkout
      - restore_cache: { keys: ['deps-{{ checksum "package-lock.json" }}'] }
      - run: npm ci
      - run: npm run test:contracts

  build-image:
    machine: { image: ubuntu-2204:current }
    steps:
      - checkout
      # CircleCI Docker layer caching (DLC) - huge speedup
      - docker/build:
          image: my-registry/my-service
          tag: $CIRCLE_SHA1
          cache_from: my-registry/my-service:latest
          extra_build_args: --build-arg BUILDKIT_INLINE_CACHE=1
      - run: docker push my-registry/my-service:$CIRCLE_SHA1

# ALL four jobs run simultaneously - fan-out pattern
workflows:
  build-and-test:
    jobs:
      - test-unit
      - test-integration
      - test-contracts
      - build-image
```
| DORA Category | Lead Time | Deploy Frequency | Change Failure Rate | MTTR |
|---|---|---|---|---|
| Elite | <1 hour | On‑demand (multiple/day) | 0–15% | Under 1 hour |
| High | 1 day to 1 week | 1/day to 1/week | 16–30% | Less than 1 day |
| Medium | 1 week to 1 month | 1/week to 1/month | 16–30% | 1 day to 1 week |
| Low | 1 to 6 months | Less than 1/month | 16–45% | More than 6 months |
Lie #5: "We Have Approval Gates"
Manual approval steps are the most insidious lie in CI/CD. They feel like safety. They look like process. In reality, they are the opposite of CI/CD. A manual approval step is an admission that you don't trust your automated tests.
[Chart: approval wait time wasted per week across 12 services × 8 deploys]
Jenkins: replace manual input with an automated quality gate stage.

```groovy
// ❌ WHAT MOST TEAMS HAVE:
stage('Approve') {
    steps {
        input message: 'Deploy to production?', ok: 'Yes, deploy'
        // Average 2.3 hours waiting for someone to click this
    }
}

// ✅ WHAT YOU SHOULD HAVE INSTEAD:
stage('Quality Gate') {
    steps {
        script {
            // Gate 1: Test coverage threshold
            def coverage = sh(
                script: "cat coverage/coverage-summary.json | jq '.total.lines.pct'",
                returnStdout: true
            ).trim().toFloat()
            if (coverage < 80) {
                error("❌ Coverage ${coverage}% is below 80% threshold")
            }
            echo "✅ Coverage: ${coverage}%"

            // Gate 2: No high/critical vulnerabilities
            def vulnCount = sh(
                script: "trivy image --severity HIGH,CRITICAL --format json my-registry/my-service:${GIT_COMMIT} | jq '[.Results[].Vulnerabilities[]?] | length'",
                returnStdout: true
            ).trim().toInteger()
            if (vulnCount > 0) {
                error("❌ ${vulnCount} HIGH/CRITICAL vulnerabilities found")
            }
            echo "✅ Security scan: clean"

            // Gate 3: Performance baseline comparison
            def p99 = sh(
                script: "./scripts/get-staging-p99.sh",
                returnStdout: true
            ).trim().toFloat()
            if (p99 > 2.0) {
                error("❌ P99 latency ${p99}s exceeds 2s baseline")
            }
            echo "✅ P99: ${p99}s - within baseline"
        }
    }
}
```
CircleCI: the same gates as a dedicated quality-gate job in the workflow.

```yaml
jobs:
  quality-gate:
    docker: [{ image: cimg/node:18.20 }]
    steps:
      - checkout
      - attach_workspace: { at: /tmp/artifacts }
      - run:
          name: Gate 1 - Coverage threshold (min 80%)
          command: |
            COVERAGE=$(cat coverage/coverage-summary.json | jq '.total.lines.pct')
            echo "Coverage: $COVERAGE%"
            if [ "$(echo "$COVERAGE < 80" | bc -l)" -eq 1 ]; then
              echo "❌ Coverage below 80%"; exit 1
            fi
            echo "✅ Coverage gate passed"
      - run:
          name: Gate 2 - Security scan (no HIGH/CRITICAL)
          command: |
            docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
              aquasec/trivy:latest image \
              --exit-code 1 --severity HIGH,CRITICAL \
              my-registry/my-service:$CIRCLE_SHA1
            echo "✅ Security gate passed"
      - run:
          name: Gate 3 - Performance baseline
          command: |
            P99=$(./scripts/get-staging-p99.sh)
            if [ "$(echo "$P99 > 2.0" | bc -l)" -eq 1 ]; then
              echo "❌ P99 ${P99}s exceeds 2s"; exit 1
            fi
            echo "✅ Performance gate passed - P99: ${P99}s"

workflows:
  build-test-deploy:
    jobs:
      - test-unit
      - test-integration
      - test-contracts
      - quality-gate:
          requires: [test-unit, test-integration, test-contracts]
      - deploy-production:
          requires: [quality-gate]   # Only deploy if ALL gates pass
          filters: { branches: { only: main } }
```
Part 3: The Root Cause - The Tool Trap
All five lies share a common root. It's not laziness. It's not lack of budget. It's a conceptual error the industry has been making for 20 years.
Alternative: GitHub Actions (for GitHub‑hosted teams)
GitHub Actions: the same architecture implemented as a workflow, shown here as an alternative for teams on GitHub rather than self-hosted Jenkins.

```yaml
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push: { branches: [main] }
  pull_request: { branches: [main] }

permissions:
  id-token: write   # OIDC - no stored cloud credentials
  contents: read

jobs:
  # [2] Fan-out test matrix - equivalent to Jenkins parallel{} or CircleCI fan-out
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        suite: [unit, integration, contracts]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '18', cache: 'npm' }
      - run: npm ci
      - run: npm run test:${{ matrix.suite }}

  build-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          docker build --cache-from my-registry/my-service:latest -t my-registry/my-service:${{ github.sha }} .
          docker push my-registry/my-service:${{ github.sha }}

  # [3] Quality gate - equivalent to the Jenkins quality gate stage
  quality-gate:
    needs: [test, build-image]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: |
          COV=$(cat coverage/coverage-summary.json | jq '.total.lines.pct')
          if [ "$(echo "$COV < 80" | bc -l)" -eq 1 ]; then exit 1; fi
      - run: |
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy:latest image --exit-code 1 --severity HIGH,CRITICAL my-registry/my-service:${{ github.sha }}

  # [5] Production deploy
  deploy:
    needs: [quality-gate]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          ./scripts/canary-deploy.sh --image ${{ github.sha }} --weight 10
          ./scripts/health-check.sh
          ./scripts/canary-deploy.sh --image ${{ github.sha }} --weight 100
      - if: failure()
        run: ./scripts/rollback.sh --to-previous
```
Part 4: The Complete CI/CD Tools Landscape - 2026
This is the honest, unbiased map. I'll call out where each tool genuinely wins rather than marketing at you.
4.1 CI Tools - Build & Test
| CI Platform | Config | Hosting | Parallelism | Caching | Maintenance | Cost Model | Best For |
|---|---|---|---|---|---|---|---|
| Jenkins ★ | Groovy | Self-hosted only | parallel{} block ★★ | Manual setup | High (JVM, plugins) | Infra cost + engineer time | Custom workflows, air‑gapped |
| CircleCI ★ | YAML | SaaS + self‑hosted | Fan‑out jobs ★★ | Docker layer cache ★★ | Zero (SaaS) | Per‑minute (credits) | Fast iteration, Docker‑first |
| GitHub Actions | YAML | SaaS + self‑hosted | Matrix strategy ★ | actions/cache | Zero (SaaS) | Per‑minute ($0.008/min) | GitHub‑native teams |
| GitLab CI | YAML | SaaS + self‑hosted | parallel: keyword | Cache config | Medium (self‑managed) | Per‑user + minutes | DevSecOps‑focused teams |
| Buildkite | YAML | Hybrid | Parallel steps | Agent caching | Medium (agents) | Per‑user + agents | Large eng orgs, hybrid |
| AWS CodeBuild | YAML (buildspec) | AWS managed | Batch builds | S3 cache | Zero (managed) | Per‑second ($0.005/min) | AWS‑native shops |
| Tekton | YAML (CRDs) | Self‑hosted (K8s) | Pipeline runs | Workspace volumes | High (K8s expertise) | Infra only | K8s platform teams |
4.2 CD / Deployment Tools
| CD Tool | Model | Key Strengths | Limitations | Best For |
|---|---|---|---|---|
| Jenkins Deploy Jobs | Push‑based CD | Already in your stack, full scripting power | Not declarative, hard to audit | Teams already on Jenkins |
| CircleCI Deploy Jobs | Push‑based CD | Fan‑out deploy, environment orbs | No GitOps, SaaS dependency | Teams already on CircleCI |
| ArgoCD | GitOps (K8s) | Declarative, excellent UI, sync status ★ | K8s only, complex RBAC | EKS / K8s teams |
| Flux CD | GitOps (K8s) | CNCF graduated, lightweight | No UI (by design), K8s only | Minimalist K8s teams |
| Spinnaker | Multi‑cloud CD | Advanced canary, Netflix‑proven | Massive complexity | Large multi‑cloud orgs |
| AWS CodeDeploy | Push‑based (AWS) | Native rollback, canary, blue/green | AWS‑only | AWS EC2/ECS/Lambda |
| Octopus Deploy | Release mgmt | Strong .NET, runbooks | Niche, license cost | .NET / Windows shops |
4.3 IaC for Pipeline Infrastructure
| IaC Platform | Language | Multi-Cloud | Key Strengths | Best For |
|---|---|---|---|---|
| Terraform / OpenTofu | HCL | Yes ★★ | Largest provider ecosystem, state mgmt, drift detection | Multi‑cloud / any team |
| Ansible | YAML + Python | Yes ★ | Agentless, great for config mgmt + deploy scripts | VM‑heavy, hybrid cloud |
| AWS CDK | TypeScript / Python | AWS only | Type‑safe, L2 constructs, IDE autocomplete | AWS‑native teams |
| Pulumi | TS / Python / Go | Yes ★ | Real programming languages, multi‑cloud | Teams preferring code over DSL |
| Crossplane | YAML (CRDs) | Yes ★ | K8s‑native IaC, self‑healing infra | K8s platform teams |
Part 5: Jenkins Deep Dive - The Full Real Pipeline
Jenkins still powers an estimated 44% of CI/CD pipelines worldwide. Let's build the real 5‑stage pipeline in Jenkins - not the 3‑stage build‑test‑deploy you probably have now.
The complete 5-stage Jenkinsfile:

```groovy
// Jenkinsfile - Real 5-Stage CI/CD Pipeline
// Matches the architecture: Source → Build+Test → Quality Gate → Staging → Production

pipeline {
    agent {
        docker {
            image 'node:18-alpine'
            args '-v /var/run/docker.sock:/var/run/docker.sock'
        }
    }

    environment {
        IMAGE_NAME        = 'my-registry/my-service'
        PACT_BROKER_URL   = credentials('pact-broker-url')
        PACT_BROKER_TOKEN = credentials('pact-broker-token')
        SLACK_WEBHOOK     = credentials('slack-webhook')
        REGISTRY_CREDS    = credentials('registry-creds')
    }

    options {
        timeout(time: 30, unit: 'MINUTES')   // Kill stuck pipelines
        timestamps()
        disableConcurrentBuilds()            // No double-deploys
        buildDiscarder(logRotator(numToKeepStr: '20'))
    }

    // ─────────────────────────────────────────────
    // [1] SOURCE - Jenkins SCM checkout (automatic)
    // ─────────────────────────────────────────────
    stages {

        // ─────────────────────────────────────────────
        // [2] BUILD + TEST - all parallel
        // ─────────────────────────────────────────────
        stage('Build + Test') {
            parallel {
                stage('Unit Tests') {
                    steps {
                        cache(maxCacheSize: 500, caches: [
                            arbitraryFileCache(
                                path: 'node_modules',
                                cacheValidityDecidingFile: 'package-lock.json'
                            )
                        ]) {
                            sh 'npm ci --prefer-offline'
                        }
                        sh 'npm run test:unit -- --coverage --ci'
                    }
                    post {
                        always {
                            junit 'test-results/unit/*.xml'
                            publishHTML([
                                reportDir: 'coverage/lcov-report',
                                reportFiles: 'index.html',
                                reportName: 'Coverage Report'
                            ])
                        }
                    }
                }
                stage('Integration Tests') {
                    agent {
                        docker {
                            image 'node:18-alpine'
                            // Sidecar services for integration tests
                            args '--link postgres:postgres --link redis:redis'
                        }
                    }
                    steps {
                        sh 'npm ci'
                        sh 'npm run test:integration'
                    }
                }
                stage('Contract Tests') {
                    steps {
                        sh 'npm ci'
                        sh 'npm run test:contracts'
                        sh """
                            npx pact-broker publish ./pacts \\
                              --broker-base-url ${PACT_BROKER_URL} \\
                              --broker-token ${PACT_BROKER_TOKEN} \\
                              --consumer-app-version ${GIT_COMMIT} \\
                              --tag ${BRANCH_NAME}
                        """
                    }
                }
                stage('Docker Build') {
                    steps {
                        sh """
                            echo ${REGISTRY_CREDS_PSW} | \\
                              docker login -u ${REGISTRY_CREDS_USR} --password-stdin my-registry
                            docker build \\
                              --cache-from ${IMAGE_NAME}:latest \\
                              --build-arg BUILDKIT_INLINE_CACHE=1 \\
                              -t ${IMAGE_NAME}:${GIT_COMMIT} \\
                              -t ${IMAGE_NAME}:latest .
                        """
                    }
                }
            }
        }

        // ─────────────────────────────────────────────
        // [3] QUALITY GATE - automated, no manual input
        // ─────────────────────────────────────────────
        stage('Quality Gate') {
            steps {
                script {
                    // Gate 1: Coverage
                    def coverage = sh(
                        script: "cat coverage/coverage-summary.json | jq '.total.lines.pct'",
                        returnStdout: true
                    ).trim().toFloat()
                    if (coverage < 80) { error("Coverage ${coverage}% < 80%") }
                    echo "✅ Coverage: ${coverage}%"

                    // Gate 2: Security - no HIGH/CRITICAL vulns
                    def vulns = sh(
                        script: """
                            trivy image --severity HIGH,CRITICAL --format json \\
                              ${IMAGE_NAME}:${GIT_COMMIT} | \\
                              jq '[.Results[].Vulnerabilities[]?] | length'
                        """,
                        returnStdout: true
                    ).trim().toInteger()
                    if (vulns > 0) { error("${vulns} HIGH/CRITICAL vulnerabilities found") }
                    echo "✅ Security: clean"

                    // Gate 3: Can-I-Deploy pact verification
                    sh """
                        npx pact-broker can-i-deploy \\
                          --pacticipant my-service \\
                          --version ${GIT_COMMIT} \\
                          --to-environment production \\
                          --broker-base-url ${PACT_BROKER_URL} \\
                          --broker-token ${PACT_BROKER_TOKEN}
                    """
                    echo "✅ Contract verification: safe to deploy"
                }
            }
        }

        // ─────────────────────────────────────────────
        // [4] EPHEMERAL STAGING - IaC-provisioned
        // ─────────────────────────────────────────────
        stage('Ephemeral Staging') {
            when { branch 'main' }
            steps {
                sh """
                    terraform init -backend-config="key=staging-${BUILD_NUMBER}.tfstate"
                    terraform apply -auto-approve \\
                      -var="env_id=build-${BUILD_NUMBER}" \\
                      -var="app_image=${IMAGE_NAME}:${GIT_COMMIT}"
                """
                sh "npm run test:e2e -- --base-url=https://build-${BUILD_NUMBER}.staging.internal"
            }
            post {
                always {
                    // Torn down REGARDLESS of test outcome
                    sh "terraform destroy -auto-approve -var='env_id=build-${BUILD_NUMBER}' || true"
                }
            }
        }

        // ─────────────────────────────────────────────
        // [5] PRODUCTION DEPLOY - canary with auto-rollback
        // ─────────────────────────────────────────────
        stage('Production Deploy') {
            when { branch 'main' }
            steps {
                script {
                    try {
                        // Push image first
                        sh "docker push ${IMAGE_NAME}:${GIT_COMMIT}"
                        sh "docker push ${IMAGE_NAME}:latest"
                        // Canary: 10% traffic
                        sh "./scripts/canary-deploy.sh --image ${GIT_COMMIT} --weight 10"
                        sh "./scripts/health-check.sh --retries 12 --error-threshold 1 --p99-threshold 2.0"
                        echo "✅ Canary healthy. Promoting to 100%."
                        // Full rollout
                        sh "./scripts/canary-deploy.sh --image ${GIT_COMMIT} --weight 100"
                        // Emit DORA deployment metric
                        sh "./scripts/emit-dora-metric.sh deployment_success ${GIT_COMMIT}"
                    } catch (err) {
                        echo "❌ Deploy failed: ${err.message}"
                        sh "./scripts/rollback.sh --to-previous"
                        sh "./scripts/emit-dora-metric.sh deployment_failure ${GIT_COMMIT}"
                        error("Production deployment rolled back.")
                    }
                }
            }
        }
    }

    post {
        success {
            sh """
                curl -s -X POST ${SLACK_WEBHOOK} \\
                  -H 'Content-type: application/json' \\
                  -d '{"text":"✅ Deployed: ${JOB_NAME} @ ${GIT_COMMIT[0..6]}"}'
            """
        }
        failure {
            sh """
                curl -s -X POST ${SLACK_WEBHOOK} \\
                  -H 'Content-type: application/json' \\
                  -d '{"text":"❌ Pipeline failed: ${JOB_NAME} #${BUILD_NUMBER} - check ${BUILD_URL}"}'
            """
        }
    }
}
```
Where Jenkins genuinely wins:
- Maximum pipeline customisation - Groovy scripting can do anything
- Self‑hosted: works in air‑gapped environments, full data control
- Complex multi‑branch pipelines with shared library abstractions
- Orchestrating non‑code workflows (hardware test rigs, custom tooling)
- Huge plugin ecosystem for legacy integrations
- 10+ years of investment already made - migration cost is real
Where Jenkins hurts:
- High maintenance: JVM tuning, plugin updates, Groovy debugging
- No DX for developers - separate UI from their code repository
- Groovy DSL has a steep learning curve vs YAML tools
- Self‑hosted means you own security patching and availability
- Cold start on agents is slow without pre‑warmed agent pools
- No built‑in secret management - relies on Credentials plugin
Jenkins Shared Libraries - The Right Way to Avoid Duplication
If you have 20 services all with similar Jenkinsfiles, you're probably copy‑pasting. Shared Libraries let you centralise pipeline logic.
Shared library entry point:

```groovy
// vars/standardPipeline.groovy - shared library
// Called from any service Jenkinsfile with: standardPipeline(config)

def call(Map config = [:]) {
    def imageName   = config.get('image', 'my-registry/unknown')
    def coverageMin = config.get('coverageMin', 80)
    def e2eEnabled  = config.get('e2e', true)

    pipeline {
        agent { docker { image 'node:18-alpine' } }
        options { timeout(time: 30, unit: 'MINUTES'); timestamps() }
        stages {
            stage('Build + Test') {
                parallel {
                    stage('Unit')      { steps { sh 'npm ci && npm run test:unit -- --coverage' } }
                    stage('Contracts') { steps { sh 'npm run test:contracts' } }
                    stage('Docker')    { steps { sh "docker build -t ${imageName}:${GIT_COMMIT} ." } }
                }
            }
            stage('Quality Gate') {
                steps { script { qualityGate(imageName, coverageMin) } }
            }
            stage('Staging') {
                when { expression { e2eEnabled && env.BRANCH_NAME == 'main' } }
                steps { script { ephemeralStaging(BUILD_NUMBER) } }
            }
            stage('Deploy') {
                when { branch 'main' }
                steps { script { canaryDeploy(imageName, GIT_COMMIT) } }
            }
        }
    }
}

// Any service Jenkinsfile becomes just:
//   @Library('pipeline-library') _
//   standardPipeline(image: 'my-registry/payment-service', coverageMin: 85)
```
Part 6: CircleCI Deep Dive - The Full Real Pipeline
CircleCI's model is fundamentally different from Jenkins: jobs run in parallel by default, caching is first‑class, and the configuration is pure YAML. Here's the same 5‑stage architecture implemented as a production CircleCI config.
There is no `parallel {}` block here - the default mental model is fan-out, not sequential. Embrace it.

**YAML - CircleCI: complete 5-stage real CI/CD pipeline**

```yaml
# .circleci/config.yml - Real 5-Stage CI/CD Pipeline
# Matches: Source → Build+Test (fan-out) → Quality Gate → Staging → Production
version: 2.1

orbs:
  docker: circleci/docker@2.6
  terraform: circleci/terraform@3.2
  slack: circleci/slack@4.13

# ─────────────────────────────────────────────────────
# REUSABLE COMMANDS
# ─────────────────────────────────────────────────────
commands:
  install-deps:
    steps:
      - restore_cache: { keys: ['deps-v2-{{ checksum "package-lock.json" }}'] }
      - run: npm ci --prefer-offline
      - save_cache:
          key: 'deps-v2-{{ checksum "package-lock.json" }}'
          paths: [node_modules]
  setup-registry:
    steps:
      - run:
          name: Log in to container registry
          command: |
            echo $REGISTRY_PASSWORD | \
              docker login -u $REGISTRY_USERNAME --password-stdin my-registry

# ─────────────────────────────────────────────────────
# [2] BUILD + TEST JOBS - all run simultaneously
# ─────────────────────────────────────────────────────
jobs:
  test-unit:
    docker: [{ image: cimg/node:18.20 }]
    steps:
      - checkout
      - install-deps
      - run: npm run test:unit -- --coverage --ci
      - store_test_results: { path: test-results }
      - persist_to_workspace:
          root: .
          paths: [coverage]

  test-integration:
    docker:
      - image: cimg/node:18.20
      - image: cimg/postgres:15.6
        environment: { POSTGRES_DB: test_db, POSTGRES_PASSWORD: testpass }
      - image: cimg/redis:7.2
      - image: localstack/localstack
        environment: { SERVICES: "s3,sqs,sns" }
    environment:
      DATABASE_URL: "postgresql://postgres:testpass@localhost:5432/test_db"
    steps:
      - checkout
      - install-deps
      - run:
          name: Wait for services
          command: |
            dockerize -wait tcp://localhost:5432 -timeout 60s
            dockerize -wait tcp://localhost:6379 -timeout 30s
      - run: npm run test:integration

  test-contracts:
    docker: [{ image: cimg/node:18.20 }]
    steps:
      - checkout
      - install-deps
      - run: npm run test:contracts
      - run:
          name: Publish pacts to broker
          command: |
            npx pact-broker publish ./pacts \
              --broker-base-url $PACT_BROKER_URL \
              --broker-token $PACT_BROKER_TOKEN \
              --consumer-app-version $CIRCLE_SHA1 \
              --tag $CIRCLE_BRANCH

  build-image:
    machine: { image: ubuntu-2204:current }
    steps:
      - checkout
      - setup-registry
      - docker/build:
          image: my-registry/my-service
          tag: $CIRCLE_SHA1
          # CircleCI Docker Layer Caching - huge speedup on large images
          cache_from: my-registry/my-service:latest
          extra_build_args: --build-arg BUILDKIT_INLINE_CACHE=1
      - run:
          name: Tag and push
          command: |
            docker tag my-registry/my-service:$CIRCLE_SHA1 my-registry/my-service:latest
            docker push my-registry/my-service:$CIRCLE_SHA1
            docker push my-registry/my-service:latest

  # ─────────────────────────────────────────────────────
  # [3] QUALITY GATE - automated, 90 seconds
  # ─────────────────────────────────────────────────────
  quality-gate:
    docker: [{ image: cimg/node:18.20 }]
    steps:
      - checkout
      - attach_workspace: { at: . }
      - run:
          name: Coverage threshold (min 80%)
          command: |
            COV=$(cat coverage/coverage-summary.json | jq '.total.lines.pct')
            echo "Coverage: $COV%"
            [ "$(echo "$COV < 80" | bc -l)" -eq 1 ] && { echo "❌ Below 80%"; exit 1; }
            echo "✅ Coverage gate passed"
      - run:
          name: Security scan (no HIGH/CRITICAL)
          command: |
            docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
              aquasec/trivy:latest image \
              --exit-code 1 --severity HIGH,CRITICAL \
              my-registry/my-service:$CIRCLE_SHA1
            echo "✅ Security gate passed"
      - run:
          name: Can-I-Deploy contract verification
          command: |
            npx pact-broker can-i-deploy \
              --pacticipant my-service \
              --version $CIRCLE_SHA1 \
              --to-environment production \
              --broker-base-url $PACT_BROKER_URL \
              --broker-token $PACT_BROKER_TOKEN
            echo "✅ Contract gate passed"
      - run:
          name: Performance baseline check
          command: |
            P99=$(./scripts/get-staging-p99.sh)
            [ "$(echo "$P99 > 2.0" | bc -l)" -eq 1 ] && { echo "❌ P99 ${P99}s"; exit 1; }
            echo "✅ Performance gate passed - P99: ${P99}s"

  # ─────────────────────────────────────────────────────
  # [4] EPHEMERAL STAGING
  # ─────────────────────────────────────────────────────
  ephemeral-staging:
    docker: [{ image: cimg/node:18.20 }]
    steps:
      - checkout
      - terraform/install
      - run:
          name: Provision ephemeral environment
          command: |
            terraform init -backend-config="key=staging-$CIRCLE_BUILD_NUM.tfstate"
            terraform apply -auto-approve \
              -var="env_id=build-$CIRCLE_BUILD_NUM" \
              -var="app_image=my-registry/my-service:$CIRCLE_SHA1"
      - run:
          name: E2E tests against ephemeral environment
          command: |
            npm run test:e2e -- \
              --base-url="https://build-$CIRCLE_BUILD_NUM.staging.internal"
      - run:
          name: Tear down environment (always - even on failure)
          when: always
          command: |
            terraform destroy -auto-approve \
              -var="env_id=build-$CIRCLE_BUILD_NUM" || true

  # ─────────────────────────────────────────────────────
  # [5] PRODUCTION DEPLOY - canary with metric-gated rollback
  # ─────────────────────────────────────────────────────
  deploy-production:
    docker: [{ image: cimg/base:current }]
    steps:
      - checkout
      - run:
          name: Canary deploy (10% traffic)
          command: |
            ./scripts/canary-deploy.sh \
              --image my-registry/my-service:$CIRCLE_SHA1 \
              --weight 10
      - run:
          name: Validate canary health
          command: |
            for i in {1..12}; do
              HTTP=$(curl -s -o /dev/null -w "%{http_code}" https://app.example.com/health)
              [ "$HTTP" == "200" ] && break
              [ $i -eq 12 ] && { echo "❌ Health check failed"; exit 1; }
              sleep 10
            done
            ERR=$(./scripts/get-metric.sh error_rate_pct)
            P99=$(./scripts/get-metric.sh p99_latency_seconds)
            [ "$(echo "$ERR > 1.0" | bc -l)" -eq 1 ] && { echo "❌ Error rate ${ERR}%"; exit 1; }
            [ "$(echo "$P99 > 2.0" | bc -l)" -eq 1 ] && { echo "❌ P99 ${P99}s"; exit 1; }
            echo "✅ Canary healthy - promoting to 100%"
      - run:
          name: Promote to full rollout
          command: |
            ./scripts/canary-deploy.sh \
              --image my-registry/my-service:$CIRCLE_SHA1 \
              --weight 100
      - run:
          name: Auto-rollback on validation failure
          when: on_fail
          command: |
            echo "❌ Validation failed. Rolling back..."
            ./scripts/rollback.sh --to-previous
            ./scripts/emit-dora-metric.sh deployment_failure $CIRCLE_SHA1
      - slack/notify:
          event: pass
          template: basic_success_1
      - slack/notify:
          event: fail
          template: basic_fail_1

# ─────────────────────────────────────────────────────
# WORKFLOW - the dependency graph
# ─────────────────────────────────────────────────────
workflows:
  full-pipeline:
    jobs:
      # All four jobs run simultaneously (fan-out)
      - test-unit
      - test-integration
      - test-contracts
      - build-image
      # Quality gate only runs after ALL four pass
      - quality-gate:
          requires: [test-unit, test-integration, test-contracts, build-image]
      # Staging only on main branch
      - ephemeral-staging:
          requires: [quality-gate]
          filters: { branches: { only: main } }
      # Production only after staging passes
      - deploy-production:
          requires: [ephemeral-staging]
          filters: { branches: { only: main } }
```
- Docker Layer Caching (DLC) - fastest image builds in SaaS CI
- Fan‑out workflow model makes parallelism the default, not the exception
- Service containers make integration tests genuinely real, not mocked
- Orbs ecosystem (AWS, Terraform, Slack) reduces boilerplate dramatically
- Excellent split‑testing and test parallelism across containers
- Zero infra management - no JVM, no plugins, no patching
- SaaS dependency - your pipeline is on their infrastructure
- Complex customisation hits YAML limits faster than Jenkins Groovy
- Credit system can be confusing to predict costs on variable builds
- No air‑gapped option unless running self‑hosted runners
- Less flexibility for non‑standard compute (custom hardware rigs)
CircleCI Orbs - Avoiding Boilerplate
Orbs are reusable YAML packages - the equivalent of Jenkins Shared Libraries, but publicly shareable. For teams deploying to multiple clouds or using multiple tools:
**YAML - CircleCI: orbs for AWS, Terraform, Slack**

```yaml
version: 2.1

# These orbs replace hundreds of lines of custom script
orbs:
  aws-cli: circleci/aws-cli@4.1      # Auth, ECR push, ECS/EKS deploy
  terraform: circleci/terraform@3.2  # init, plan, apply, destroy
  slack: circleci/slack@4.13         # Notifications without curl spaghetti
  docker: circleci/docker@2.6        # Build, tag, push with DLC

jobs:
  deploy-to-ecs:
    docker: [{ image: cimg/base:current }]
    steps:
      - checkout
      - aws-cli/setup:
          role_arn: arn:aws:iam::$AWS_ACCOUNT_ID:role/CircleCIDeployRole
          aws_region: us-east-1
      - run:
          name: Update ECS service (no AWS YAML wrangling needed)
          command: |
            aws ecs update-service \
              --cluster my-cluster \
              --service my-service \
              --force-new-deployment
      - aws-cli/wait_for_ecs_service_stability:
          cluster: my-cluster
          service: my-service
          max_wait_seconds: 300
      - slack/notify:
          event: always
          custom: |
            {
              "blocks": [{
                "type": "section",
                "text": {
                  "type": "mrkdwn",
                  "text": "*Deploy result:* $CIRCLE_JOB $SLACK_OUTCOME\n*Commit:* $CIRCLE_SHA1\n*Branch:* $CIRCLE_BRANCH"
                }
              }]
            }
```
Part 7: The Real Pipeline Architecture (Tool‑Agnostic)
The two pipelines above (Jenkins and CircleCI) both implement the exact same architecture. The stages and feedback loop are what matter - not the YAML syntax or the Groovy DSL.
```text
[1] SOURCE                Branch: main · webhook trigger
          ▼   (Jenkins · CircleCI · GitHub Actions · GitLab CI - pick your tool)
[2] BUILD + TEST          Fan-out: unit · integration · contracts · Docker build + push
          ▼
[3] QUALITY GATE          Security scan (Trivy/Snyk) · no HIGH/CRITICAL CVEs · contract: can-i-deploy
          ▼
[4] EPHEMERAL STAGING     Exact production mirror · E2E tests run here · torn down after
          ▼
[5] PRODUCTION (CANARY)   Metric-gated health check · auto-rollback on failure · DORA metric emitted
          ▼
[6] FEEDBACK LOOP         Error rate alarms → rollback triggers
                          Deployment frequency + lead time → feed back INTO pipeline configuration
```
Alternative: GitHub Actions (for GitHub‑hosted teams)
**GitHub Actions** - the same architecture implemented as a GitHub Actions workflow, shown here as an alternative for teams on GitHub rather than self-hosted Jenkins:
**YAML - GitHub Actions: same 5-stage architecture (alternative)**

```yaml
# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push: { branches: [main] }
  pull_request: { branches: [main] }

permissions:
  id-token: write   # OIDC - no stored cloud credentials
  contents: read

jobs:
  # [2] Fan-out test matrix - equivalent to Jenkins parallel{} or CircleCI fan-out
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        suite: [unit, integration, contracts]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '18', cache: 'npm' }
      - run: npm ci
      - run: npm run test:${{ matrix.suite }}

  build-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          docker build --cache-from my-registry/my-service:latest -t my-registry/my-service:${{ github.sha }} .
          docker push my-registry/my-service:${{ github.sha }}

  # [3] Quality gate - equivalent to Jenkins quality gate stage
  quality-gate:
    needs: [test, build-image]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: |
          COV=$(cat coverage/coverage-summary.json | jq '.total.lines.pct')
          [ "$(echo "$COV < 80" | bc -l)" -eq 1 ] && exit 1
          echo "✅ Coverage gate passed"
      - run: |
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock aquasec/trivy:latest image --exit-code 1 --severity HIGH,CRITICAL my-registry/my-service:${{ github.sha }}

  # [5] Production deploy
  deploy:
    needs: [quality-gate]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          ./scripts/canary-deploy.sh --image ${{ github.sha }} --weight 10
          ./scripts/health-check.sh
          ./scripts/canary-deploy.sh --image ${{ github.sha }} --weight 100
      - if: failure()
        run: ./scripts/rollback.sh --to-previous
```
Part 8: Security Scanning in the Pipeline - Most Teams Get This Wrong
Security is the most neglected dimension of CI/CD. Most teams bolt on a vulnerability scanner as an afterthought - then ignore its output because it generates too many false positives. A real security pipeline treats security as a first‑class quality gate.
| Scan Type | Target Area | Top Tools | Pipeline Stage | Industry Adoption |
|---|---|---|---|---|
| Secret Detection | Hardcoded creds in code | GitLeaks, TruffleHog | Pre‑commit + CI | ~30% of teams |
| SAST (Static) | Source code patterns | Semgrep, SonarQube | Every commit | ~15% of teams |
| SCA (Dependencies) | Known CVEs in packages | Snyk, npm audit, Trivy fs | Every build | ~40% of teams |
| Container Scanning | OS + app‑layer CVEs in images | Trivy, Grype | Every image build | ~35% of teams |
| IaC Scanning | Misconfigs in Terraform/Ansible | Checkov, tfsec | Every commit | ~12% of teams |
| DAST (Dynamic) | Running app vulnerabilities | OWASP ZAP, Nuclei | Post‑deploy to staging | ~10% of teams |
**Jenkins** - the full 5-layer security pipeline as a Jenkinsfile stage:
**Groovy - Jenkinsfile: full 5-layer security scan**

```groovy
stage('Security Scan') {
    parallel {
        stage('Secret Detection') {
            steps {
                sh 'trufflehog filesystem . --fail --no-update'
                echo "✅ No secrets in code"
            }
        }
        stage('SAST') {
            steps {
                sh 'npx semgrep scan --config=auto --error --severity=ERROR .'
                echo "✅ SAST clean"
            }
        }
        stage('Dependencies') {
            steps {
                sh 'npm audit --audit-level=high'
                sh 'trivy fs --severity HIGH,CRITICAL --exit-code 1 .'
                echo "✅ Dependencies clean"
            }
        }
        stage('Container') {
            steps {
                sh """
                    trivy image --severity HIGH,CRITICAL --exit-code 1 \\
                        my-registry/my-service:${GIT_COMMIT}
                """
                echo "✅ Container image clean"
            }
        }
        stage('IaC') {
            steps {
                sh 'checkov -d ./terraform --quiet --compact'
                echo "✅ IaC scan clean"
            }
        }
    }
}
```
**CircleCI** - the same scans as parallel CircleCI jobs (they all run simultaneously):
**YAML - CircleCI: parallel security scan jobs**

```yaml
jobs:
  scan-secrets:
    docker: [{ image: cimg/base:current }]
    steps:
      - checkout
      - run:
          command: |
            curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh | sh
            trufflehog filesystem . --fail --no-update

  scan-sast:
    docker: [{ image: returntocorp/semgrep }]
    steps:
      - checkout
      - run: semgrep scan --config=auto --error --severity=ERROR .

  scan-dependencies:
    docker: [{ image: cimg/node:18.20 }]
    steps:
      - checkout
      - run: npm ci
      - run: npm audit --audit-level=high
      - run: |
          curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh
          trivy fs --severity HIGH,CRITICAL --exit-code 1 .

  scan-container:
    machine: { image: ubuntu-2204:current }
    steps:
      - run: |
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy:latest image \
            --exit-code 1 --severity HIGH,CRITICAL \
            my-registry/my-service:$CIRCLE_SHA1

workflows:
  security:
    jobs:
      # All four run simultaneously - whole security scan in ~2 minutes
      - scan-secrets
      - scan-sast
      - scan-dependencies
      - scan-container
```
Run gitleaks as a pre-commit hook on every developer machine - it catches secrets before the first push.

**Bash - pre-commit hook (works regardless of CI tool)**

```bash
#!/bin/bash
# .git/hooks/pre-commit
# Or manage team-wide with: https://pre-commit.com

echo "🔐 Checking staged files for secrets..."
gitleaks protect --staged --no-banner --exit-code 1
if [ $? -ne 0 ]; then
  echo ""
  echo "❌ BLOCKED: Potential secret in staged files."
  echo "   Remove it, then commit again."
  echo "   False positive? Use: git commit --no-verify"
  exit 1
fi
echo "✅ No secrets found."
```
Part 9: The Hidden Cost Nobody Talks About
**Real cost breakdown (25 services, 50 builds/day, 15-min avg build)**

```text
Jenkins (self-hosted, "free"):
  EC2 m5.xlarge × 2 (controller + agents):       $560/month
  EBS storage:                                   $80/month
  Engineer maintenance @ $80/hr × 4 hrs/week:   ~$1,280/month
  Plugin updates, security patches, JVM tuning: ~$1,920/month (est.)
  Total Jenkins: ~$3,840+/month + ZERO elasticity

CircleCI:
  50 builds/day × 15 min × $0.006/credit × 30 days = $135/svc
  × 25 services = $3,375/month (before volume discounts)
  Zero maintenance engineering time
  Docker Layer Caching cuts build time → reduces cost further

GitHub Actions:
  50 builds/day × 15 min × $0.008/min × 30 days = $180/svc
  × 25 services = $4,500/month

AWS CodeBuild (alternative):
  50 builds/day × 15 min × $0.005/min × 30 days = $112.50/svc
  × 25 services = $2,812/month
  Best per-minute cost - but you need the AWS ecosystem for it to make sense

Buildkite (hybrid):
  $15/seat × 10 devs = $150/month
  Self-hosted agents (2× m5.large): ~$375/month
  Agent maintenance: ~$800/month
  Total: ~$1,325/month - cheapest if you're willing to run agents
```
| CI Platform | Base Unit Cost | Est. Monthly (25 Svcs) | Est. Annual Cost | Hidden Maintenance | Value Score | Source Link |
|---|---|---|---|---|---|---|
| CircleCI | $0.006/credit | $3,375 | $40,500 | None (SaaS) | ★★★★ | circleci.com/pricing |
| Jenkins | $0 (EC2 amortised) | $3,840+ | $46,080+ | ~$3,200/mo labour | ★★ | EC2 + labour @ $80/hr |
| GitHub Actions | $0.008/min | $4,500 | $54,000 | None (SaaS) | ★★★★ | github.com/pricing |
| GitLab CI | ~$0.10/build | $4,040 | $48,480 | $290/mo seats | ★★★★ | gitlab.com/pricing |
| AWS CodeBuild | $0.005/min | $2,812 | $33,750 | None | ★★★ | aws.amazon.com |
| Buildkite | ~$0.05/build | $1,325 | $15,900 | ~$800/mo agents | ★★★ | buildkite.com/pricing |
| Drone CI | $0 (open source) | $800–1,500 | $9.6K–18K | Server + maintenance | ★★★ | drone.io (OSS) |
Part 10: GitOps with ArgoCD - The Kubernetes Path
For teams running on Kubernetes, the pipeline architecture shifts significantly. Instead of push‑based deployments, GitOps uses a pull‑based model where the cluster watches a Git repo and automatically reconciles its state.
**YAML - CI pipeline hands off to ArgoCD via Git commit**

```yaml
# CircleCI - final step of deploy job: update image tag in GitOps repo
jobs:
  deploy-production:
    docker: [{ image: cimg/base:current }]
    steps:
      - run:
          name: Update image tag in GitOps repo (triggers ArgoCD sync)
          command: |
            git clone https://github.com/my-org/k8s-manifests.git
            cd k8s-manifests
            # Update the image tag using kustomize or yq
            yq e ".spec.template.spec.containers[0].image = \"my-registry/my-service:$CIRCLE_SHA1\"" \
              -i overlays/production/deployment.yaml
            git config user.email "ci@example.com"
            git config user.name "CircleCI Bot"
            git commit -am "chore: deploy my-service $CIRCLE_SHA1"
            git push
            # ArgoCD detects the commit and syncs the cluster - GitOps pull model
```
**YAML - ArgoCD Application + Argo Rollouts canary**

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/my-org/k8s-manifests.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # Remove resources not in Git
      selfHeal: true  # Auto-correct drift
    retry:
      limit: 3
      backoff: { duration: 5s, factor: 2, maxDuration: 3m0s }
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service
spec:
  replicas: 5
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 2m }
        - analysis:
            templates: [{ templateName: success-rate }]
        - setWeight: 50
        - pause: { duration: 5m }
        - setWeight: 100
  rollbackWindow: { revisions: 2 }
```
Part 11: Terraform vs Ansible for Pipeline Infrastructure
Your pipeline infrastructure itself should be code. Here's the tool landscape for provisioning that infrastructure - independent of which CI tool you run on top of it.
| IaC Platform | Language | Multi-Cloud | Key Strengths | Best For |
|---|---|---|---|---|
| Terraform / OpenTofu | HCL | Yes ★★ | Largest provider ecosystem, state mgmt, drift detection, plan preview | Multi‑cloud / any team |
| Ansible | YAML + Python | Yes ★ | Agentless, config mgmt + deploy steps, idempotent | VM‑heavy, hybrid on‑prem |
| Pulumi | TS / Python / Go | Yes ★ | Real programming languages, multi‑cloud | Teams preferring code over HCL |
| AWS CDK | TypeScript / Python | AWS only | Type safety, L2 constructs, IDE autocomplete | AWS‑native teams already on CDK |
| Crossplane | YAML (CRDs) | Yes ★ | K8s‑native IaC, self‑healing infra | K8s platform teams |
**HCL - Terraform: Jenkins agent pool infrastructure**

```hcl
# main.tf - Jenkins agent pool on AWS (or adjust for any cloud)
resource "aws_autoscaling_group" "jenkins_agents" {
  name             = "jenkins-agent-pool"
  min_size         = 1
  max_size         = 10
  desired_capacity = 2

  launch_template {
    id      = aws_launch_template.jenkins_agent.id
    version = "$Latest"
  }

  # Scale up when build queue > 3
  tag {
    key                 = "Jenkins"
    value               = "agent"
    propagate_at_launch = true
  }
}

resource "aws_autoscaling_policy" "scale_up" {
  name                   = "jenkins-agent-scale-up"
  autoscaling_group_name = aws_autoscaling_group.jenkins_agents.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    customized_metric_specification {
      metric_name = "JenkinsBuildQueueDepth"
      namespace   = "Custom/Jenkins"
      statistic   = "Average"
    }
    target_value = 3.0  # Scale up if queue > 3 builds
  }
}

# Ephemeral staging environment (called per build)
resource "aws_instance" "staging" {
  count         = var.create_staging ? 1 : 0
  ami           = data.aws_ami.app.id
  instance_type = var.instance_type  # Same as production

  tags = {
    Environment  = "staging-${var.env_id}"
    AutoTeardown = "true"
  }
}

output "staging_url" {
  value = var.create_staging ? "https://staging-${var.env_id}.internal" : ""
}
```
**YAML - Ansible: deploy + config management (VM teams)**

```yaml
# deploy.yml - Ansible playbook for blue/green deploy
# Called from Jenkins:  sh 'ansible-playbook deploy.yml -e "image_tag=${GIT_COMMIT}"'
# Or from CircleCI:     run: ansible-playbook deploy.yml -e "image_tag=$CIRCLE_SHA1"
- name: Blue/Green Deploy
  hosts: production
  become: yes
  vars:
    image_tag: "{{ image_tag }}"
    registry: my-registry
    service: my-service
  tasks:
    - name: Pull new image
      community.docker.docker_image:
        name: "{{ registry }}/{{ service }}:{{ image_tag }}"
        source: pull

    - name: Start green container
      community.docker.docker_container:
        name: "{{ service }}-green"
        image: "{{ registry }}/{{ service }}:{{ image_tag }}"
        ports: ["8081:8080"]
        state: started
        restart_policy: unless-stopped

    - name: Health check green container
      uri:
        url: http://localhost:8081/health
        status_code: 200
      retries: 10
      delay: 5
      register: health_result

    - name: Switch load balancer to green (nginx)
      template:
        src: nginx-green.conf.j2
        dest: /etc/nginx/conf.d/service.conf
      notify: reload nginx
      when: health_result.status == 200

    - name: Remove old blue container
      community.docker.docker_container:
        name: "{{ service }}-blue"
        state: absent
      when: health_result.status == 200

  handlers:
    - name: reload nginx
      service: { name: nginx, state: reloaded }
```
Part 12: Observability - Closing the Feedback Loop
The feedback loop is what separates a real CI/CD pipeline from a deployment conveyor belt. Without production metrics flowing back into the pipeline, you have no way to know if deployments are actually working.
**Bash - emit DORA metrics after every deploy (any CI tool)**

```bash
#!/bin/bash
# scripts/emit-dora-metric.sh
# Called from Jenkinsfile post{} block OR CircleCI on_fail/on_success step
# Works with Datadog, Prometheus pushgateway, Grafana, or any metrics backend
set -e

EVENT="$1"                 # "deployment_success" | "deployment_failure"
COMMIT="$2"                # commit SHA
SERVICE="${3:-my-service}"

DEPLOY_END=$(date +%s)
DEPLOY_START="${DEPLOY_START_EPOCH:-$DEPLOY_END}"   # Set at pipeline start
LEAD_TIME=$(( DEPLOY_END - DEPLOY_START ))

echo "📊 Emitting DORA metrics..."
echo "   Service:   $SERVICE"
echo "   Event:     $EVENT"
echo "   Commit:    $COMMIT"
echo "   Lead time: ${LEAD_TIME}s"

# ── Option A: Datadog ──
if [ -n "$DD_API_KEY" ]; then
  curl -s -X POST "https://api.datadoghq.com/api/v1/events" \
    -H "Content-Type: application/json" \
    -H "DD-API-KEY: $DD_API_KEY" \
    -d "{
      \"title\": \"Deployment: $SERVICE\",
      \"text\": \"Commit $COMMIT - $EVENT\",
      \"tags\": [\"service:$SERVICE\",\"event:$EVENT\",\"dora:deployment\"],
      \"aggregation_key\": \"$SERVICE-deploy\"
    }"
fi

# ── Option B: Prometheus Pushgateway ──
if [ -n "$PROMETHEUS_PUSHGW" ]; then
  cat <<EOF | curl -s --data-binary @- "$PROMETHEUS_PUSHGW/metrics/job/cicd/service/$SERVICE"
# HELP dora_deployment_lead_time_seconds Lead time from commit to production
# TYPE dora_deployment_lead_time_seconds gauge
dora_deployment_lead_time_seconds{service="$SERVICE",status="$EVENT"} $LEAD_TIME
# HELP dora_deployment_total Total deployments
# TYPE dora_deployment_total counter
dora_deployment_total{service="$SERVICE",status="$EVENT"} 1
EOF
fi

# ── Option C: JSON to any webhook / Grafana Loki ──
if [ -n "$METRICS_WEBHOOK" ]; then
  curl -s -X POST "$METRICS_WEBHOOK" \
    -H "Content-Type: application/json" \
    -d "{
      \"service\": \"$SERVICE\",
      \"event\": \"$EVENT\",
      \"commit\": \"$COMMIT\",
      \"lead_time\": $LEAD_TIME,
      \"timestamp\": $DEPLOY_END
    }"
fi

echo "✅ DORA metrics emitted"
```
Part 13: The 30‑Minute Pipeline Audit
Here is the exact audit I run on every pipeline I review. These commands work regardless of which CI tool you use - they query your deployment system, version control, and metrics directly.
**Bash - DORA audit script (tool-agnostic)**

```bash
#!/bin/bash
# dora-audit.sh - Run this today. Works with any CI tool.
# Adjust DEPLOY_LOG_CMD to match your deployment mechanism.

echo "═══════════════════════════════════════"
echo " DORA 30-MINUTE PIPELINE AUDIT"
echo "═══════════════════════════════════════"

# ── Q1: DEPLOYMENT FREQUENCY ──
echo ""
echo "Q1: How many times did you deploy to production in the last 7 days?"
echo "    Check your deployment log, Slack #deploys, or your CD tool:"
echo ""
echo "    Jenkins → Jenkins build history for your deploy job:"
echo "      curl -s http://jenkins:8080/job/deploy-prod/api/json?tree=builds[timestamp,result]"
echo ""
echo "    CircleCI → API:"
echo "      curl 'https://circleci.com/api/v2/project/gh/org/repo/pipeline?branch=main' \\"
echo "        -H 'Circle-Token: $CIRCLE_TOKEN' | jq '.items | length'"
echo ""
echo "    Elite target: 7+ deploys/week (1+ per day)"

# ── Q2: LEAD TIME ──
echo ""
echo "Q2: How long from 'git push' to 'live in production'?"
echo "    Measure this NOW - pick your last 3 merges to main and time them."
echo "      git log --merges -n 5 --pretty='%H %ci %s'"
echo ""
git log --merges -n 5 --pretty="    %h | %ci | %s" 2>/dev/null || echo "    (run inside your repo)"
echo ""
echo "    Elite target: under 1 hour commit-to-production"

# ── Q3: CHANGE FAILURE RATE ──
echo ""
echo "Q3: Of your last 20 deploys, how many required rollback or hotfix?"
echo "    Check your Slack #deploys channel, PagerDuty, or on-call log."
echo ""
echo "    Simple shell count from Jenkins log:"
echo "      curl -s http://jenkins:8080/job/deploy-prod/api/json \\"
echo "        | jq '[.builds[] | select(.result==\"FAILURE\")] | length'"
echo ""
echo "    Elite target: 0–15% failure rate"

# ── Q4: MTTR ──
echo ""
echo "Q4: Last time production broke - how long to fix + redeploy?"
echo "    Check your incident log / PagerDuty / Slack thread timestamps."
echo "    Formula: (resolution timestamp) - (first alert timestamp)"
echo ""
echo "    Elite target: under 1 hour from incident to recovery"

# ── Q5: THE 5 LIES CHECKLIST ──
echo ""
echo "Q5: Honestly answer these 5 questions (score 0–2 each):"
echo ""
echo "  [Test Confidence]"
echo "    0 = Green badge but tests don't catch real failures"
echo "    1 = Mix of meaningful tests and noise"
echo "    2 = Tests actually catch regressions before prod"
echo ""
echo "  [Staging Fidelity]"
echo "    0 = Static staging env, months out of date"
echo "    1 = Mostly similar to prod, some drift"
echo "    2 = Ephemeral, IaC-provisioned, exact prod mirror"
echo ""
echo "  [Rollback]"
echo "    0 = Script exists but has never been run"
echo "    1 = Manual, sometimes works, untested"
echo "    2 = Metric-triggered, automatic, drilled monthly"
echo ""
echo "  [Lead Time]"
echo "    0 = Deploy windows, multiple days"
echo "    1 = Hours, some manual gates"
echo "    2 = Under 1 hour, automated quality gate"
echo ""
echo "  [Feedback Loop]"
echo "    0 = No metrics from prod flow back to pipeline"
echo "    1 = Some monitoring, not connected to pipeline"
echo "    2 = DORA metrics visible, rollback auto-triggered"
echo ""
echo "═══════════════════════════════════════"
echo " Score: 0–4   = Automated Deployments (not CI/CD)"
echo "        5–8   = Partial CI/CD (fix lowest score first)"
echo "        9–11  = Good CI/CD (focus on feedback loop)"
echo "        12–14 = Elite (keep it as you scale)"
echo "═══════════════════════════════════════"
```
The Lying Pipeline Scorecard
| Pipeline Dimension | 0 Points (Lie) | 1 Point (Partial) | 2 Points (True CI/CD) |
|---|---|---|---|
| Test Confidence | Green badge, no trust | Some meaningful tests | Tests catch real regressions |
| Staging Fidelity | Static museum, months old | Mostly similar | Ephemeral, IaC‑provisioned per run |
| Rollback | Untested script | Manual, sometimes works | Metric‑triggered, drilled monthly |
| Lead Time | Days to weeks | Hours | Under 1 hour |
| Approval Gates | Multiple manual | One manual | Zero (automated quality gate) |
| Feedback Loop | No prod metrics | Some monitoring | Metrics feed back into pipeline |
| Deploy Confidence | "No deploys on Friday" | Occasional Friday nerves | Deploy any time, any day - safety nets in place |
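Scoring yourself against the seven rows above (0, 1, or 2 points each) gives a 0–14 total; the bands are the same ones the audit script prints. A tiny helper makes the mapping explicit (the `audit_verdict` function name is mine; the bands and labels come from the script):

```shell
#!/bin/sh
# audit_verdict: map a 0-14 pipeline audit score to its verdict band.
audit_verdict() {
  SCORE="$1"
  if   [ "$SCORE" -le 4 ];  then echo "Automated Deployments (not CI/CD)"
  elif [ "$SCORE" -le 8 ];  then echo "Partial CI/CD (fix lowest score first)"
  elif [ "$SCORE" -le 11 ]; then echo "Good CI/CD (focus on feedback loop)"
  else                           echo "Elite (keep it as you scale)"
  fi
}

audit_verdict 7   # prints "Partial CI/CD (fix lowest score first)"
```

The point of the bands is prioritisation, not bragging rights: whatever your total, the dimension where you scored 0 is the one to fix first.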
Part 14: The Verdict - Which Stack Should You Actually Use?
After 14 sections and 25+ code examples - here is the honest recommendation based on your actual situation.
| Team Scenario | CI Recommendation | CD Recommendation | IaC Recommendation | Migration Strategy |
|---|---|---|---|---|
| Already on Jenkins | Jenkins (CI) - stay put | Jenkins deploy jobs or ArgoCD | Terraform | Migrate CI last, it's already working - fix practices first |
| Docker‑first team, SaaS preferred | CircleCI | CircleCI deploy + ArgoCD | Terraform | Best Docker DLC, fan‑out model, zero infra ops |
| Kubernetes (EKS/GKE/AKS) | CircleCI or GitHub Actions | ArgoCD + Argo Rollouts | Terraform or Pulumi | GitOps is the natural K8s CD pattern |
| Multi‑cloud team | CircleCI or GitHub Actions | Spinnaker or ArgoCD | Terraform / OpenTofu | Terraform is the only truly multi‑cloud IaC |
| Security/compliance‑first (SOC2, HIPAA) | GitLab CI (built‑in SAST/DAST) | Jenkins or ArgoCD | Terraform | GitLab's integrated DevSecOps suite eliminates plugin sprawl |
| VM‑heavy, on‑prem or hybrid | Jenkins | Jenkins + Ansible | Terraform + Ansible | Jenkins + Ansible is the most battle‑tested VM deploy stack |
| Startup, <10 devs, speed‑first | CircleCI (free tier) or GitHub Actions | CircleCI deploy job | Terraform | Zero infra, fast to set up, free tiers cover most small teams |
| Large enterprise (100+ devs) | Buildkite or Jenkins | ArgoCD + Jenkins (hybrid) | Terraform (at scale) | Buildkite or Jenkins handles complex multi‑team workflows |
| AWS‑native, no K8s | CircleCI or Jenkins | AWS CodeDeploy | AWS CDK or Terraform | CodeDeploy's native rollback is excellent for EC2/ECS/Lambda |
Part 15: The Hard Truth and Your 4‑Week Fix Plan
The pipeline is almost never the problem. The pipeline is a mirror. It reflects the practices, the culture, and the engineering discipline of the team that built it.
**Week 2 - Test Confidence:** Pick the service with the highest change failure rate. Add contract tests (Pact) to its Jenkins build stage or CircleCI job. Replace one manual approval step with an automated quality gate (coverage + security).
**Week 3 - Staging Fidelity:** Convert your staging environment to ephemeral Terraform or Ansible stacks. Wire the teardown into your Jenkins pipeline's `post { always {} }` block or CircleCI's `when: always`. Run integration tests against a fresh environment each build, then tear it down.

**Week 4 - Rollback Confidence:** Add metric-driven rollback logic to your Jenkins deploy stage or CircleCI `when: on_fail` step. Run a rollback drill. Deliberately. In business hours. On a non-critical service. Time it. Write it down. Do it again next month.

Then start again. Because CI/CD is not a destination. It's a practice.
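The Week 4 drill is easier to repeat monthly if timing it is automated. A minimal sketch - `time_drill` and the log file name are my own conventions, and `./scripts/rollback.sh` is the hypothetical rollback entry point used throughout this article:

```shell
#!/bin/sh
# time_drill: run a rollback command, log elapsed seconds and exit status,
# and print the elapsed time so drills are comparable month to month.
time_drill() {
  CMD="$1"
  LOG="${2:-rollback-drills.log}"
  START=$(date +%s)
  if sh -c "$CMD"; then STATUS=0; else STATUS=$?; fi
  END=$(date +%s)
  ELAPSED=$(( END - START ))
  echo "$(date -u +%Y-%m-%d) elapsed=${ELAPSED}s status=$STATUS cmd=$CMD" >> "$LOG"
  echo "$ELAPSED"
  return $STATUS
}

# Monthly drill, e.g.:
# time_drill "./scripts/rollback.sh --to-previous"
```

A growing `rollback-drills.log` is also the honest answer to the scorecard's Rollback row: "drilled monthly" means there are dated entries to prove it.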
Quick Reference: Lying Pipeline vs Real Pipeline
| Pipeline Dimension | Automated Deployments (Lie) | True CI/CD (Reality) |
|---|---|---|
| Tests | 87% coverage testing constructors and mocks | Unit + integration + contract + performance tests |
| Staging | Static museum, months out of date | Ephemeral, IaC‑provisioned per run, exact prod mirror |
| Rollback | Untested script from 8 months ago | Metric‑triggered, <5 min, drilled monthly |
| Speed | 28+ hours (97% waiting) | <1 hour commit‑to‑production, parallelised builds |
| Approvals | 2.3 hour manual gate, 5.5 FTE/week waste | 90‑second automated quality gate |
| Security | npm audit once (if lucky) | 5‑layer scan: secrets + SAST + SCA + container + IaC |
| Feedback | Deploy goes out, nothing comes back | DORA metrics + error rates feed back into pipeline |
| Deploy Confidence | "No deploys on Friday" | Deploy any time, any day - safety nets in place |
"CI/CD is not a tool you install. It's a discipline you practice. Whether you're on Jenkins, CircleCI, GitHub Actions, or anything else - the pipeline is not the problem. The understanding of what CI/CD is supposed to do is the problem."
Run the audit script above on your pipeline this week. If more than 2 answers make you uncomfortable - you know exactly what to fix first.
What's the biggest lie your pipeline is telling you right now? Let me know in the comments. 👇
If this deep‑dive helped you make a clearer decision about your CI/CD architecture, I'd love to hear which tools you're using - and which ones surprised you. If you notice any data that has changed or corrections needed, please let me know in the comments below - this article is a living document and I update it with verified corrections. 👇