# Why CI/CD is your first infrastructure investment
If your team is still deploying by SSH-ing into a server and running `git pull`, you're burning engineering hours and taking on unnecessary risk with every release. A proper CI/CD pipeline turns deployments from a nerve-wracking ritual into a non-event — and that's exactly how it should be.
GitHub Actions is our default recommendation for startups already on GitHub. It's built into your existing workflow, has excellent ecosystem support, and the free tier is generous enough for most early-stage teams. No separate CI server to maintain.
This guide builds a complete pipeline: test on every PR, build Docker images, deploy to staging automatically, deploy to production with approval, and notify your team when something goes wrong.
## Project structure

Before we dive into the workflows, here's the directory structure we'll use:

```text
.github/
  workflows/
    ci.yml                  # Tests on every PR
    build-and-deploy.yml    # Build, push, deploy on merge to main
```
And a Dockerfile for the application:
```dockerfile
# Dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Full install here: devDependencies are needed to run the build
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied into the runtime image
RUN npm prune --omit=dev && npm cache clean --force

FROM node:20-alpine AS runner
WORKDIR /app
RUN addgroup --system --gid 1001 nodejs && \
    adduser --system --uid 1001 appuser
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist
COPY --from=builder --chown=appuser:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:nodejs /app/package.json ./
USER appuser
EXPOSE 3000
# BusyBox wget (the variant in Alpine) supports -q and --spider
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
```
This is a multi-stage build that keeps the final image small and runs as a non-root user. The health check gives your orchestrator (ECS, Kubernetes, etc.) a way to detect unhealthy containers.
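Pair the Dockerfile with a `.dockerignore` so `node_modules`, git history, and local env files never enter the build context — it speeds up every `COPY . .` and keeps secrets out of image layers. A minimal version (entries are illustrative; adjust to your project):

```text
# .dockerignore
node_modules
dist
.git
.github
*.md
.env*
```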
## Running tests on every pull request
This is the foundation. Every PR triggers a test run, and merging is blocked until tests pass.
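Note that blocking merges is a repository setting, not something the workflow itself enforces. One way to configure it is via the GitHub CLI — a sketch only: the `contexts` values must match the job `name` fields above, and `{owner}/{repo}` placeholders are filled in by `gh` when run inside the repository:

```shell
gh api repos/{owner}/{repo}/branches/main/protection --method PUT --input - <<'EOF'
{
  "required_status_checks": { "strict": true, "contexts": ["Lint & Type Check", "Test"] },
  "enforce_admins": false,
  "required_pull_request_reviews": { "required_approving_review_count": 1 },
  "restrictions": null
}
EOF
```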
```yaml
# .github/workflows/ci.yml
name: CI

on:
  pull_request:
    branches: [main, develop]

concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint-and-typecheck:
    name: Lint & Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"
      - run: npm ci
      - name: Run ESLint
        run: npm run lint
      - name: Run type check
        run: npx tsc --noEmit

  test:
    name: Test
    runs-on: ubuntu-latest
    needs: lint-and-typecheck
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: app_test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"
      - run: npm ci
      - name: Run database migrations
        run: npm run db:migrate
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/app_test
      - name: Run tests
        run: npm test -- --coverage
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/app_test
          REDIS_URL: redis://localhost:6379
          NODE_ENV: test
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          fail_ci_if_error: false
```
A few things worth highlighting. The concurrency block cancels in-progress runs when you push new commits to the same PR — no point testing stale code. The services block spins up real PostgreSQL and Redis instances for integration tests, so you're testing against the same databases you use in production. And the lint/typecheck job runs first as a fast-fail gate — no point running a 5-minute test suite if there's a type error.
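For reference, the workflow assumes `package.json` scripts along these lines — the script names are the ones the steps call; the bodies are purely illustrative:

```json
{
  "scripts": {
    "lint": "eslint .",
    "build": "tsc -p tsconfig.json",
    "db:migrate": "node scripts/migrate.js",
    "test": "jest"
  }
}
```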
## Build, push, and deploy on merge
When code lands on main, this workflow builds a Docker image, pushes it to ECR, and deploys to staging automatically. Production deploys require manual approval.
```yaml
# .github/workflows/build-and-deploy.yml
name: Build & Deploy

on:
  push:
    branches: [main]

env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: myapp
  ECS_CLUSTER: myapp-production
  ECS_SERVICE_STAGING: myapp-staging-app
  ECS_SERVICE_PRODUCTION: myapp-production-app

permissions:
  id-token: write
  contents: read

jobs:
  build:
    name: Build & Push Image
    runs-on: ubuntu-latest
    outputs:
      image: ${{ steps.meta.outputs.tags }}
      image_tag: ${{ steps.meta.outputs.image_tag }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Login to Amazon ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2
      - name: Extract metadata
        id: meta
        run: |
          SHA=${{ github.sha }}
          SHORT_SHA=${SHA::7}
          TIMESTAMP=$(date +%Y%m%d%H%M%S)
          IMAGE_TAG="${TIMESTAMP}-${SHORT_SHA}"
          FULL_IMAGE="${{ steps.ecr-login.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${IMAGE_TAG}"
          echo "tags=${FULL_IMAGE}" >> $GITHUB_OUTPUT
          echo "image_tag=${IMAGE_TAG}" >> $GITHUB_OUTPUT
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ steps.meta.outputs.tags }}
            ${{ steps.ecr-login.outputs.registry }}/${{ env.ECR_REPOSITORY }}:latest
      - name: Scan image for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ steps.meta.outputs.tags }}
          format: "table"
          exit-code: "1"
          ignore-unfixed: true
          severity: "CRITICAL,HIGH"

  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build
    environment: staging
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Deploy to ECS Staging
        run: |
          aws ecs update-service \
            --cluster ${{ env.ECS_CLUSTER }} \
            --service ${{ env.ECS_SERVICE_STAGING }} \
            --force-new-deployment
      - name: Wait for deployment stability
        run: |
          aws ecs wait services-stable \
            --cluster ${{ env.ECS_CLUSTER }} \
            --services ${{ env.ECS_SERVICE_STAGING }}
      - name: Run smoke tests against staging
        run: |
          for i in {1..5}; do
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://staging.yourapp.com/health)
            if [ "$STATUS" = "200" ]; then
              echo "Staging health check passed"
              exit 0
            fi
            echo "Attempt $i: got status $STATUS, retrying..."
            sleep 10
          done
          echo "Staging health check failed after 5 attempts"
          exit 1

  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: deploy-staging
    environment: production
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Deploy to ECS Production
        run: |
          aws ecs update-service \
            --cluster ${{ env.ECS_CLUSTER }} \
            --service ${{ env.ECS_SERVICE_PRODUCTION }} \
            --force-new-deployment
      - name: Wait for deployment stability
        run: |
          aws ecs wait services-stable \
            --cluster ${{ env.ECS_CLUSTER }} \
            --services ${{ env.ECS_SERVICE_PRODUCTION }}
      - name: Verify production health
        run: |
          for i in {1..10}; do
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://yourapp.com/health)
            if [ "$STATUS" = "200" ]; then
              echo "Production health check passed"
              exit 0
            fi
            echo "Attempt $i: got status $STATUS, retrying..."
            sleep 15
          done
          echo "Production health check failed after 10 attempts"
          exit 1

  notify-failure:
    name: Notify on Failure
    runs-on: ubuntu-latest
    needs: [build, deploy-staging, deploy-production]
    if: failure()
    steps:
      - name: Send Slack notification
        uses: slackapi/slack-github-action@v1.26.0
        with:
          payload: |
            {
              "blocks": [
                {
                  "type": "header",
                  "text": { "type": "plain_text", "text": "Deployment Failed", "emoji": true }
                },
                {
                  "type": "section",
                  "fields": [
                    { "type": "mrkdwn", "text": "*Repository:*\n${{ github.repository }}" },
                    { "type": "mrkdwn", "text": "*Branch:*\n${{ github.ref_name }}" },
                    { "type": "mrkdwn", "text": "*Commit:*\n${{ github.sha }}" },
                    { "type": "mrkdwn", "text": "*Author:*\n${{ github.actor }}" }
                  ]
                },
                {
                  "type": "actions",
                  "elements": [
                    {
                      "type": "button",
                      "text": { "type": "plain_text", "text": "View Run" },
                      "url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
                    }
                  ]
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
```
This pipeline enforces a strict promotion path: code goes to staging first, smoke tests verify it's healthy, and only then can it be promoted to production. The `environment: production` setting lets GitHub require manual approval before the production deploy job runs — configure this in your repository settings under Environments.
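The smoke-test steps in both deploy jobs are the same pattern: retry a probe a fixed number of times, then give up. Factored into a standalone helper — a sketch; `retry_until` is a name invented here, not part of the workflow — it looks like this:

```shell
#!/bin/sh
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command does; fails after the last attempt.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $i failed, retrying..." >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# The workflow's staging probe, expressed with the helper:
# retry_until 5 10 sh -c \
#   '[ "$(curl -s -o /dev/null -w "%{http_code}" https://staging.yourapp.com/health)" = "200" ]'
```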
The Trivy vulnerability scan catches known CVEs in your Docker image before it ever reaches a running environment. Note that the deploy jobs rely on the ECS task definitions referencing the `:latest` tag — `--force-new-deployment` tells ECS to pull the image fresh rather than registering a new task definition. And the failure notification job runs whenever any upstream job fails, so your team knows immediately when a deployment breaks.
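The tags produced by the Extract metadata step sort chronologically while still pinning an exact commit. Outside of Actions, the same scheme is just a few lines of shell (the `SHA` value below is a placeholder standing in for `github.sha`):

```shell
#!/bin/sh
# Build an image tag of the form <timestamp>-<short sha>, as the
# "Extract metadata" step does. SHA here is a placeholder value.
SHA="1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b"
SHORT_SHA=$(printf '%s' "$SHA" | cut -c1-7)
TIMESTAMP=$(date -u +%Y%m%d%H%M%S)
IMAGE_TAG="${TIMESTAMP}-${SHORT_SHA}"
echo "$IMAGE_TAG"
```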
## Setting up GitHub environments
For the approval gates and environment-specific secrets to work, configure GitHub Environments:
```shell
# You can configure environments via the GitHub UI:
#   Repository Settings > Environments > New Environment
# Or use the GitHub CLI — note that JSON-valued fields like reviewers
# must be passed as a request body via --input, not --field:
gh api repos/{owner}/{repo}/environments/staging --method PUT

gh api repos/{owner}/{repo}/environments/production --method PUT --input - <<'EOF'
{
  "reviewers": [{ "type": "User", "id": YOUR_USER_ID }],
  "deployment_branch_policy": { "protected_branches": true, "custom_branch_policies": false }
}
EOF
```
Add your secrets at the appropriate scope:

- Both environments: `AWS_ROLE_ARN`
- Repository-level: `SLACK_WEBHOOK_URL`, `CODECOV_TOKEN`
## Adding a deployment status badge
Add a deployment status badge to your README so the team always knows the state of main:
```markdown

```
## Optimizing build times
As your project grows, build times creep up. Here are three quick wins:
- Docker layer caching — add `cache-from` and `cache-to` to the build-push-action to reuse layers between builds
- Dependency caching — the setup-node action's `cache: "npm"` option avoids re-downloading packages on every run
- Parallel jobs — lint, typecheck, and test can run in parallel (we run lint first as a fast-fail, but you can restructure if your tests are fast)
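For the first item, the GitHub Actions cache backend is the lowest-effort option with `docker/build-push-action` — a sketch of the two extra lines (`type=gha` stores layers in the Actions cache; `mode=max` caches intermediate stages too):

```yaml
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```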
## The bottom line
A production-ready CI/CD pipeline isn't something you set up once and forget. But the initial investment — a few hours of configuration — pays off on literally every deployment for the rest of your project's life. Start with the CI workflow on your next pull request, add the deployment pipeline when you're ready, and iterate from there.
The goal isn't a perfect pipeline on day one. The goal is to never manually deploy again.