# Why CI/CD is your first infrastructure investment
If your team is still deploying by SSH-ing into a server and running `git pull`, you're burning engineering hours and taking on unnecessary risk with every release. A proper CI/CD pipeline turns deployments from a nerve-wracking ritual into a non-event — and that's exactly how it should be.
GitHub Actions is our default recommendation for startups already on GitHub. It's built into your existing workflow, has excellent ecosystem support, and the free tier is generous enough for most early-stage teams. No separate CI server to maintain.
This guide builds a complete pipeline: test on every PR, build Docker images, deploy to staging automatically, deploy to production with approval, and notify your team when something goes wrong.
## Project structure

Before we dive into the workflows, here's the directory structure we'll use:

```text
.github/
  workflows/
    ci.yml                  # Tests on every PR
    build-and-deploy.yml    # Build, push, deploy on merge to main
```
And a Dockerfile for the application:
```dockerfile
# Dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Full install here: devDependencies are needed to run the build
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied into the runtime image
RUN npm prune --omit=dev && npm cache clean --force

FROM node:20-alpine AS runner
WORKDIR /app
RUN addgroup --system --gid 1001 nodejs && \
    adduser --system --uid 1001 appuser
COPY --from=builder --chown=appuser:nodejs /app/dist ./dist
COPY --from=builder --chown=appuser:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:nodejs /app/package.json ./
USER appuser
EXPOSE 3000
# BusyBox wget (the variant in Alpine) supports -q and --spider
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
```
This is a multi-stage build that keeps the final image small and runs as a non-root user. The health check gives your orchestrator (ECS, Kubernetes, etc.) a way to detect unhealthy containers.
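Pair the Dockerfile with a `.dockerignore` so `node_modules`, git history, and local env files never enter the build context — it speeds up every `COPY . .` and keeps secrets out of image layers. A minimal version (entries are illustrative; adjust to your project):

```text
# .dockerignore
node_modules
dist
.git
.github
*.md
.env*
```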
## Running tests on every pull request
This is the foundation. Every PR triggers a test run, and merging is blocked until tests pass.
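Note that blocking merges is a repository setting, not something the workflow itself enforces. One way to configure it is via the GitHub CLI — a sketch only: the `contexts` values must match the job `name` fields above, and `{owner}/{repo}` placeholders are filled in by `gh` when run inside the repository:

```shell
gh api repos/{owner}/{repo}/branches/main/protection --method PUT --input - <<'EOF'
{
  "required_status_checks": { "strict": true, "contexts": ["Lint & Type Check", "Test"] },
  "enforce_admins": false,
  "required_pull_request_reviews": { "required_approving_review_count": 1 },
  "restrictions": null
}
EOF
```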
```yaml
# .github/workflows/ci.yml
name: CI

on:
  pull_request:
    branches: [main, develop]

concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint-and-typecheck:
    name: Lint & Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"
      - run: npm ci
      - name: Run ESLint
        run: npm run lint
      - name: Run type check
        run: npx tsc --noEmit

  test:
    name: Test
    runs-on: ubuntu-latest
    needs: lint-and-typecheck
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
          POSTGRES_DB: app_test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: "npm"
      - run: npm ci
      - name: Run database migrations
        run: npm run db:migrate
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/app_test
      - name: Run tests
        run: npm test -- --coverage
        env:
          DATABASE_URL: postgresql://test:test@localhost:5432/app_test
          REDIS_URL: redis://localhost:6379
          NODE_ENV: test
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}
          fail_ci_if_error: false
```
A few things worth highlighting. The concurrency block cancels in-progress runs when you push new commits to the same PR — no point testing stale code. The services block spins up real PostgreSQL and Redis instances for integration tests, so you're testing against the same databases you use in production. And the lint/typecheck job runs first as a fast-fail gate — no point running a 5-minute test suite if there's a type error.
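For reference, the workflow assumes `package.json` scripts along these lines — the script names are the ones the steps call; the bodies are purely illustrative:

```json
{
  "scripts": {
    "lint": "eslint .",
    "build": "tsc -p tsconfig.json",
    "db:migrate": "node scripts/migrate.js",
    "test": "jest"
  }
}
```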
## Build, push, and deploy on merge
When code lands on main, this workflow builds a Docker image, pushes it to ECR, and deploys to staging automatically. Production deploys require manual approval.
```yaml
# .github/workflows/build-and-deploy.yml
name: Build & Deploy

on:
  push:
    branches: [main]

env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: myapp
  ECS_CLUSTER: myapp-production
  ECS_SERVICE_STAGING: myapp-staging-app
  ECS_SERVICE_PRODUCTION: myapp-production-app

permissions:
  id-token: write
  contents: read

jobs:
  build:
    name: Build & Push Image
    runs-on: ubuntu-latest
    outputs:
      image: ${{ steps.meta.outputs.tags }}
      image_tag: ${{ steps.meta.outputs.image_tag }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Login to Amazon ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2
      - name: Extract metadata
        id: meta
        run: |
          SHA=${{ github.sha }}
          SHORT_SHA=${SHA::7}
          TIMESTAMP=$(date +%Y%m%d%H%M%S)
          IMAGE_TAG="${TIMESTAMP}-${SHORT_SHA}"
          FULL_IMAGE="${{ steps.ecr-login.outputs.registry }}/${{ env.ECR_REPOSITORY }}:${IMAGE_TAG}"
          echo "tags=${FULL_IMAGE}" >> $GITHUB_OUTPUT
          echo "image_tag=${IMAGE_TAG}" >> $GITHUB_OUTPUT
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ${{ steps.meta.outputs.tags }}
            ${{ steps.ecr-login.outputs.registry }}/${{ env.ECR_REPOSITORY }}:latest
      - name: Scan image for vulnerabilities
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ steps.meta.outputs.tags }}
          format: "table"
          exit-code: "1"
          ignore-unfixed: true
          severity: "CRITICAL,HIGH"

  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build
    environment: staging
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Deploy to ECS Staging
        run: |
          aws ecs update-service \
            --cluster ${{ env.ECS_CLUSTER }} \
            --service ${{ env.ECS_SERVICE_STAGING }} \
            --force-new-deployment
      - name: Wait for deployment stability
        run: |
          aws ecs wait services-stable \
            --cluster ${{ env.ECS_CLUSTER }} \
            --services ${{ env.ECS_SERVICE_STAGING }}
      - name: Run smoke tests against staging
        run: |
          for i in {1..5}; do
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://staging.yourapp.com/health)
            if [ "$STATUS" = "200" ]; then
              echo "Staging health check passed"
              exit 0
            fi
            echo "Attempt $i: got status $STATUS, retrying..."
            sleep 10
          done
          echo "Staging health check failed after 5 attempts"
          exit 1

  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: deploy-staging
    environment: production
    steps:
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Deploy to ECS Production
        run: |
          aws ecs update-service \
            --cluster ${{ env.ECS_CLUSTER }} \
            --service ${{ env.ECS_SERVICE_PRODUCTION }} \
            --force-new-deployment
      - name: Wait for deployment stability
        run: |
          aws ecs wait services-stable \
            --cluster ${{ env.ECS_CLUSTER }} \
            --services ${{ env.ECS_SERVICE_PRODUCTION }}
      - name: Verify production health
        run: |
          for i in {1..10}; do
            STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://yourapp.com/health)
            if [ "$STATUS" = "200" ]; then
              echo "Production health check passed"
              exit 0
            fi
            echo "Attempt $i: got status $STATUS, retrying..."
            sleep 15
          done
          echo "Production health check failed after 10 attempts"
          exit 1

  notify-failure:
    name: Notify on Failure
    runs-on: ubuntu-latest
    needs: [build, deploy-staging, deploy-production]
    if: failure()
    steps:
      - name: Send Slack notification
        uses: slackapi/slack-github-action@v1.26.0
        with:
          payload: |
            {
              "blocks": [
                {
                  "type": "header",
                  "text": { "type": "plain_text", "text": "Deployment Failed", "emoji": true }
                },
                {
                  "type": "section",
                  "fields": [
                    { "type": "mrkdwn", "text": "*Repository:*\n${{ github.repository }}" },
                    { "type": "mrkdwn", "text": "*Branch:*\n${{ github.ref_name }}" },
                    { "type": "mrkdwn", "text": "*Commit:*\n${{ github.sha }}" },
                    { "type": "mrkdwn", "text": "*Author:*\n${{ github.actor }}" }
                  ]
                },
                {
                  "type": "actions",
                  "elements": [
                    {
                      "type": "button",
                      "text": { "type": "plain_text", "text": "View Run" },
                      "url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
                    }
                  ]
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
```
This pipeline enforces a strict promotion path: code goes to staging first, smoke tests verify it's healthy, and only then can it be promoted to production. The `environment: production` setting lets GitHub require manual approval before the production deploy job runs — configure this in your repository settings under Environments.
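The smoke-test steps in both deploy jobs are the same pattern: retry a probe a fixed number of times, then give up. Factored into a standalone helper — a sketch; `retry_until` is a name invented here, not part of the workflow — it looks like this:

```shell
#!/bin/sh
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command does; fails after the last attempt.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt $i failed, retrying..." >&2
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# The workflow's staging probe, expressed with the helper:
# retry_until 5 10 sh -c \
#   '[ "$(curl -s -o /dev/null -w "%{http_code}" https://staging.yourapp.com/health)" = "200" ]'
```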
The Trivy vulnerability scan catches known CVEs in your Docker image before it ever reaches a running environment. Note that the deploy jobs rely on the ECS task definitions referencing the `:latest` tag — `--force-new-deployment` tells ECS to pull the image fresh rather than registering a new task definition. And the failure notification job runs whenever any upstream job fails, so your team knows immediately when a deployment breaks.
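The tags produced by the Extract metadata step sort chronologically while still pinning an exact commit. Outside of Actions, the same scheme is just a few lines of shell (the `SHA` value below is a placeholder standing in for `github.sha`):

```shell
#!/bin/sh
# Build an image tag of the form <timestamp>-<short sha>, as the
# "Extract metadata" step does. SHA here is a placeholder value.
SHA="1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b"
SHORT_SHA=$(printf '%s' "$SHA" | cut -c1-7)
TIMESTAMP=$(date -u +%Y%m%d%H%M%S)
IMAGE_TAG="${TIMESTAMP}-${SHORT_SHA}"
echo "$IMAGE_TAG"
```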
## Setting up GitHub environments
For the approval gates and environment-specific secrets to work, configure GitHub Environments:
```shell
# You can configure environments via the GitHub UI:
#   Repository Settings > Environments > New Environment
# Or use the GitHub CLI — note that JSON-valued fields like reviewers
# must be passed as a request body via --input, not --field:
gh api repos/{owner}/{repo}/environments/staging --method PUT

gh api repos/{owner}/{repo}/environments/production --method PUT --input - <<'EOF'
{
  "reviewers": [{ "type": "User", "id": YOUR_USER_ID }],
  "deployment_branch_policy": { "protected_branches": true, "custom_branch_policies": false }
}
EOF
```
Add your secrets at the appropriate scope:

- Both environments: `AWS_ROLE_ARN`
- Repository-level: `SLACK_WEBHOOK_URL`, `CODECOV_TOKEN`
## Adding a deployment status badge
Add a deployment status badge to your README so the team always knows the state of main:
```markdown

```
## Optimizing build times
As your project grows, build times creep up. Here are three quick wins:
- Docker layer caching — add `cache-from` and `cache-to` to the build-push-action to reuse layers between builds
- Dependency caching — the setup-node action's `cache: "npm"` option avoids re-downloading packages on every run
- Parallel jobs — lint, typecheck, and test can run in parallel (we run lint first as a fast-fail, but you can restructure if your tests are fast)
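For the first item, the GitHub Actions cache backend is the lowest-effort option with `docker/build-push-action` — a sketch of the two extra lines (`type=gha` stores layers in the Actions cache; `mode=max` caches intermediate stages too):

```yaml
      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```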
## The bottom line
A production-ready CI/CD pipeline isn't something you set up once and forget. But the initial investment — a few hours of configuration — pays off on literally every deployment for the rest of your project's life. Start with the CI workflow on your next pull request, add the deployment pipeline when you're ready, and iterate from there.
The goal isn't a perfect pipeline on day one. The goal is to never manually deploy again.