Zero‑Downtime Deploys with PM2 + Git Hooks: Production‑Ready Playbook


Zero‑Downtime Node.js Deployments with PM2 & Git Hooks (Cluster Mode, Atomic Releases, Rollbacks)

Learn a battle‑tested, zero‑downtime deployment workflow for Node.js using PM2 (cluster mode) and Git hooks. Includes atomic releases, health checks, rollbacks, log rotation, and CI/CD‑ready patterns.


✍️ Short Summary

Ship updates with no downtime using PM2 + Git hooks. You push to a bare repo on the server; a post‑receive hook builds a new release under releases/<rev>, repoints the current symlink to it, and triggers pm2 reload (cluster mode) so workers are replaced one at a time. Also covered: rollback, health checks, log rotation, and security hardening.


 

📎 Table of Contents

  • Why Zero‑Downtime & How This Works

  • Architecture Overview

  • Prerequisites

  • Directory Layout (Atomic Releases)

  • Option A: Git bare repo + post‑receive hook (recommended)

  • PM2 Configuration (ecosystem file)

  • Health Checks & Readiness Gates

  • Log Rotation & Observability

  • Rollback Strategy (Instant)

  • Optional: Using PM2 Deploy & CI/CD

  • NGINX/Caddy Reverse Proxy (snippet)

  • Security Hardening

  • Troubleshooting Checklist

  • ✅ Conclusion / Next Steps

  • 🔗 Related Articles

Note: example.com and example-app are placeholders; substitute your own domain and app name. The steps are platform‑agnostic and do not depend on any hosting panel.


🟢 Start Here -- What is PM2?

PM2 is a production process manager for Node.js. It keeps your app running 24/7, scales it across CPU cores, and performs zero‑downtime reloads so users never see an outage during deploys. PM2 also standardizes logs, environment variables, and startup on reboot--perfect for single‑server or small‑cluster setups.

🎯 What PM2 Does (at a glance)

  • Keep‑alive & auto‑restart: If your app crashes, PM2 restarts it instantly.

  • Scale across cores (cluster mode): Spawn N workers behind an internal load balancer.

  • Zero‑downtime reloads: pm2 reload replaces workers one‑by‑one without dropping connections.

  • Unified logs & rotation: View live logs and add rotation with pm2‑logrotate.

  • Startup on boot: Generate a systemd unit and persist process lists.

  • Config as code: One ecosystem.config.js captures all runtime settings.

🚫 What PM2 Is Not

  • Not a web server/reverse proxy. Use NGINX or Caddy for TLS, HTTP/2, compression, static files, and routing.

  • Not a build or CI tool. Pair with your Git hooks/CI pipeline to build artifacts and trigger reloads.

  • Not an orchestrator. For many hosts/containers, consider Kubernetes or a PaaS.

🤔 When to Use (and When Not To)

Use PM2 when you manage Node.js on your own VM and want a simple, reliable path to scaling + zero‑downtime deploys.
Consider alternatives if you're already on a full container/orchestration stack (Kubernetes), or a PaaS (e.g., Render/Fly/Heroku) that handles process supervision for you.

⚡ 2‑Minute Quick Start

npm i -g pm2
pm2 start server.js --name example-app                 # single process (fork mode)
pm2 start server.js --name example-app -i max          # OR, instead of the line above: cluster across all cores
pm2 reload example-app                                 # zero‑downtime reload (cluster mode)
pm2 startup && pm2 save                                # start on reboot; run the sudo command that pm2 startup prints
pm2 logs example-app                                   # live logs

🧩 Key Terms

  • Fork vs Cluster: Fork = 1 process. Cluster = N workers on the same port with zero‑downtime reloads.

  • Reload vs Restart: Reload is rolling replacement (no downtime, cluster). Restart stops then starts (brief blip).

  • Instances: Number of worker processes (e.g., 2, 4, or max).

  • Ecosystem file: ecosystem.config.js--your app's runtime config (env, logs, scaling, timeouts).

  • Sticky sessions: Session affinity that keeps a client on one worker. Socket.IO's HTTP long‑polling transport needs it; plain WebSocket connections do not.

🧠 Mental Model (Simple Diagram)

Browser → NGINX/Caddy → PM2 (LB) → [Worker 1, Worker 2, ...]
                                 ↳ pm2 reload swaps workers one‑by‑one
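
PM2's cluster mode builds on Node's built‑in cluster module. The sketch below shows the same idea without PM2, purely to illustrate the mental model: one primary process forks workers, and every worker listens on the same port. The helper names workerCount and runClustered are hypothetical, and the "max" handling is a simplification rather than PM2's exact logic.

```javascript
const cluster = require("cluster");
const http = require("http");
const os = require("os");

// Translate an "instances" setting into a worker count.
// Hypothetical helper: accepts a positive integer or the string "max".
function workerCount(instances) {
  if (instances === "max") return os.cpus().length;
  const n = Number(instances);
  if (!Number.isInteger(n) || n < 1) {
    throw new Error(`invalid instances value: ${instances}`);
  }
  return n;
}

// The primary forks the workers; each worker listens on the same port.
function runClustered(instances, port) {
  if (cluster.isPrimary) {
    const count = workerCount(instances);
    for (let i = 0; i < count; i++) cluster.fork();
    // Fork a replacement when a worker dies (similar in spirit to PM2's keep-alive).
    cluster.on("exit", (worker) => {
      console.log(`worker ${worker.process.pid} exited; forking a replacement`);
      cluster.fork();
    });
  } else {
    http
      .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
      .listen(port);
  }
}

// To try it locally: runClustered(2, 3000);
```

Repeated requests to the port are answered by different pids. PM2 adds the rolling reload, logging, and persistence on top of this model.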

❓FAQ

  • Do I still need NGINX/Caddy? Yes--terminate TLS, serve static files, and proxy to the app.

  • Where are logs? pm2 logs (live). Configure rotation via pm2 install pm2-logrotate.

  • TypeScript? Build to JS for prod (tsc). Source maps via --enable-source-maps.

  • WebSockets? Start with --sticky to maintain session affinity.


🚀 Why Zero‑Downtime & How This Works

Goal: Users never feel upgrades. Technique: PM2 runs your app in cluster mode (multiple processes). A push triggers a new release build and then pm2 reload performs a graceful, rolling restart--old workers drain traffic while new ones boot.

Core guarantees:

  • Atomic releases: A new, versioned folder is built and switched in one step.

  • Reproducible builds: Use npm ci and a clean releases/<rev> directory.

  • Safe rollbacks: Repoint current symlink to a previous good release.


🧭 Architecture Overview

Developer → git push → [Server: bare repo]
                         └─ post‑receive hook → create releases/<rev>
                                               → npm ci && build
                                               → update symlink current → releases/<rev>
                                               → pm2 startOrReload ecosystem.config.js --env production
                                               → health check & verify


✅ Prerequisites

  • Linux server with Node.js 18+ and Git 2.4+.

  • PM2 globally installed: npm i -g pm2.

  • A dedicated non‑root user (e.g., deploy).

  • Reverse proxy (NGINX/Caddy) pointing to your app's port (e.g., 3000).

  • Environment variables available on the server (via .env or PM2 envs). Do not commit secrets.

Tip: Ensure the user running Git hooks has node and pm2 in PATH. If you use a version manager (e.g., nvm), export PATH inside the hook.


🗂️ Directory Layout (Atomic Releases)

/var/www/example-app
├─ repo.git/            # bare repo (server remote)
├─ releases/            # timestamped or <rev> folders
├─ shared/              # .env, uploads/, tmp/, etc.
├─ current → releases/<rev>   # symlink switched atomically
└─ ecosystem.config.js

🧩 Option A: Git Bare Repo + post‑receive Hook (Recommended)

This option is simple, fast, and CI/CD‑ready without extra services.

1) Create folders & permissions

sudo mkdir -p /var/www/example-app/{releases,shared}
sudo mkdir -p /var/www/example-app/repo.git
sudo mkdir -p /var/log/pm2                      # log directory referenced by the ecosystem file
sudo chown -R deploy:deploy /var/www/example-app /var/log/pm2

2) Initialize bare repo (server)

cd /var/www/example-app/repo.git
git init --bare

Add this server as a remote in your local repo, e.g.:

# on your laptop/workstation
git remote add production deploy@server:/var/www/example-app/repo.git

3) post‑receive hook (server)

Create hooks/post-receive in the bare repo and make it executable.

cat > /var/www/example-app/repo.git/hooks/post-receive <<'HOOK'
#!/usr/bin/env bash
set -euo pipefail

APP_DIR=/var/www/example-app
RELEASES_DIR="$APP_DIR/releases"
CURRENT_LINK="$APP_DIR/current"
ECOSYSTEM="$APP_DIR/ecosystem.config.js"
SHARED_DIR="$APP_DIR/shared"

# Git passes "<oldrev> <newrev> <ref>" on stdin; only the first pushed ref is handled
read -r oldrev newrev ref
if [ "$ref" != "refs/heads/main" ]; then
  echo "Ignoring push to $ref (only main is deployed)"
  exit 0
fi
REV=$(echo "$newrev" | cut -c1-7)
RELEASE="$RELEASES_DIR/$REV"

# ensure PATH for node/pm2 (adjust if using nvm/asdf)
export PATH=/usr/local/bin:/usr/bin:$PATH

mkdir -p "$RELEASE"
GIT_WORK_TREE="$RELEASE" git checkout -f "$newrev"
unset GIT_DIR   # hooks run with GIT_DIR set; clear it before calling other tools

cd "$RELEASE"
# install all deps so the build can use devDependencies (pnpm: `pnpm i --frozen-lockfile`)
npm ci

# build if package.json defines a build script (React/Next/Nuxt/etc.), then drop devDependencies
npm run build --if-present
npm prune --omit=dev

# link shared assets (env, uploads, etc.)
if [ -f "$SHARED_DIR/.env" ]; then
  ln -sfn "$SHARED_DIR/.env" "$RELEASE/.env"
fi

# atomic switch: create the new link beside the old one, then rename over it
ln -sfn "$RELEASE" "$CURRENT_LINK.new"
mv -Tf "$CURRENT_LINK.new" "$CURRENT_LINK"

# starts the app on the first deploy, reloads it afterwards (zero-downtime in cluster mode)
pm2 startOrReload "$ECOSYSTEM" --env production

# health check: retry for a few seconds while the new workers boot
for attempt in 1 2 3 4 5; do
  if curl -fsS http://127.0.0.1:3000/healthz >/dev/null; then
    echo "Health OK"
    break
  fi
  if [ "$attempt" -eq 5 ]; then
    echo "Health FAILED"
    exit 1
  fi
  sleep 2
done

# save the process list so it can be restored after a reboot
pm2 save
HOOK
chmod +x /var/www/example-app/repo.git/hooks/post-receive

Why atomic? If build fails, the current symlink is untouched, so the live app stays healthy.
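
A symlink switch is only atomic when the new link replaces the old one in a single rename: the link is first created under a temporary name and then renamed over current, so a reader never sees a missing link. Here is a minimal Node sketch of that technique, assuming the directory layout above; switchCurrent and the temporary link name are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Point <appDir>/current at releases/<releaseName> with a single rename.
function switchCurrent(appDir, releaseName) {
  const target = path.join("releases", releaseName); // relative link target
  // statSync throws if the release directory does not exist.
  if (!fs.statSync(path.join(appDir, target)).isDirectory()) {
    throw new Error(`${target} is not a directory`);
  }
  const link = path.join(appDir, "current");
  const tmp = `${link}.tmp-${process.pid}`;
  fs.rmSync(tmp, { force: true });  // clear leftovers from an interrupted run
  fs.symlinkSync(target, tmp);      // build the new link beside the old one
  fs.renameSync(tmp, link);         // rename replaces the old link in one step
  return fs.readlinkSync(link);
}
```

The same function covers rollbacks: call it with the name of a previous release, then reload PM2.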

4) First push

git push production main

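Create ecosystem.config.js on the server (next section) before the first push, because the hook passes it to PM2. The hook builds a release, switches current, and starts the app; on later pushes it performs a graceful pm2 reload instead.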


⚙️ PM2 Configuration (ecosystem.config.js)

Create /var/www/example-app/ecosystem.config.js:

module.exports = {
  apps: [
    {
      name: "example-app",
      cwd: "/var/www/example-app/current", // resolve the script inside the active release
      script: "./server.js",          // your entry file
      exec_mode: "cluster",           // enables zero‑downtime reloads
      instances: "max",               // or a fixed number like 2 or 4
      watch: false,
      max_memory_restart: "512M",
      env: {
        NODE_ENV: "production",
        PORT: 3000
      },
      env_production: {
        NODE_ENV: "production"
      },
      kill_timeout: 5000,              // allow workers to drain
      listen_timeout: 8000,            // readiness window
      out_file: "/var/log/pm2/example-app.out.log",
      error_file: "/var/log/pm2/example-app.err.log",
      merge_logs: true
    }
  ]
};

Important: Zero‑downtime requires cluster mode (multiple instances) or a blue‑green strategy. In fork mode (single process), reload behaves like a restart (brief interruption).

Persist PM2 on reboot:

pm2 startup
pm2 save

🩺 Health Checks & Readiness Gates

Expose a simple health endpoint (e.g., Express):

app.get("/healthz", (req, res) => {
  res.status(200).json({ ok: true, uptime: process.uptime() });
});

Readiness tips:

  • Defer accepting traffic until DB connections are ready.

  • Use listen_timeout + kill_timeout in PM2 to gracefully replace workers.

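  • If your reverse proxy supports active health checks (for example Caddy's health_uri; in NGINX the health_check directive is an NGINX Plus feature), point them at /healthz on the app port.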
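
The readiness tips above can be sketched with Node's http module alone. This is an illustration under assumptions, not a drop‑in implementation: connectToDatabase is a hypothetical async function supplied by your app, and the /readyz path and the state object are naming choices made for this example.

```javascript
const http = require("http");

// Shared readiness flag; set it back to false if a dependency is lost later.
const state = { ready: false };

// /healthz: liveness, 200 whenever the process can answer.
// /readyz:  readiness, 503 until state.ready is true.
function healthHandler(req, res) {
  if (req.url === "/healthz") {
    res.writeHead(200, { "Content-Type": "application/json" });
    return res.end(JSON.stringify({ ok: true }));
  }
  if (req.url === "/readyz") {
    res.writeHead(state.ready ? 200 : 503, { "Content-Type": "application/json" });
    return res.end(JSON.stringify({ ready: state.ready }));
  }
  res.writeHead(404);
  res.end();
}

// Connect to dependencies first, then listen, then tell PM2 the worker is ready.
async function start(connectToDatabase, port = 3000) {
  await connectToDatabase();            // hypothetical: resolves once the DB is usable
  state.ready = true;
  const server = http.createServer(healthHandler);
  await new Promise((resolve) => server.listen(port, resolve));
  if (process.send) process.send("ready"); // only has an effect with wait_ready: true
  return server;
}
```

Because the server starts listening only after the dependency check, PM2 keeps the old worker in place until the new one can actually serve traffic.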


📊 Log Rotation & Observability

Install PM2's logrotate module:

pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 7
pm2 set pm2-logrotate:compress true
pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss

Live insights:

pm2 status
pm2 logs example-app
pm2 monit

⏪ Rollback Strategy (Instant)

List available releases:

ls -1 /var/www/example-app/releases

Switch back atomically and reload:

cd /var/www/example-app
ln -sfn releases/<good_rev> current
pm2 reload ecosystem.config.js --env production

Keep the last N releases (e.g., 5) and prune older ones via a nightly cron.
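
One possible shape for the pruning job is sketched below, assuming the directory layout from this guide. Release folders are named by revision, so the sketch orders them by modification time rather than by name, and it never deletes the release that current points at. The function names are hypothetical.

```javascript
const fs = require("fs");
const path = require("path");

// Names of releases that can be deleted: everything except the newest
// `keep` directories and the one `current` points at.
function releasesToPrune(appDir, keep = 5) {
  const releasesDir = path.join(appDir, "releases");
  let live = null;
  try {
    live = path.basename(fs.realpathSync(path.join(appDir, "current")));
  } catch {
    // no current symlink yet; nothing to protect
  }
  return fs
    .readdirSync(releasesDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => ({
      name: entry.name,
      mtimeMs: fs.statSync(path.join(releasesDir, entry.name)).mtimeMs,
    }))
    .sort((a, b) => b.mtimeMs - a.mtimeMs) // newest first
    .slice(keep)                            // skip the newest `keep`
    .map((release) => release.name)
    .filter((name) => name !== live);
}

// Delete the prunable releases and return their names.
function pruneReleases(appDir, keep = 5) {
  const removed = releasesToPrune(appDir, keep);
  for (const name of removed) {
    fs.rmSync(path.join(appDir, "releases", name), { recursive: true, force: true });
  }
  return removed;
}
```

Saved as a script that calls pruneReleases("/var/www/example-app", 5), it can be run from the deploy user's crontab.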


🔄 Optional: Using PM2 Deploy & CI/CD

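PM2 also ships a deploy command (pm2 deploy) that pulls from Git on the server and runs post-deploy commands, configured in the same ecosystem file. Treat it as an alternative to Option A rather than an add‑on, since it manages its own checkout and current symlink under path. Example snippet: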

module.exports = {
  apps: [ /* ...same as above... */ ],
  deploy: {
    production: {
      user: "deploy",
      host: ["server"],
      ref: "origin/main",
      repo: "git@your-vcs:org/example-app.git",
      path: "/var/www/example-app",
      "post-deploy": "npm ci && npm run build --if-present && npm prune --omit=dev && pm2 startOrReload ecosystem.config.js --env production"
    }
  }
}

Works great with CI. Use environment‑specific secrets from your runner's vault; avoid committing secrets.
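
A CI job can finish with a smoke test that polls the health endpoint until the new workers answer. The sketch below uses only Node's standard library; waitForHealthy, its option names, and the defaults are assumptions for illustration.

```javascript
const http = require("http");
const https = require("https");

// Resolve true on HTTP 200, false on any other status, error, or timeout.
function checkOnce(url, timeoutMs) {
  const client = url.startsWith("https:") ? https : http;
  return new Promise((resolve) => {
    const req = client.get(url, { timeout: timeoutMs, agent: false }, (res) => {
      res.resume(); // discard the body so the socket can close
      resolve(res.statusCode === 200);
    });
    req.on("timeout", () => {
      req.destroy();
      resolve(false);
    });
    req.on("error", () => resolve(false));
  });
}

// Poll the URL until it answers 200 or the attempts run out.
async function waitForHealthy(url, { attempts = 10, delayMs = 500, timeoutMs = 2000 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    if (await checkOnce(url, timeoutMs)) return true;
    if (i < attempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}

// Example for a CI step (exit code drives the job result):
// waitForHealthy("https://example.com/healthz").then((ok) => process.exit(ok ? 0 : 1));
```

Checking the public URL from the runner exercises the reverse proxy as well as the app, which a check on 127.0.0.1 on the server cannot do.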


🌐 Reverse Proxy (NGINX / Caddy)

NGINX example:

upstream example_app {
  server 127.0.0.1:3000;
}
server {
  server_name example.com;
  location / {
    proxy_pass http://example_app;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}

Caddy example:

example.com {
  encode gzip
  reverse_proxy 127.0.0.1:3000
}

Ensure your proxy health checks point at /healthz.


🔐 Security Hardening

  • Create a locked‑down deploy user; limit sudo.

  • Store .env in shared/ with strict permissions.

  • Only allow fast‑forward pushes; protect main branch in your VCS.

  • Ensure hooks are executable and owned by the deploy user.

  • Keep Node/PM2 updated; enable OS auto‑security updates.


🧰 Troubleshooting Checklist

  • Hook didn't fire? File not executable (chmod +x hooks/post-receive).

  • node: command not found in hook? Export PATH to Node/PM2; avoid relying on an interactive shell.

  • Operation must be run in a work tree? Use GIT_WORK_TREE="$RELEASE" git checkout -f "$newrev".

  • Build succeeds but app fails? Check pm2 logs and ensure env vars are linked from shared/.env.

  • No zero‑downtime? Confirm exec_mode: "cluster" and instances >= 2.

  • Memory leaks / crashes? Set max_memory_restart, add monitoring, profile hotspots.


🧠 PM2 Deep Dive -- Concepts, Options & Best Practices

🔍 What PM2 Is (and Isn't)

  • Process manager for Node.js: starts, keeps alive, restarts on failure, and reloads with zero downtime in cluster mode.

  • Not a build tool or CI server--pair it with your Git hooks/CI to ship artifacts.

🧪 Install, Update, and Verify

npm i -g pm2@latest
pm2 -v
pm2 update             # swap the in‑memory PM2 daemon for the new version; the process list is kept, but apps are restarted

🧬 Process Models

  • Fork mode: 1 process. Simple, but reload behaves like a restart (tiny blip). Suitable for workers/queues.

  • Cluster mode: N processes share the same port. pm2 reload replaces workers one by one, which makes it the right choice for zero‑downtime web apps.

    • Choose instances: "max" for all CPU cores or a fixed number (e.g., 2 or 4).

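    • WebSockets/Socket.IO? Plain WebSocket connections work in cluster mode as‑is. Socket.IO's HTTP long‑polling transport needs every request of a session to reach the same worker, and PM2's built‑in load balancing provides no session affinity. Either restrict Socket.IO to the WebSocket transport or run it under the @socket.io/pm2 fork, which adds sticky routing.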

🔄 Reload Semantics & Graceful Lifecycle

  • pm2 reload <name>: rolling replacement (cluster mode). Old worker drains, new worker boots.

  • pm2 restart <name>: stop then start (brief interruption).

  • pm2 stop <name>: take the app offline.

Graceful readiness (recommended):

  1. App signals it's ready using PM2's wait_ready mechanism.

  2. PM2 waits up to listen_timeout for the ready signal before routing traffic.

Ecosystem snippet:

{
  name: "example-app",
  script: "server.js",
  exec_mode: "cluster",
  instances: 2,
  wait_ready: true,       // app will call process.send('ready')
  listen_timeout: 8000,   // how long PM2 waits for 'ready'
  kill_timeout: 5000      // how long to let old worker drain
}

App code:

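const http = require('http');

// placeholder request handler; replace with your app (e.g., an Express instance)
const handler = (req, res) => res.end('ok');
const server = http.createServer(handler);

server.listen(process.env.PORT || 3000, () => {
  // tell PM2 this worker is ready (required when wait_ready: true)
  if (process.send) process.send('ready');
});

// PM2 sends SIGINT first, then SIGKILL once kill_timeout has elapsed
process.on('SIGINT', gracefulExit);
process.on('SIGTERM', gracefulExit);

function gracefulExit() {
  server.close(() => process.exit(0)); // stop accepting, finish in-flight requests
  // hard timeout; keep it below kill_timeout (5000 ms) so it fires before SIGKILL
  setTimeout(() => process.exit(1), 4000).unref();
}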

🧾 Ecosystem File -- Common Options (Cheat Sheet)

module.exports = {
  apps: [{
    name: "example-app",
    script: "./server.js",
    args: "",                 // extra CLI args to your script
    exec_mode: "cluster",     // or "fork"
    instances: "max",         // or a number
    cwd: "/var/www/example-app/current", // working dir
    watch: false,              // change to true ONLY for dev
    ignore_watch: ["node_modules", "logs", "tmp"],
    max_memory_restart: "512M",
    min_uptime: "10s",        // consider app unstable before this
    max_restarts: 10,          // cap restarts for flapping apps
    exp_backoff_restart_delay: 200, // ms; grows exponentially
    env: { NODE_ENV: "production", PORT: 3000 },
    env_production: { NODE_ENV: "production" },
    out_file: "/var/log/pm2/example-app.out.log",
    error_file: "/var/log/pm2/example-app.err.log",
    merge_logs: true,
    log_date_format: "YYYY-MM-DD HH:mm:ss Z",
    node_args: "--enable-source-maps --max-old-space-size=512",
  }]
}

🔑 Environment & Secrets

  • Prefer PM2 env vars (env, env_production) for non‑secret config.

  • For secrets, keep a .env in shared/ and symlink it into each release (as shown in your Git hook).

  • If you use dotenv, load it at the top of your entry file.

💾 Startup on Reboot & State Persistence

pm2 startup systemd -u deploy --hp /home/deploy   # generate unit (run as root or via sudo)
pm2 save                                         # persist process list
pm2 resurrect                                    # restore from dump
pm2 unstartup systemd                            # remove integration

🪵 Logs & Rotation

  • Live tail: pm2 logs <name> or all apps: pm2 logs.

  • Install rotation once:

    pm2 install pm2-logrotate
    pm2 set pm2-logrotate:max_size 10M
    pm2 set pm2-logrotate:retain 7
    pm2 set pm2-logrotate:compress true
    pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss
    

🧰 Everyday Commands

pm2 list                        # overview
pm2 status                      # alias of pm2 list
pm2 describe example-app        # full metadata
pm2 reload example-app          # zero‑downtime reload
pm2 restart example-app         # hard restart
pm2 stop example-app            # stop
pm2 delete example-app          # remove from PM2
pm2 env 0                       # show env of app with id 0
pm2 monit                       # terminal dashboard (CPU, memory, logs)

📈 Observability & Metrics

  • Add /healthz and /readyz endpoints; wire your reverse proxy health checks.

  • Use pm2 monit for CPU/mem; export app metrics to Prometheus (e.g., prom-client) and include GC stats.

🧱 Blue‑Green with PM2 (Alternative Pattern)

  • Run two app names (e.g., app-blue, app-green) on distinct ports.

  • Deploy to the idle color, smoke‑test /healthz, then flip the reverse proxy upstream.

  • Keep the previous color for instant rollback.
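
The flip logic is small enough to sketch. The ports 3001 and 3002 and the planBlueGreen helper are assumptions for illustration; the app names follow the app-blue / app-green convention above.

```javascript
// Hypothetical color-to-port mapping; match it to your PM2 app names and proxy upstreams.
const COLOR_PORTS = { blue: 3001, green: 3002 };

// Given the color currently serving traffic, describe the next deploy.
function planBlueGreen(activeColor) {
  if (!Object.prototype.hasOwnProperty.call(COLOR_PORTS, activeColor)) {
    throw new Error(`unknown color: ${activeColor}`);
  }
  const idle = activeColor === "blue" ? "green" : "blue";
  return {
    deployTo: `app-${idle}`,                                       // PM2 app to update
    smokeTestUrl: `http://127.0.0.1:${COLOR_PORTS[idle]}/healthz`, // check before flipping
    rollbackTo: `app-${activeColor}`,                              // keep running until verified
  };
}
```

For example, planBlueGreen("blue") names app-green as the deploy target and app-blue as the rollback. The proxy upstream is switched only after the smoke test URL returns 200.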

🧵 TypeScript & Source Maps

  • Prefer building to JS during deploy (tsc -p .) and run built files.

  • If running TS directly, use ts-node in dev only; in prod, JS is safer.

  • Enable stack traces with original lines:

    node_args: "--enable-source-maps"
    

🌐 Sticky Sessions for Realtime Apps

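  • Plain WebSocket connections need no affinity: once upgraded, a connection stays on the worker that accepted it.

  • Socket.IO with HTTP long‑polling does need it, and stock PM2 has no sticky option. Restrict clients to the WebSocket transport, or use the @socket.io/pm2 fork of PM2, which adds sticky routing for this case.

  • At the reverse proxy, forward the Upgrade and Connection headers so WebSocket handshakes reach the app port.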

🧯 Advanced Troubleshooting

  • node: command not found in hooks: export PATH in the hook; avoid relying on interactive shells or nvm being sourced.

  • Reload isn't zero‑downtime: verify exec_mode: "cluster" and instances >= 2; check wait_ready/listen_timeout logic.

  • Flapping restarts: tune min_uptime, max_restarts, exp_backoff_restart_delay; inspect pm2 logs.

  • Memory pressure: increase --max-old-space-size, fix leaks, or scale instances.

  • File watchers in prod: keep watch: false to avoid CPU spikes.

✅ Quick PM2 Checklist

  • Cluster mode with ≥2 instances

  • wait_ready, listen_timeout, kill_timeout set

  • Log rotation configured

  • pm2 startup + pm2 save in place

  • Health checks wired in proxy & CI smoke tests

  • Rollback plan tested (previous release kept)


✅ Conclusion / Next Steps

With PM2 cluster mode + atomic releases via Git hooks, you get zero‑downtime deploys, clean rollbacks, and predictable builds, provided the app shuts down gracefully and passes its health check. Next, wire this into your CI pipeline, add synthetic monitoring for /healthz, and automate release pruning.

Quick next steps:

  1. Create the bare repo and post‑receive hook.

  2. Add the PM2 ecosystem file with cluster mode.

  3. Push to production and verify pm2 status + /healthz.

  4. Configure log rotation and schedule release cleanup.


🔗 Related Articles (suggested)

  • PM2 Deep Dive: Cluster Mode, Reloads, and Graceful Shutdowns

  • Git Hooks 101: pre‑receive, post‑receive, and Secure Deployment Patterns

  • Managing Secrets: .env vs environment vaults in CI/CD

  • Observability Stack for Node.js (logs, metrics, traces)

  • Blue‑Green vs Rolling Deploys: When to Choose What


🧩 Appendix - Full Example Files

A) ecosystem.config.js

module.exports = {
  apps: [
    {
      name: "example-app",
      cwd: "/var/www/example-app/current",
      script: "./server.js",
      exec_mode: "cluster",
      instances: 2,
      watch: false,
      max_memory_restart: "512M",
      env: { NODE_ENV: "production", PORT: 3000 },
      env_production: { NODE_ENV: "production" },
      kill_timeout: 5000,
      listen_timeout: 8000,
      out_file: "/var/log/pm2/example-app.out.log",
      error_file: "/var/log/pm2/example-app.err.log",
      merge_logs: true
    }
  ]
};

B) hooks/post-receive

(From the section above; ensure chmod +x.)

C) Example Express server with health endpoint

const express = require("express");
const app = express();

app.get("/healthz", (req, res) => res.json({ ok: true }));

app.get("/", (req, res) => res.send("Hello from zero‑downtime deploy!"));

app.listen(process.env.PORT || 3000, () => {
  console.log("Server started");
});


